Psychological Testing and Assessment Principles
Posted by Anonymous and classified in Psychology and Sociology
Written in English with a size of 680.19 KB
1. Test Items and Formats
Purpose of Tests: Assess knowledge and individual differences, and predict future performance. Instructor Role: Teach and create valid tests. Student Role: Understand and apply the material.
Item Types
- Dichotomous (T/F, Y/N): Easy to score, but limited information.
- Polytomous (MCQs): Ideal; 3-5 options, 1 correct, distractors of similar length that match the stem's grammar.
- Bad MCQs: All/none of the above, joke options, tricky negatives.
Correction for Guessing: Corrected score = R − W/(n − 1), where R = number right, W = number wrong, n = number of choices per item. Guessing costs marks; leaving an item blank scores 0. Likert Scales: 5-7 points; standard wording helps reduce confusion. Context Effects: One question can affect the next (anchoring and adjustment); use clear labels to reduce bias. Category Format: Scale of 1-10 (e.g., cartoon faces for pain level). Other: Essays, checklists, visual analogue scales, Q-sorts.
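The correction-for-guessing formula can be sketched in a few lines (a minimal illustration; the function name and example numbers are hypothetical):

```python
def corrected_score(right, wrong, n_choices):
    """Correction for guessing: R - W/(n-1).
    Blank (omitted) items contribute 0 either way."""
    return right - wrong / (n_choices - 1)

# 40-item test, 4 options per item: 30 right, 8 wrong, 2 blank
print(corrected_score(30, 8, 4))  # 30 - 8/3, about 27.33
```

With 4 options, each wrong answer costs 1/3 of a mark, so random guessing gains nothing on average.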
2. Item Analysis
Difficulty: Average score per item (proportion correct for 0/1 items). Ideal = (chance + 1)/2; with 4 options, chance = 0.25, so ideal = 0.625. Discriminability: Extreme Group method: difference between the highest and lowest groups' item performance (e.g., 95% − 55% = 40%). Point-Biserial: Correlation between the item score (0/1) and the total test score (one variable a true dichotomy, one continuous). Item Characteristic Curve: Slope indicates item discrimination; separates examinees into "bins"; percentile bins are equal-sized, but grade bins need not be.
3. Item Response Theory
Computerized Adaptive Tests: Tailored to ability; difficulty adapts as the examinee responds, reducing the number of items needed. Good for precision; avoids fatigue from an overload of too-easy or too-hard items.
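A toy sketch of the adaptive idea (not a real IRT implementation: the numeric item "difficulties" and the step-halving ability update below are simplified stand-ins for maximum-information item selection and likelihood-based estimation):

```python
def next_item(ability, item_bank, used):
    """Pick the unused item whose difficulty is closest to the
    current ability estimate."""
    candidates = [i for i in item_bank if i not in used]
    return min(candidates, key=lambda d: abs(d - ability))

def run_cat(answers_correct, item_bank, start=0.0, step=1.0):
    """Toy CAT loop: raise the estimate after a correct answer,
    lower it after an incorrect one, halving the step each round
    so the estimate converges on the examinee's level."""
    ability, used = start, []
    for correct in answers_correct:
        item = next_item(ability, item_bank, used)
        used.append(item)
        ability += step if correct else -step
        step /= 2
    return ability, used

bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # hypothetical difficulties
est, administered = run_cat([True, True, False, True], bank)
print(est, administered)
```

Note how the examinee never sees the very easy items (-2.0, -1.0): the test homes in on their level, which is why CATs need fewer items.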
4. Test Administration and Standardization
Factors: Feedback, stakes, motivation, examiner familiarity. Manuals: Crucial for valid/reliable results; include scripts, timing, and acceptable responses. Expectancy Effects: Rosenthal effect (expecting a certain result makes you more likely to see it), stereotype threat, observer bias, halo/horn effects (expecting good leads to seeing good, and vice versa). Scoring Manuals: Prevent bias; ensure consistency. Computer Testing: Enhances standardization.
5. Test-Taker Variables
Test anxiety, illness, fatigue, disability, observer reactivity (people perform differently when they know they are being observed), drift (ratings shift over time, away from the original scoring rules), expectancy bias (halo/horn).
6. Intelligence
- Definition: Processing speed, reasoning, problem-solving, adaptability.
- Spearman’s g: General mental ability underlying all tasks.
- Thurstone: Primary mental abilities.
- CHC Theory: Gf (fluid – novel problem solving), Gc (crystallized – accumulated knowledge).
- Sternberg: Analytic, creative, practical intelligence.
- Gardner: Multiple intelligences (low empirical support).
- Binet/Stanford-Binet: Originally developed to identify intellectual disability. Now includes 5 areas: fluid reasoning, knowledge, quantitative reasoning, visual-spatial, working memory (verbal and non-verbal).
- IQ Scores: Subtest scaled scores M = 10, SD = 3; overall IQ M = 100, SD = 15. Predictive of school performance, SES, health, income, etc.
- Flynn Effect: ~3 points/decade increase; environment-driven.
- Heritability: Increases with age; shared-environment effects fade.
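The two score metrics are just linear transformations of the same z-score, e.g.:

```python
def to_iq(z):
    """z-score on the IQ metric: mean 100, SD 15."""
    return 100 + 15 * z

def to_scaled(z):
    """z-score on the subtest scaled-score metric: mean 10, SD 3."""
    return 10 + 3 * z

# An examinee one SD above the mean:
print(to_iq(1.0))      # 115.0
print(to_scaled(1.0))  # 13.0
```

So an IQ of 115 and a subtest scaled score of 13 describe the same relative standing (84th percentile in a normal distribution).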
7. Personality Testing
Personality: Traits that influence behaviour, emotion, thought. Uses: Career, clinical diagnosis, research, marketing. Structured (objective) vs. Projective (e.g., Rorschach, TAT): Rorschach is controversial, interpretive, low validity/reliability. TAT is used for themes and emotions.
8. Creating Personality Tests
Logical-Content Strategy: Items are written from theory/face validity; assumes test items reflect the trait they intend to measure. Common in early tests (e.g., Woodworth Personal Data Sheet). Empirical (Criterion-Group) Strategy: Compare known groups, then cross-validate on a new sample to see if the results hold. MMPI: T/F items; multiple scales. Clinical: Psychiatric concerns. Content: Fears, relationships, etc. Validity: L (Lie) – faking good; F (Infrequency) – faking bad; K – defensiveness, denial, evasiveness; FB (Back F) – infrequent responding late in the test; checks consistency (early vs. late responses) and cooperation.
9. Factor-Analytic Strategy
How adjectives (traits) cluster together statistically. Cattell: Reduced traits to 16 PF. Big Five (OCEAN): Openness: Crystallized IQ up. Conscientiousness: Job performance up. Extraversion: Success in some roles. Agreeableness: Social success. Neuroticism: Life satisfaction down.
10. Dark Triad
1. Narcissism, 2. Machiavellianism, 3. Psychopathy: Correlations vary with Big Five traits.
11. Other Measures
SWLS: Satisfaction with Life Scale (sum of items). PANAS: Positive and Negative Affect Schedule. MBTI (Myers-Briggs): E/I, S/N, T/F, J/P (four-letter types). Criticisms: Weak construct validity and low test-retest stability (types are not stable); fun but not psychometrically rigorous. CPI: California Psychological Inventory (career guidance, leadership, self-control, social effectiveness). EPPS: Edwards Personal Preference Schedule (ipsative measure; scores are relative within-person, not norm-referenced). NEO-PI: Constructed via factor analysis to measure the Big Five.
12. Test Bias
Fairness: The test should measure what it's supposed to and minimize irrelevant variance. Item Bias: Cultural references, strict scoring, speed bias. Construct-Irrelevant Variance: The test includes factors unrelated to the construct. Differential Validity: The test predicts differently across groups. Differential Prediction: The test predicts an outcome inaccurately for certain groups (e.g., two groups earn the same score of 100, but the test over-predicts the outcome for one group and under-predicts it for the other). Judged against actual outcomes. Within-Group Norming: Scoring test-takers only against their own demographic group (made illegal by the Civil Rights Act of 1991). Content Bias: Questions include language or ideas unfamiliar to certain groups.
13. Reducing Bias
Differential Item Functioning (DIF) Analysis: Flags items on which two groups with the same total ability perform differently. To reduce bias: equate scores, compare item-level performance across matched groups, remove biased items, use multiple methods to define and measure constructs, and combine qualitative and quantitative evaluations of fairness.
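A crude illustration of the DIF idea: within each band of equal total score, compare the item's proportion correct across groups. (Real analyses use methods such as Mantel-Haenszel or IRT-based DIF; this sketch and its data are hypothetical.)

```python
from collections import defaultdict

def dif_gap(records):
    """Crude DIF screen. records: list of (group, total_score, item_correct).
    Within each total-score band where both groups appear, return the
    difference in proportion correct on the item."""
    bands = defaultdict(lambda: defaultdict(list))
    for group, total, correct in records:
        bands[total][group].append(correct)
    gaps = {}
    for total, by_group in bands.items():
        if len(by_group) == 2:  # both groups represented at this ability level
            a, b = sorted(by_group)
            pa = sum(by_group[a]) / len(by_group[a])
            pb = sum(by_group[b]) / len(by_group[b])
            gaps[total] = pa - pb
    return gaps

# Same total score (same ability), very different success on this item:
data = [
    ("A", 30, 1), ("A", 30, 1), ("A", 30, 1), ("A", 30, 0),
    ("B", 30, 1), ("B", 30, 0), ("B", 30, 0), ("B", 30, 0),
]
print(dif_gap(data))  # {30: 0.5} -- a large gap flags the item for review
```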
14. Group Differences
Real gaps may reflect systemic issues (e.g., nutrition, education); consider access to learning opportunities, test preparation, and social context. Stereotype Threat: Awareness of a stereotype affects performance.
15. Fair Selection Models
Unqualified Individualism: Select strictly on predicted performance, using any valid predictor (even group membership). Qualified Individualism: Select on predicted performance but ignore group membership. Quotas: Predetermine group ratios. Regression Adjustments: Separate regression lines: higher validity, but fewer minority candidates selected. Constant ratio: Add points to the lower-scoring group. Cole/Darlington: Separate lines plus an adjustment for predictor equivalence.
16. Future of Testing
Current Trends: More tests, better technology, objectivity, growing public input. Computerization and Internet Testing: Growing use of adaptive and online formats. Integration and Cognitive Science: New testing models, brain-based tools. Moral Issues: Human rights, informed consent, labelling, test access. Professional Issues: Accountability, training, ethical test use. Social Issues: Fairness, stereotype threat, systemic inequities. Testing Prospects: Positive growth, but constant change and controversy expected. Responsibility: Users must interpret appropriately, ensure fairness. Actuarial vs. Clinical: Actuarial: Algorithmic, consistent. Clinical: Judgement-based, holistic.
17. Standardized Tests
Group vs. Individual Tests: Group tests are fast and cost-effective but less personalized; individual tests yield richer data.
Educational Uses:
- Achievement tests: What's been learned (e.g., MAT).
- Aptitude tests: Predict future ability (e.g., abstract reasoning).
- Henmon-Nelson: General ability test; correlates with school success.
- Kuhlmann-Anderson: Less cultural bias, strong reliability.
College Admissions:
- SAT: Achievement focus; removed obscure vocabulary; no guessing penalty.
- ACT: 4 academic domains (English, math, reading, science); correlates highly with the SAT.
- GRE: Predicts graduate GPA weakly (r = .2-.4), but is still widely used.
- LSAT: Modest predictor of bar exam success.
Civil/Military Testing:
- GATB: Measures multiple aptitudes (e.g., clerical, spatial).
- ASVAB: 10 subtests; used for military placement; computer adaptive.
18. Testing in Clinical and Counselling Psychology
Purpose of Testing: Diagnosis, treatment planning, progress evaluation, legal/forensic decisions. Objective Tests: Structured responses, standardized scoring (e.g., MMPI, NEO-PI). Projective Tests: Ambiguous stimuli (e.g., Rorschach, TAT); interpretive, less reliable. Interpretation Practices: 1. Use multi-method assessment (e.g., self-reports, interviews, behavioural observation). 2. Consider cultural context, response bias (faking good/bad), and consistency. Ethical/Legal Use: Tests must be valid for their purpose, used with informed consent, and administered by qualified professionals. Misuse in legal settings and overgeneralization of results are concerns.
19. Wechsler Scales of Intelligence
Versions: WPPSI (Preschool), WISC-V (Children), WAIS-IV (Adults), WASI (Abbreviated). Index Scores and Subtests:
- VCI (Verbal Comprehension Index): Vocabulary, Similarities, Information, Comprehension.
- VSI (Visual Spatial Index): Block Design, Visual Puzzles.
- FRI (Fluid Reasoning Index): Matrix Reasoning, Figure Weights, Picture Concepts, Arithmetic.
- WMI (Working Memory Index): Digit Span, Picture Span, Letter-Number Sequencing.
- PSI (Processing Speed Index): Coding, Symbol Search, Cancellation.
Uses: Diagnosis (LDs, ADHD), cognitive profiling, giftedness. Interpretation: Look for large subtest discrepancies (pattern analysis). FSIQ only valid if index scores are consistent.