Classical test theory (CTT)

Classical test theory is based on a set of assumptions regarding the properties of test scores. Although different models of CTT are based on slightly different sets of assumptions, all models share a fundamental premise postulating that the observed score of a person on a test is the sum of two unobservable components, true score and measurement error. True score is generally defined as the expected value of a person’s observed score if the person were tested an infinite number of times on an infinite number of equivalent tests. Therefore, the true score reflects the stable characteristic of the object of measurement (i.e., the person). Measurement error is defined as a random “noise” that causes the observed score to deviate from the true score.

Classical test theory


Important Concepts in Classical Test Theory

Reliability and Parallel Tests
True score and measurement error, by definition, are unobservable. However, researchers often need to know how well observed test scores reflect the true scores of interest. In CTT, this is achieved by estimating the reliability of the test, defined as the ratio of true score variance to observed score variance. Alternatively, reliability is sometimes defined as the square of the correlation between the true score and the observed score. Although they are expressed differently, these two definitions are equivalent and can be derived from assumptions underlying CTT.

Assumptions of Classical Test Theory

Classical test theory assumes linearity—that is, the regression of the observed score on the true score is linear, the following assumptions are often made by classical test theory:

●The expected value of measurement error within a person is zero.
● The expected value of measurement error across persons in  the population is zero.
● True score is uncorrelated with measurement error in the          population of persons.
● The variance of observed scores across persons is equal to
the sum of the variances of true score and measurement error.
●  Measurement errors of different tests are not correlated.

standard deviation

In statistics, the standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values.

Reliability vs Validity

Reliability and validity are independent of each other. A measurement maybe valid but not reliable, or reliable but not valid.

Cronbach's alpha

Cronbach's alpha is a function of the number of items in a test, the average covariance between item-pairs, and the variance of the total score.

Item Response Theory...

In psychometrics, item response theory (IRT), also known as latent trait theory, strong true score theory, or modern mental test theory, is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals’ performances on a test item and the test takers’ levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult.

Item Response Theory


About Item Response Theory

IRT is based on the idea that the probability of a correct/keyed response to an item is a mathematical function of person and item parameters. The person parameter is construed as (usually) a single latent trait or dimension. Examples include general intelligence or the strength of an attitude. Parameters on which items are characterized include their difficulty (known as "location" for their location on the difficulty range), discrimination (slope or correlation) representing how steeply the rate of success of individuals varies with their ability, and a pseudoguessing parameter, characterising the (lower) asymptote at which even the least able persons will score due to guessing (for instance, 25% for pure chance on a multiple choice item with four possible responses).

Item parameters

The 1 parameter logistic model (1PL) 
also known as the Rasch model, only uses item difficulty as a parameter for calculating a person’s ability.

The 2 parameter logistic model (2PL) 

uses both item difficulty and item discrimination (the extent which the item is measuring the underlying psychological construct) as parameters.

The 3 parameter logistic model (3PL) 

uses item difficulty, item discrimination and the extent which candidates can guess the correct answer, as parameters.