The effect of sample size on item parameter estimation for the partial credit model

Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An important requirement in applying IRT models is the stability and accuracy of model parameters. One of the major factors affecting the stability and accuracy of model parameters is the size of the sample used to calibrate the items. Substantial research has been undertaken in the past to study the effect of sample size on the estimation of IRT model parameters using simulations. Most of these simulation studies have focused on homogeneous item types and used model-generated response data. An important limitation of such simulation studies is that the assumptions of the IRT models are strictly met. However, data from operational tests do not normally meet the model assumptions strictly. The work reported in this paper investigates the effect of sample size on the stability and accuracy of model parameters of the Partial Credit Model (PCM) for a large data set generated from a high-stakes achievement test consisting of a mixture of dichotomous and polytomous items. Results from this study indicate that the level of stability and accuracy of model parameters is affected by the sample size, the number of categories of the items and the distribution of category scores within the items. The results obtained also suggest that the actual measurement errors associated with model parameters for polytomous items estimated from operational test data can be substantially higher than the model standard errors. It is furthermore suggested that the error introduced to true score equating using common items can be evaluated by a comparison with measurement errors inherent in the tests.
