The comparability of on-screen and paper-and-pencil tests: no further research required?

This paper presents an analysis of the first national high-stakes on-screen assessments offered in the United Kingdom and considers the research and policy agendas needed to support their development and use alongside paper-and-pencil alternatives. It argues that the UK has much to learn from the United States about comparability, because the on-screen assessments now being launched in the UK resemble US high-stakes assessments far more closely than the paper-and-pencil tests ever did. The US now has the benefit of twenty years of comparability research on such tests, including an extensive search for so-called test mode effects: differences that undermine the comparability of results from two otherwise identical tests administered in different ways (for example, on screen versus on paper). Two findings in particular are unequivocal. First, nonspeeded objective tests are rarely subject to test mode effects, and where such effects do occur they are marginal. Second, test anxiety caused by poor software design or by unfamiliarity with the assessment environment can have a deleterious effect on results. It does not seem sensible or necessary to replicate these findings in the UK context; rather, resources should be used to move on-screen assessments beyond their current conceptualisation. Because the present study found a marginal test mode effect due to speededness, the paper considers a regulatory framework that should be put in place to monitor the introduction of the first generation of high-stakes on-screen assessments in the UK. In brief, such assessments should be required to show evidence that on-screen tests offer no undue advantage over their paper-and-pencil equivalents owing to the speed with which they can be answered, and suitable practice tests should be made available to reduce test anxiety.
A light regulatory framework, rather than an onerous requirement to demonstrate the comparability of every test introduced, should stimulate the innovation required to develop on-screen assessments that encourage and reward construct-relevant behaviour.

