Making the invisible visible

Yaw Bimpeh explains why it’s so difficult to validate a test

How can we measure something we can’t actually see? Learning is something we all understand as a concept, but we can’t actually measure it directly – unlike quantities such as height, where a tape measure or ruler will work well. Instead, our methods for measuring skills and knowledge involve inference or interpretation: for example, using answers to a series of questions that we hope will assess the various aspects of learning within a particular subject.

The process of designing questions or assessments and collecting evidence from them to support our inferences is referred to as ‘validation’; questions or tests that support our inferences are said to be valid, or to have validity. However, the issue of validity has received less attention than other aspects of test quality, such as reliability (i.e. the consistency and accuracy of test scores). This may simply be because it is harder to gather empirical evidence to evaluate and defend a claim for validity.

Validating a test becomes even more challenging – and also more important – in the context of high-stakes national exams, such as GCSEs and A-levels in the UK. For such exams, we must first determine what we are trying to measure. The exams regulator Ofqual stipulates that the question papers produced by exam boards must cover a set of carefully constructed assessment objectives (i.e. knowledge or skills that students are expected to demonstrate).

However, the assessment objectives cannot be observed or measured directly since they are defined at quite a high level – for example, one objective might cover students’ knowledge and understanding of chemistry principles, concepts and techniques. Instead, a range of questions is designed to assess specific aspects of the objective.

To check that our assessments are valid, then, we must ensure that the questions we ask are actually accessing the right knowledge and understanding; we must find ways to make these invisible assessment objectives visible. Ideally, we would be able to develop and check our exam questions using empirical data, integrating our validity investigations into the routine evaluations that exam boards carry out on their assessments. However, there are no set practical guidelines or established methods for doing so, and, as yet, little progress has been made in addressing this problem.

Nonetheless, validation is a vital aspect of our work as an exam board, so we would like to be able to gather empirical evidence quickly and easily. This is where my work comes in. I’ve been looking for empirical criteria that will enable us to check that the questions in our pool do in fact assess the skills and knowledge that we intend them to assess…

Yaw Bimpeh
