Marking quality

An assessment is only as good as its component parts. Among other factors, the underlying curriculum, the teaching, the test design and the marking can all contribute to reliability and validity in the measurement of candidate ability. Each of these components is critical. At the Centre for Education Research and Practice, we are constantly building the research evidence base as we strive to improve AQA’s processes of assessment design, marker recruitment and training, and quality assurance.

Quality of measurement starts with good assessment design. The high-stakes examinations offered by AQA include questions in a variety of formats, from multiple choice through to extended-response essays. This variety supports validity of measurement by testing a range of skills, but it also makes marking more complex.

Using our own data, we have shown that the design of mark schemes can affect how effectively we distinguish between a good response and a poorer one (1). We have investigated the features of mark schemes that are more likely to lead to reliable marking (2, 3, 4). We have also identified the sorts of questions that are difficult to mark, with a view to improving question structure and mark schemes (5).

All of our research findings have been incorporated into AQA’s design principles, which are used as a benchmark when creating question papers and mark schemes. But while the assessment must be effective, so too must the marking. We therefore need to consider those who carry out the marking.

The characteristics of a good marker cannot be distilled into a simple wish list, and much of the research confirms that marking accuracy does not depend on easily measurable features of a marker’s background (6, 7). Some of our studies have shown that experienced markers mark more reliably but that, for some types of marking, specific subject expertise is not always a necessary requirement (6, 4, 8). On the other hand, our research has shown that training and standardisation play a key role in improving marking reliability (9, 6). In practice, these findings have been used to introduce new technology and to allocate questions to markers more intelligently.

In addition to good assessments and effective markers, a thorough quality assurance system must underpin all efforts to improve the quality of marking. AQA has embraced recent advances in technology, and these changes have, once more, been supported by research evidence (10). For example, distributing candidate responses for different questions among different markers has been shown to increase the reliability of the mark awarded to a candidate (11, 12).
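
The statistical intuition behind that finding can be illustrated with a minimal simulation (all numbers below are invented for illustration, not drawn from AQA data): if one marker marks a whole script, that marker’s severity or leniency applies to every item, so the errors accumulate; if each item goes to a different marker, the biases tend to cancel.

```python
import random
import statistics

random.seed(42)

N_ITEMS = 10       # questions per script (hypothetical)
N_SCRIPTS = 2000   # simulated candidates
MARKER_SD = 1.0    # spread of marker severity/leniency, in marks per item
NOISE_SD = 0.5     # residual item-level marking noise

def whole_script_error():
    # one marker's severity applies to every item on the script,
    # so the same bias is added N_ITEMS times
    bias = random.gauss(0, MARKER_SD)
    return sum(bias + random.gauss(0, NOISE_SD) for _ in range(N_ITEMS))

def item_level_error():
    # each item goes to a different marker, so a fresh bias is drawn
    # per item and the biases tend to average out across the script
    return sum(random.gauss(0, MARKER_SD) + random.gauss(0, NOISE_SD)
               for _ in range(N_ITEMS))

whole = [whole_script_error() for _ in range(N_SCRIPTS)]
item = [item_level_error() for _ in range(N_SCRIPTS)]

print(statistics.stdev(whole))  # larger: marker bias accumulates
print(statistics.stdev(item))   # smaller: bias averages out
```

With these assumed parameters, the spread of total-score error is several times smaller under item-level allocation, which is the effect the research above reports.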

Recently, we have been involved in piloting innovative ways of assessing the quality of candidates’ work using comparative judgement (13, 14). We have collected views on assessment reliability from key stakeholders and on marking experience from our markers (15, 16, 17). We have also considered marking reliability alongside the accuracy with which we classify candidates in terms of grades. This knowledge has been used to help inform the design of new national grading structures (18, 19, 20).
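
Comparative judgement studies of this kind typically ask judges to compare pairs of pieces of work and then fit a statistical model, commonly of the Bradley–Terry type, to recover a rank order. The sketch below is illustrative only, with invented essay qualities and a simple majorisation (Zermelo) fit; it is not a description of the specific method used in the studies cited above.

```python
import itertools
import random

random.seed(1)

# hypothetical "true" qualities of six essays (higher = better)
true_quality = {"A": 2.0, "B": 1.4, "C": 1.0, "D": 0.6, "E": 0.3, "F": 0.1}
essays = list(true_quality)

# simulate judges: each pair is compared 20 times, and the better essay
# wins each comparison with its Bradley-Terry probability
wins = {(i, j): 0 for i in essays for j in essays if i != j}
for i, j in itertools.combinations(essays, 2):
    for _ in range(20):
        p = true_quality[i] / (true_quality[i] + true_quality[j])
        if random.random() < p:
            wins[(i, j)] += 1
        else:
            wins[(j, i)] += 1

# fit essay strengths with the classic minorise-maximise (Zermelo) iteration
strength = {e: 1.0 for e in essays}
for _ in range(100):
    new = {}
    for i in essays:
        w_i = sum(wins[(i, j)] for j in essays if j != i)
        denom = sum((wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                    for j in essays if j != i)
        new[i] = w_i / denom
    total = sum(new.values())
    strength = {e: s / total for e, s in new.items()}

rank = sorted(essays, key=strength.get, reverse=True)
print(rank)  # estimated rank order, best first
```

Pooling information across every pairwise comparison is what allows a reliable rank order to emerge even though any single judgement is noisy.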

Our research has been instrumental in providing sound evidence to help AQA introduce improvements in its processes of assessment design, marker recruitment and training, and quality assurance. We continue to explore how best to ensure marking is as reliable as it can be, without losing the focus on validity that is at the heart of the English examination system (21, 22, 23).

Anne Pinot de Moira, Head of Assessment Research

References

  1. Effective discrimination in mark schemes
  2. Features of a levels-based mark scheme and their effect on marking reliability
  3. Levels-based mark schemes and marking bias
  4. Who is the specialist? The effect of specialisms on the marking reliability of an English literature examination
  5. Seeds of doubt: learning lessons from item level marking
  6. The effect of marker background and training on the quality of marking in GCSE English
  7. Examiner background and the effect on marking reliability
  8. A Rasch analysis of the quality of marking of GCSE Science
  9. Online or face-to-face? An experimental study of examiner training
  10. Identifying errant markers: quality assurance systems in an e-marking environment
  11. Why item mark? The advantages and disadvantages of e-marking
  12. Gains in marking reliability from item-level marking: is the sum of the parts better than the whole?
  13. Testing the validity of judgements about geography essays using the Adaptive Comparative Judgement method
  14. Using Adaptive Comparative Judgement to obtain a highly reliable rank order in summative assessment
  15. Carry on examining: further investigation
  16. Qualification users’ perceptions and experiences of assessment reliability
  17. Public perceptions of reliability
  18. Classification accuracy and consistency in GCSE and A Level examinations offered by the Assessment and Qualifications Alliance (AQA) November 2008 to June 2009
  19. Estimation of composite score classification accuracy using compound probability distributions
  20. Setting the grade standards in the first year of the new GCSEs
  21. Assessment expertise project: validity of assessment
  22. Contemporary validity theory and the assessment context
  23. The achieved weightings of assessment objectives as a source of validity evidence
