Awarding objective test papers: is there a correct answer?

  1. Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.) Educational Measurement (2nd ed.) 508-600. Washington DC: American Council on Education.
  2. Baird, J. & Dhillon, D. (2005). Qualitative expert judgements on examination standards: valid but inexact. AQA research report.
  3. Balch, W. R. (1989). Item order affects performance in multiple-choice exams. Teaching of Psychology, 16(2), 75-77.
  4. Béguin, A., Kremers, E. & Alberts, R. (2008). National Examinations in the Netherlands: standard setting procedures and the effects of innovations. Paper presented at the IAEA Conference, Cambridge, September 2008.
  5. Beretvas, N. S. (2004). Comparison of Bookmark difficulty locations under different item-response models. Applied Psychological Measurement, 28(1), 25-47.
  6. Berk, R. A. (1986). A consumer’s guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56(1), 137-172.
  7. Buckendahl, C. W., Smith, R. W., Impara, J. C. & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting methods. Journal of Educational Measurement, 39(3), 253-263.
  8. Chinn, R. N. & Hertz, N. R. (2002). Alternative approaches to standard setting for licensing and certification examinations. Applied Measurement in Education, 15(1), 1-14.
  9. Cizek, G. J. (1996). Setting passing scores. Educational Measurement: Issues and Practice, 15(2), 20-31.
  10. Cizek, G. J. (Ed.) (2001). Setting Performance Standards: Concepts, methods and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.
  11. Cizek, G. J., Bunch, M. B. & Koons, H. (2004). Setting performance standards: contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
  12. Cizek, G. J. & Bunch, M. B. (2007). Standard setting: a guide to establishing and evaluating performance standards on tests. Sage Publications Inc.
  13. Clauser, B. E., Swanson, D. B. & Harik, P. (2002). Multivariate generalizability analysis of the impact of training and examinee performance information on judgements made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39(4), 269-290.
  14. Cross, L. H., Impara, J. C., Frary, R. B. & Jaeger, R. M. (1984). A comparison of three methods for establishing minimum standards on the National Teacher Examinations. Journal of Educational Measurement, 21, 113-129.
  15. De Gruijter, D. N. M. (1985). Compromise models for establishing examination standards. Journal of Educational Measurement, 22, 263-269.
  16. Ebel, R. L. (1972). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.
  17. Fearnley, A. J. (2003). An investigation into the possible application of item response theory to provide feedback of information to awarders in the use of Angoff’s method of standard setting in AQA OTQ components. AQA research report.
  18. Fowles, D. (2003). Standard Setting: a review of some recent approaches to setting boundary marks on the basis of examiner judgement. AQA research report.
  19. Fowles, D. (2004). A trial of a bookmark approach to grading and comparisons with the Angoff method. AQA research report.
  20. Fowles, D. (2005). A further trial of a bookmark approach to grading objective tests. AQA research report.
  21. Green, D. R., Trimble, C. S. & Lewis, D. M. (2003). Interpreting the results of three different standard setting procedures. Educational Measurement: Issues and Practice, 22(1), 22-23.
  22. Hambleton, R. K., Brennan, R. L., Brown, W., Dodd, B., Forsyth, R. A., Mehrens, W. A., Nellhaus, J., Reckase, M. D., Rindone, D., van der Linden, W. J. & Zwick, R. (2000). A response to “Setting reasonable and useful performance standards” in the National Academy of Sciences’ Grading the Nation’s Report Card. Educational Measurement: Issues and Practice, 19(2), 5-14.
  23. Hambleton, R. K. & Plake, B. S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8(1), 41-55.
  24. Huck, S. W. & Bowers, N. D. (1972). Item difficulty level and sequence effects in multiple-choice achievement tests. Journal of Educational Measurement, 9(2), 105-111.
  25. Hurtz, G. M. & Hertz, N. R. (1999). How many raters should be used for establishing cutoff scores with the Angoff method: a generalizability theory study. Educational and Psychological Measurement, 59, 885-897.
  26. Impara, J. C. & Plake, B. S. (1997). Standard setting: an alternative approach. Journal of Educational Measurement, 34(4), 353-366.
  27. Impara, J. C. & Plake, B. S. (1998). Teachers’ ability to estimate item difficulty: a test of the assumptions of the standard setting method. Journal of Educational Measurement, 35(1), 69-81.
  28. Jaeger, R. M. (1982). An iterative structured judgement process for establishing standards on competency tests: Theory and application. Educational Evaluation and Policy Analysis, 4, 461-475.
  29. Karantonis, A. & Sireci, S. G. (2006). The Bookmark standard-setting method: a literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.
  30. Laffitte, R. G. Jr. (1984). Effects of item order on achievement test scores and students’ perceptions of test difficulty. Teaching of Psychology, 11(4), 212-213.
  31. MacCann, R. G. & Stanley, G. (2006). The use of Rasch modelling to improve standard setting. Practical Assessment, Research & Evaluation, 11(2), 1-17.
  32. Mehrens, W. A. (1995). Methodological issues in standard setting for educational exams. In Proceedings of Joint Conference on Standard Setting for Large-Scale Assessments (pp. 221-263). Washington DC: National Assessment Governing Board and National Center for Education Statistics.
  33. Meyer, L. (2003a). AQA Standards Unit analyses for the GCE Economics awarding meeting, June 2003. AQA internal report.
  34. Meyer, L. (2003b). Repeat AQA Standards Unit analyses originally run for the GCE Economics awarding meeting, June 2003. AQA internal report.
  35. Mills, C. N. & Melican, G. J. (1988). Estimating and adjusting cutoff scores: features of selected methods. Applied Measurement in Education, 1(3), 261-275.
  36. Mitzel, H. C., Lewis, D. M., Patz, R. J. & Green, D. R. (2001). The Bookmark procedure: psychological perspectives. In G. J. Cizek (Ed.), Setting Performance Standards: Concepts, methods and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.
  37. National Academies of Sciences (2005). Measuring literacy: performance levels for adults, interim report. Appendix C: July 2004 Bookmark standard-setting session with the 1992 NALS data (pages 221-284). Retrieved from
  38. Neely, D. L., Springston, F. J. & McCann, S. J. H. (1994). Does item order affect performance on multiple-choice exams? Teaching of Psychology, 21(1), 44-45.
  39. Newman, D. L., Kundert, D. K., Lane, D. S. Jr. & Bull, K. S. (1988). Effect of varying item order on multiple-choice test scores: importance of statistical and cognitive difficulty. Applied Measurement in Education, 1(1), 89-97.
  40. Perlini, A. H., Lind, D. L. & Zumbo, B. D. (1998). Context effects on examinations: the effects of time, item order and item difficulty. Canadian Psychology, 39(4), 299-307.
  41. Schagen, I. & Bradshaw, J. (2003). Modelling item difficulty for bookmark standard setting. Paper presented at the BERA annual conference, Heriot-Watt University, Edinburgh, 11-13 September 2003.
  42. Schulz, E. M., Lee, W. & Mullen, K. (2005). A domain-level approach to describing growth in achievement. Journal of Educational Measurement, 42, 1-26.
  43. Schulz, E. M. & Mitzel, H. C. (April, 2005). The Mapmark standard setting method. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.
  44. Stringer, N. (2007). Evaluation of the February 2007 alternative awarding procedure trials. AQA internal report.
  45. Vos, P. & Kuiper, W. (2003). Predecessor items and performance level. Studies in Educational Evaluation, 29, 191-206.
  46. Wang, N. (2003). Use of the Rasch IRT model in standard setting: an item-mapping method. Journal of Educational Measurement, 40(3), 231-253.
  47. Yin, P. & Sconing, J. (2007). Estimating standard errors of cut scores for item mapping and mapmark procedures: a generalizability theory approach. Educational and Psychological Measurement, 68(1), 25-41.
  48. Zieky, M. J. (2001). So much has changed: how the setting of cutscores has evolved since the 1980s. In G. J. Cizek (Ed.), Setting Performance Standards: Concepts, methods and perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.