Psikhologicheskie Issledovaniya • ISSN 2075-7999
peer-reviewed • open access journal


Naumenko A.S., Orel E.A. Who are the judges? Individual traits of test developers and test items characteristics

Full text in Russian: Науменко А.С., Орел Е.А. А судьи кто? Индивидуальные особенности разработчиков и характеристики тестовых заданий
South Ural State University, Chelyabinsk, Russia
State University – Higher School of Economics, Moscow, Russia


This paper reviews foreign studies of how the individual traits of test developers affect item characteristics. Although the existence of such effects seems obvious, relevant studies are relatively few. In many cases these effects may lower the validity of the instruments being developed. The review opens discussion of this issue in Russian psychodiagnostics and aims to draw researchers’ attention to its importance. It considers how a test developer’s personality manifests itself in knowledge tests, personality inventories, and professional skills tests. Recommendations are formulated for minimizing and/or compensating for the effects of test developers’ individual traits on the items they create.

Keywords: psychodiagnostics, test construction, test development, test developer, test items author, test items, systematic distortions, personality inventory, knowledge tests, professional skills assessment


Cyrillic letters are transliterated according to BSI standards.

Abelson R.P. Script processing in attitude formation and decision making // Carroll J.S., Payne J.W. (Eds.). Cognition and social behavior. Hillsdale, NJ: Erlbaum, 1976. P. 33–45.

Anastasi A. Psychological testing. N.Y.: Prentice Hall, 1968. 665 p.

Angoff W.H. Scales, norms, and equivalent scores // Thorndike R.L. (Ed.). Educational measurement. 2nd ed. Washington, DC: American Council on Education, 1971. P. 508–600.

Ashton S.G., Goldberg L.R. In response to Jackson's challenge: The comparative validity of personality scales constructed by the external (empirical) strategy and scales developed intuitively by experts, novices, and laymen // Journal of Research in Personality. 1973. Vol. 7, N 1. P. 1–20.

Banks C.G. Cue selection and evaluation elicited during the rating process // Department of Management Working Paper N 82–83, University of Texas, Austin, TX, 1982. 37 p.

Banks C., Roberson L. Performance Appraisers as Test Developers // Academy of Management Review. 1985. Vol. 10(1). P. 128–142.

Bejar I. Subject matter experts’ assessment of item statistics // Applied Psychological Measurement. 1983. Vol. 7. P. 303–310.

Boothroyd R.A., McMorris R.F., Pruzek R.M. What do teachers know about measurement and how did they find out? (Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA, 1992.) // ERIC Document Reproduction Service. No. 351 309. 24 p.

Borman W. The Rating of Individuals in Organizations: An Alternate Approach // Organizational Behavior & Human Performance. 1974. Vol. 12, N 1. P. 105–124.

Bormuth J.R. On the theory of achievement test items. Chicago: University of Chicago Press, 1970. 163 p.

Brozo W.G., Schmelzer R.V., Spires H.A. A study of test-wiseness clues in college and university teacher-made tests // Journal of Learning Skills. 1984. Vol. 3. P. 56–68.

Callenbach C. The effects of instruction and practice in content-independent test-taking techniques upon the standardized reading test scores of selected second grade students // Journal of Educational Measurement. 1973. Vol. 10. P. 25–30.

Carter K. Do teachers understand principles for writing tests? // Journal of Teacher Education. 1984. Vol. 35. P. 57–60.

Carter K. Tackling the testing issue: Test-wiseness for teachers and students (Paper presented at the annual meeting of the American Educational Research Association, Montreal, 1983.) // ERIC Document Reproduction Service No. 482 315. 19 p.

Crehan K.D., Koehler R.A., Slakter M.J. Longitudinal studies of test-wiseness // Journal of Educational Measurement. 1974. Vol. 11. P. 209–212.

Diamond J.J., Evans W.J. An investigation of the cognitive correlates of test-wiseness // Journal of Educational Measurement. 1972. Vol. 9. P. 145–150.

Erickson M.E. Test sophistication: An important consideration // Journal of Reading. 1972. Vol. 16. P. 140–144.

Fennessey D. Primary teachers’ assessment practices: some implications for teacher training (Paper presented at the Annual Meeting of the South Pacific Association for Teacher Education, Frankston, Victoria, Australia, 1982.) // ERIC Document Reproduction Service No. 229 346. 21 p.

Gaines W.G., Jongsma E.A. The effect of training in test-taking skills on the achievement scores of fifth grade pupils // Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, Illinois, April 1974. 7 p.

Gibb B.G. Test-wiseness as secondary cue response: Ph.D. Thesis / School of Education, Stanford University. 1964. 99 p.

Gifford C.S., Fluitt J.L. How to make your students test wise // American School Board Journal. 1980. Vol. 29. P. 29–40.

Gross L.J. The Effects of Test-Wiseness on Standardized Test Performance // Scandinavian Journal of Educational Research. 1977. Vol. 21, Issue 1. P. 97–111.

Gullickson A.R., Ellwein M.C. Post hoc analysis of teacher-made tests: The goodness-of-fit between prescription and practice // Educational Measurement: Issues and Practice. 1985. Vol. 4. P. 15–18.

Hogarth R.M. Judgment and choice. N.Y.: Wiley, 1987. 324 p.

Huff D. Score: The strategy of taking tests. N.Y.: Appleton-Century-Crofts, 1961. 119 p.

Impara J.C., Plake B.S. Teachers’ ability to estimate item difficulty: A test of the assumptions in the Angoff standard setting method // Journal of Educational Measurement. 1998. Vol. 35, N 1. P. 69–81.

Jackson D.N. The relative validity of scales prepared by naive item writers and those based on empirical methods of personality scale construction // Educational and Psychological Measurement. 1975. Vol. 35. P. 361–370.

Jafarpur A. Is the test constructor a facet? // Language Testing. 2003. Vol. 20, N 1. P. 57–87.

Jozefowicz R.F., Koeppen B.M., Case S., Galbraith R., Swanson D., Glew R. The quality of in-house medical school examinations // Academic Medicine. 2002. Vol. 77, Issue 2. P. 156–161.

Koens F., Rademakers J.J.D.J.M., Ten Cate O.Th.J. Validation of core medical knowledge by postgraduates and specialists // Medical Education. 2005. Vol. 39. P. 911–917.

Landy F.J., Farr J.L. Performance rating // Psychological Bulletin. 1980. Vol. 87. P. 72–107.

Landy F.J., Farr J.L. Police performance appraisal // JSAS Catalog of Selected Documents in Psychology. 1976. Vol. 6. P. 83–97.

Landy F., Farr J., Saal F., Freytag W. Behaviorally Anchored Scales for Rating the Performance of Police Officers // Journal of Applied Psychology. 1976. Vol. 61, N 6. P. 750–758.

Lange R. Flipping the coin: Test anxiety to test-wiseness // Journal of Reading. 1978. Vol. 22. P. 274–277.

Leonard D. Cognitive complexity and the similarity-attraction paradigm // Journal of Research in Personality. 1976. Vol. 10. P. 83–88.

Mannix E., Neale M. What Differences Make a Difference? // Psychological Science in the Public Interest. 2005. Vol. 6, N 2. P. 31–55.

Marso R.N., Pigge F.L. The status of classroom teachers’ test construction proficiencies: assessment by teachers, principals, and supervisors validated by analyses of actual teacher-made tests (Paper presented at the Annual Meeting of the National Council of Measurement in Education, San Francisco, 1989.) // ERIC Document Reproduction Service No. 306 283. 39 p.

Marso R.N., Pigge F.L. An analysis of teacher-made tests: item types, cognitive demands, and item construction errors // Journal of Contemporary Educational Psychology. 1991. Vol. 16. P. 279–286.

Marso R.N., Pigge F.L. A summary of published research: Classroom teachers' knowledge and skills related to the development and use of teacher-made tests (Paper presented at the annual meeting of the American Educational Research Association, San Francisco, 1992.) // ERIC Document Reproduction Service No. ED 346 148. 29 p.

McPhail I. Coaching, test-wiseness and test scores // NAPW Journal. 1984. Vol. 1, N 2. P. 19–26.

Mehrens W.A., Lehmann I.J. Measurement and evaluation in education and psychology. 4th ed. N.Y.: Wadsworth Publishing, 1991. 592 p.

Metfessel N.S., Sax G. Systematic biases in the keying of correct responses on certain standardized tests // Educational and Psychological Measurement. 1958. Vol. 18. P. 787–790.

Millman J. Criterion-referenced measurement // Popham W.J. (Ed.). Evaluation in education: Current applications. Berkeley, California: McCutchan Publishing Co., 1974. P. 311–397.

Millman J., Bishop C.H., Ebel R. An analysis of test-wiseness // Educational and Psychological Measurement. 1965. Vol. 25. P. 707–726.

Naumenko A.S. Selecting experts for developing multiple-choice tests // The 11th European Congress of Psychology. A Rapidly Changing World – Challenges for Psychology. Oslo, Norway, 7–10 July, 2009. Final Program. P. 133.

Nilsson I., Wedman I. On test-wiseness and some related constructs // Educational Reports, UMEA, 1974. N 7. P. 147–159.

Oakland T. The effects of test-wiseness materials on standardized test performance of preschool disadvantaged children // Journal of School Psychology. 1972. Vol. 10. P. 355–360.

Oescher J., Kirby P.C. Assessing teacher-made tests in secondary math and science classrooms (Paper presented at the Annual Meeting of the National Council on Measurement in Education. Boston, MA, 1990.) // ERIC Document Reproduction Service No. 322 169. 36 p.

Omvig C.P. Effects of guidance on the results of standardized achievement testing // Measurement and Evaluation in Guidance. 1971. Vol. 4. P. 47–52.

Osterlind S.J. Constructing test items: multiple-choice, constructed-response, performance and other formats. Boston, MA: Kluwer Academic Publishers, 1998. 352 p.

Parrish B.W. A test to test test-wiseness // Journal of Reading. 1982. Vol. 25, N 7. P. 672–675.

Pigge F.L., Marso R.N. Supervisors' agenda: identifying and alleviating teachers' test construction errors // Paper presented at the Annual Conference of the Ohio Association for Supervision and Curriculum Development (Columbus, OH, November 3–4, 1988). 50 p.

Poddiakov A.N. Test tvorchestva – "sinyaya ptitsa" psikhologii // Znanie - sila. 2003. N 5. S. 101–104; Electronic version: URL: (date of access: 20.08.2010). [in Russian]

Poddiakov A.N. Psikhodiagnostika intellekta: vyyavlenie i podavlenie sposobnostei, vyyavlenie i podavlenie sposobnykh // Psikhologiya. Zhurnal Vysshei shkoly ekonomiki. 2004. T. 1, N 4. S. 75–80; Electronic version: URL: (date of access: 20.08.2010). [in Russian]

Poddiakov A.N. Testirovanie intellekta, konkurentsiya i refleksiya // Refleksivnye protsessy i upravlenie. 2007. N 2. S. 46–56; Electronic version: URL: (date of access: 20.08.2010). [in Russian]

Preston R. Ability of students to identify correct responses before reading // Journal of Educational Research. 1964. Vol. 58. P. 181–183.

Roid G. A comparison of item-writing methods for criterion-referenced tests // Paper presented at the joint Annual Meetings of the American Educational Research Association and the National Council on Measurement in Education (Boston, MA, April 7–11, 1980). 24 p.

Roid G., Haladyna T. The emergence of an item-writing technology // Review of Educational Research. 1980. Vol. 50, N 2. P. 293–314.

Roznowski M., Bassett J. Training test-wiseness and flawed item types // Applied Measurement in Education. 1992. Vol. 5, N 1. P. 35–48.

Sarnacki R.E. An examination of test-wiseness in the cognitive test domain // Review of Educational Research. 1979. Vol. 49, N 2. P. 252–279.

Schmitt N., Hill T. Sex and race composition of assessment center groups as a determinant of peer and assessor ratings // Journal of Applied Psychology. 1977. Vol. 62, N 3. P. 261–264.

Sharpley C.F., Rogers H.J. Naive versus sophisticated item-writers for the assessment of anxiety // Journal of Clinical Psychology. 1985. Vol. 41, Issue 1. P. 58–62.

Shepard L. Implications for standard setting of the NAE evaluation of NAEP achievement levels // Paper presented at the Joint Conference on Standard Setting for Large Scale Assessments, Washington, DC. 1994. 21 p.

Snyder M., Swann W.B. Hypothesis-testing processes in social interaction // Journal of Personality and Social Psychology. 1978. Vol. 36, N 11. P. 1202–1212.

Stiggins R.J., Bridgeford N.J. The ecology of classroom assessment // Journal of Educational Measurement. 1985. Vol. 22. P. 271–286.

Thorndike R.L. (Ed.). Educational measurement. 2nd ed. Washington, DC: American Council on Education, 1971. 768 p.

Thorndike R.L., Hagen E. Measurement and evaluation in psychology and education. 8th ed. N.Y.: Macmillan Publishing Company, 2009. 528 p.

Valentin J.D., Godfrey J.R. The reliability and validity of tests constructed by Seychellois teachers. A paper presented at the 1996 joint conference organised by Educational Research Association (Singapore) and Australian Association for Research in Education held in Singapore from November 25th to 29th, 1996. 25 p.

Verhoeven B.H., Verwijnen G.M., Muijtjens A.M.M., Scherpbier A.J.J.A., van der Vleuten C.P.M. Panel expertise for an Angoff standard setting procedure in progress testing: item writers compared to recently graduated students // Medical Education. 2002. Vol. 36, N 9. P. 860–867.

Verhoeven B.H., van der Steeg A.F.W., Scherpbier A.J.J.A., Muijtjens A.M.M., Verwijnen G.M., van der Vleuten C.P.M. Reliability and credibility of an Angoff standard setting procedure in progress testing using recent graduates as judges // Medical Education. 1999. Vol. 33. P. 832–837.

Williams J.M. Writing quality teacher-made tests: a handbook for teachers // ERIC Document Reproduction Service No. 349 726. 1991. 48 p.

Wise S.L., Lukin L.E., Roos L.L. Teacher beliefs about training in testing and measurement // Journal of Teacher Education. 1991. Vol. 42. P. 37–42.

Received 21 June 2010. Date of publication: 26 August 2010.

About authors

Naumenko Anna S. Ph.D., Associate Professor, Department of Psychology, South Ural State University, prospekt Lenina, 76, 454080 Chelyabinsk, Russia.

Orel Ekaterina A. Ph.D., Senior Lecturer, Department of Organizational Psychology, Faculty of Psychology, State University – Higher School of Economics, Volgogradsky prospect, 46b, 109316 Moscow, Russia.

Suggested citation

APA Style
Naumenko, A. S., & Orel, E. A. (2010). Who are the judges? Individual traits of test developers and test items characteristics. Psikhologicheskie Issledovaniya, 4(12). Retrieved from 0421000116/0032. [in Russian, abstr. in English].

Russian State Standard GOST P 7.0.5-2008
Naumenko A.S., Orel E.A. Who are the judges? Individual traits of test developers and test items characteristics [Electronic resource] // Psikhologicheskie Issledovaniya. 2010. N 4(12). URL: (date of access: 0421000116/0032. [in Russian, abstr. in English]
