ORIGINAL ARTICLE
Year : 2015  |  Volume : 3  |  Issue : 3  |  Page : 220-225

Angoff's method: The impact of raters' selection


Assad A Rezigalla
Department of Anatomy, College of Medicine, International University of Africa, Khartoum, Sudan

Date of Web Publication: 3-Aug-2015

Correspondence Address:
Assad A Rezigalla
Department of Anatomy, College of Medicine, International University of Africa, Khartoum
Sudan

DOI: 10.4103/1658-631X.162027

  Abstract 

Background: Several methods have been proposed for setting an examination pass mark (PM), and Angoff's method, or a modified version of it, is the one most often preferred. The selection of raters is important and affects the PM.
Aims and Objectives: This study investigates the selection of raters in Angoff's method and the impact of their academic degrees and experience on the resulting PM.
Materials and Methods: A type A multiple-choice question (MCQ) examination was used as a model. Raters with different academic degrees and levels of experience participated in the study, and their estimations were statistically analyzed.
Results: The selection of raters was crucial. Agreement among raters could be achieved by those with relevant qualifications and expertise. There was an association between high estimations, academic degree, experience and a high PM.
Conclusion: The selection of raters for Angoff's method should include raters with different academic degrees, backgrounds and experience so that a satisfactory PM may be reached through reasonable agreement.

  Abstract in Arabic (translated) 

The study aimed to evaluate the selection of raters and the effect of their academic qualifications and experience on setting the pass mark. An examination composed of type A multiple-choice questions was used. Raters with different academic degrees and teaching experience participated in the study, and the resulting estimations were statistically analyzed. The selection of raters had a pivotal effect, and agreement was found among raters of similar academic degrees and experience. In conclusion, the selection of raters should include different degrees, experiences and backgrounds so that a pass mark can be reached by consensus among the raters.

Keywords: Academic degree, Angoff's method, experience, raters' selection, setting pass mark


How to cite this article:
Rezigalla AA. Angoff's method: The impact of raters' selection. Saudi J Med Med Sci 2015;3:220-5

How to cite this URL:
Rezigalla AA. Angoff's method: The impact of raters' selection. Saudi J Med Med Sci [serial online] 2015 [cited 2019 Dec 16];3:220-5. Available from: http://www.sjmms.net/text.asp?2015/3/3/220/162027


  Introduction


The pass mark (PM) in educational testing is the standard criterion that determines whether a student passes or fails an examination, i.e., whether the student is considered sufficiently competent. Accordingly, the PM and the procedures used to set it depend on a number of legal, professional, theoretical, and psychometric issues. [1],[2],[3],[4],[5],[6]

Several methods have been proposed for setting a PM. However, Angoff's method is the most widely preferred and the most often used, [2],[5],[7],[8],[9] and it is the most popular method for multiple-choice questions. [6] It can be used for both medium- and high-stakes examinations and is also appropriate for an OSCE [10],[11],[12] or even computer-based testing. [13]

Angoff's method developed from extensive research on what was originally a footnote to a chapter of a book written by Angoff. [7] This criterion-referenced method suggests how judgments about minimally competent students can be used to set a cut-off score.

Angoff's method involves asking judges to estimate the probability that a minimally competent student will answer each item on a test correctly. [14],[15],[16],[17] The application of Angoff's method depends on the definition of the minimally competent student and the raters' (judges') judgments about the questions.
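In formula terms (a standard presentation of the method rather than anything quoted from this paper; the symbols are chosen here for illustration), if rater $r$ of $R$ raters assigns item $i$ of $n$ items a probability $p_{ri}$ of being answered correctly by the minimally competent student, the cut-off score is

$$\text{cut-off} = \frac{1}{R} \sum_{r=1}^{R} \sum_{i=1}^{n} p_{ri},$$

i.e., the average rater's expected total score for the minimally competent student; dividing by the maximum score expresses it as a percentage PM.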

This definition of a minimally competent student forms the basis of any judgment about setting a PM. A common procedure is to let the raters agree on a definition as a group and have them all use this definition when making their judgments. [18] In other cases, a preexisting definition is given to the raters, or the raters are asked to define the minimally competent student independently. The latter approach eventually results in a wide divergence of the judges' estimates. [5],[18]

Another strength of Angoff's method lies in the raters. The raters should be familiar with Angoff's method, the students, the curriculum [9] and the course being assessed. [19],[20] Many modifications have been made to the original method with regard to the raters. [8],[21],[22] A common modification is to allow the raters to discuss their estimates with each other, [23],[24] although this has drawbacks [15],[16] such as the dominance of one rater over the committee. Another modification is iteration of the estimations. [23] These modifications have been applied to increase the reliability of the ratings (judgments) by increasing intra- and inter-rater consistency [23],[24] and by reducing variability among raters and in the cut-off score. [23] Increasing the reliability of the judgments and reducing variability reduces the degree of error in the resulting PM. [16]

Few studies have considered the selection of raters, [4],[25] their training [26],[27] and their interaction. [23],[28] This study investigates the effect of the selection of raters in Angoff's method on the resulting PM.


  Materials and methods


This study was conducted in the Department of Anatomy, College of Medicine, King Khalid University. The College of Medicine follows a traditional curriculum, teaching medicine over 12 semesters. According to the university regulations, the examination PM is 60. The examination used in this study was a final anatomy examination given to semester-four students (March 2014). Standard procedures were followed during the preparation and administration of the examination, which consisted of 55 type A MCQs to be answered within 2 h.

This study examined the impact of the raters' academic degrees and experience on the PM arrived at.

The staff members in this study were chosen according to standard criteria. All were familiar with both the students and the curriculum. The staff members (raters) were categorized as associate professors (ST3), assistant professors (ST2) and lecturers (ST1). There were 15 raters in total, five in each category, with teaching experience of 15 ± 2.0, 12 ± 2.1 and 26 ± 3.5 years for ST1, ST2 and ST3, respectively. These staff members underwent a short training course on Angoff's method and the setting of PMs. A committee was formed from each category of raters, each with the task of setting a PM, and a final committee formed of all raters had to set a final PM.

The raters were instructed that the estimated probability of the minimally competent student answering a question correctly could be neither 100% nor below 25%. [29] Differences in the raters' estimations were accepted when they were within 30 percentage points, or when the standard deviation of the estimations for a question was at or below 10 units.
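As an illustration of this acceptance rule, the following minimal sketch (in Python; the paper itself used no code, and the function and parameter names are illustrative) flags items whose estimates diverge beyond both thresholds:

```python
import numpy as np

def flag_divergent_items(estimates, max_range=30.0, max_sd=10.0):
    """Return indices of items whose rater estimates diverge too much.

    `estimates` is an (n_raters, n_items) array of per-item probability
    estimates in percent. An item is accepted when the spread of its
    estimates is within `max_range` percentage points or their standard
    deviation is at or below `max_sd`; items failing both checks are
    flagged for further discussion.
    """
    estimates = np.asarray(estimates, dtype=float)
    spread = estimates.max(axis=0) - estimates.min(axis=0)
    sd = estimates.std(axis=0, ddof=1)
    accepted = (spread <= max_range) | (sd <= max_sd)
    return np.where(~accepted)[0]
```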

The raters were given the examination and asked to estimate individually how a minimally competent student would perform on each question. The raters then met to discuss the estimations and reach a consensus. The PM was calculated from the mean of the estimations.
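Computationally, this step reduces to averaging the estimation matrix. The sketch below assumes the estimates are recorded in percent; the function name and the example figures are illustrative, not data from the study:

```python
import numpy as np

def angoff_pass_mark(estimates):
    """Angoff pass mark: the mean of all raters' per-item estimates.

    `estimates` is an (n_raters, n_items) array of estimated
    probabilities, in percent, that a minimally competent student
    answers each item correctly.
    """
    return float(np.mean(estimates))

# Three hypothetical raters, four items, estimates in percent:
pm = angoff_pass_mark([[60, 55, 70, 50],
                       [65, 50, 75, 45],
                       [55, 60, 70, 50]])
print(pm)  # 58.75
```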

The raters' estimations were then analyzed. The degree of agreement among raters, i.e., the inter-rater agreement, was calculated using the kappa statistic.
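Kappa is defined for categorical ratings, and the paper does not state how the continuous percentage estimates were categorized before kappa was computed; the sketch below therefore bins the estimates into 10-point bands (an assumption of this sketch) and uses scikit-learn's implementation of Cohen's kappa:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def committee_kappa(est_a, est_b, bin_width=10):
    """Cohen's kappa between two committees' per-item estimates.

    `est_a` and `est_b` are per-item estimates in percent. They are
    binned into `bin_width`-point bands before kappa is computed,
    since kappa requires categorical ratings; this binning is an
    assumption of the sketch, not the paper's stated procedure.
    """
    bins_a = (np.asarray(est_a) // bin_width).astype(int)
    bins_b = (np.asarray(est_b) // bin_width).astype(int)
    return cohen_kappa_score(bins_a, bins_b)
```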

To calculate the percentage of high estimations (HEs) for each category of raters, any two equal HEs were omitted, leaving 48 of the 55 questions. The percentage of HEs for each category was then calculated.
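One plausible reading of this procedure, sketched below with illustrative names, is that for each question the category giving the single highest mean estimate is credited, and questions whose top two estimates tie are omitted; how the resulting counts were converted to percentages is not fully specified in the paper, so the denominator used here is an assumption:

```python
import numpy as np

def high_estimation_shares(committee_means, n_items_total=55):
    """Percentage of items on which each committee gave the highest
    estimate, omitting items whose top two estimates are equal."""
    m = np.asarray(committee_means, dtype=float)
    counts = np.zeros(m.shape[0], dtype=int)
    for item in m.T:                      # loop over items
        top_two = np.sort(item)[-2:]
        if top_two[0] == top_two[1]:      # tied high estimates: omit
            continue
        counts[np.argmax(item)] += 1
    return 100.0 * counts / n_items_total
```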

The PM of each raters' committee and of the combined committee of all raters (AR) was calculated from the means of the raters' estimations [Figure 1]. The final PM of the committee of all raters was found to be 58.9 out of 100.
Figure 1: Pass marks and percentages of high estimations of the raters, and the success rates of the students. PM – pass mark; FS – fixed standard; SR – success rate; HE – high estimations



The credibility of the PM was determined by comparing the PMs obtained by Angoff's method with the fixed pass mark (FPM) and the norm-referenced (NR) PM. [10],[30],[31]
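The paper does not specify which norm-referenced rule was applied; a common choice in the standard-setting literature is to place the NR pass mark at the cohort mean minus some multiple of the standard deviation of the observed scores. The sketch below implements that generic rule purely for illustration:

```python
import numpy as np

def norm_referenced_pm(scores, k=1.0):
    """Generic norm-referenced pass mark: cohort mean minus k standard
    deviations of the observed scores. Both the rule and the choice of
    k are assumptions made for illustration; the paper does not state
    its NR formula.
    """
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean() - k * scores.std(ddof=1))
```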

The raters' estimations were analyzed statistically, and the results are presented as mean ± standard deviation. Differences, correlations and inter-rater agreement were evaluated using SPSS for Windows, version 15.0 (IBM Corp., Armonk, NY, USA).
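The study used SPSS; the same tests can be reproduced with SciPy, as in the minimal sketch below, in which the committee estimates are simulated because the raw data are not published with the paper:

```python
import numpy as np
from scipy import stats

# Simulated per-item estimates (percent) for the three committees;
# illustrative stand-ins for the study's unpublished raw data.
rng = np.random.default_rng(0)
est_st1, est_st2, est_st3 = rng.uniform(40, 80, (3, 55))

# One-way ANOVA across the committees' estimations (cf. Table 3).
f_stat, p_anova = stats.f_oneway(est_st1, est_st2, est_st3)

# Pearson correlation between two committees (cf. Table 2).
r_31, p_31 = stats.pearsonr(est_st3, est_st1)

# Paired-sample t-test of the combined-committee estimates against
# a fixed standard of 60 applied to every item.
t_stat, p_paired = stats.ttest_rel((est_st1 + est_st2 + est_st3) / 3,
                                   np.full(55, 60.0))
```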

The impact of the raters' academic degrees and experience on the resulting PM was then assessed.


  Results


The number of estimations recorded from all categories of raters (15 raters) for the 55 questions was 825. The percentage of HEs was calculated for each category. The ST3 committee had the highest percentage of HEs (45.5%) and the ST1 committee the lowest (16.4%). The percentage of HEs increased in association with both academic degree and experience [Figure 1].

The kappa statistic was used to determine agreement between raters. The highest percentage of agreement was recorded between the ST2 and ST3 committees, followed by ST1 and ST3; the lowest was between ST1 and ST2 [Table 1].
Table 1: Inter-rater agreement (kappa statistic; 95% CI)*



There was a strong correlation between the ST3 and ST1 committees (0.771) and, to a lesser extent, between ST2 and ST1 (0.529) and between ST2 and ST3 (0.473) [Table 2].
Table 2: Correlation between raters' estimations



All PMs were calculated out of 100. The AR PM was 58.9, and the PMs of ST3, ST2 and ST1 were 61.8, 58.4 and 58.1, respectively. A one-way ANOVA showed no significant difference between the PMs [Table 3]. The paired committees of ST3 and ST2, ST3 and ST1, and ST2 and ST1, and the AR committee, produced PMs of 60.1, 60, 58.3 and 58.9, respectively.
Table 3: One-way ANOVA of raters' estimations



A paired-sample test showed a significant difference between the FPM, the NR PM and the final committee (AR) PM (P < 0.05).


  Discussion


Many modifications have been made and applied to the original Angoff's method. Most of these were directed towards reaching better agreement among raters; only a few were directed at the raters themselves, their selection and their interaction. This study investigated raters' selection and its effect on the resulting PM.

The raters in this study had varying academic degrees and experience and were at a higher level than the students. Being involved in teaching, they were considered qualified. Norcini [32] emphasizes the importance of a mixed committee more than its size: a committee should include different professional roles and a balance of personal attributes, including gender, race and age. [30],[32] The committees in the present study differed in academic degrees, backgrounds and age, and this mix of members had no conflict of interest. According to Verhoeven et al., [33] differences in backgrounds and expertise can offset the influence of a small number of raters on a committee. In the literature, the number of raters on committees has varied, [34] from a few (5-10, [20] 10-15 [35]) to many (5-30 [36]) or as many as possible, [37] and the root mean squared error has even been used to determine the number of raters. [38]

Verheggen et al. [39] reported that Angoff estimates were significantly affected by the raters' ability to answer the questions correctly or to give the model answers. These findings stress the importance of careful selection of raters in Angoff's method: the judges should be selected from those who are not only capable of conceptualizing the "minimally competent student" but are also capable of answering all the items correctly, and who have expertise in the domain assessed by the test. [40]

The use of recently graduated staff members as raters is justifiable in Angoff's method. [34] In the present work, the use of lecturers to estimate the PM ended with the same result. The ST1 group were the most recent consumers of the curriculum with regard to examinations, whether general or specific. Since students' learning is driven by examinations, [41],[42] examinations constitute the real curriculum. [34] This group, therefore, has effective knowledge and can target the borderline student more accurately. The limited teaching experience of ST1 did not affect their estimations, as Angoff's method does not target the delivery of knowledge.

Angoff's method targets the minimally competent student as the cut-off score by reaching agreement between raters. In the present study, the committees of ST3 and ST2 had a high percentage of agreement but low correlation. Although both the ST2 and ST1 and the ST3 and ST2 committees had low percentages of agreement, that of ST3 and ST1 was higher. The differences in academic degrees and experience between ST3 and ST1 affected the agreement within the committee in spite of the correlation between their estimations. In the present study, there were associations between high academic degrees, experience, HEs and the PM on the one hand and inter-rater agreement on the other. These findings suggest that acceptable degrees of agreement can be reached by selecting a committee of raters with relevant academic degrees and experience. They also support the work of Verheggen et al., [39] who indicated that rating in Angoff's method depends on the quality of the panel members.

A comparison of the percentages of agreement among the committees shows that, although there were large differences between ST3 and ST1 in experience and academic degree, and significant differences in their estimations, there was a stronger correlation and better agreement than between ST2 and ST1. ST2 was related to both ST3 and ST1 but was more in agreement with ST3. The ST2 and ST1 committees were closer to the students, but their correlation was moderate and their agreement low. Thus, inter-rater agreement appeared to be affected more by experience than by closeness to the students.

High estimations by raters did not affect the agreement within committees. The percentage of HEs increased with both academic degree and experience. Schoon et al. [43] noted unrealistically high PMs among expert judges, even though Angoff's method is generally associated with low PMs. [44] Such high estimations consequently affect the resulting PM in the case of a committee with a single category of highly qualified, expert raters.

Angoff's method and its modifications concentrate on reaching a high degree of agreement between raters without regard to whether the resulting PM is low or high. Although the method has been linked to high pass rates and low PMs, [44] the PM itself is not the concern of Angoff's method, which focuses mainly on the minimally competent student and the exam. The PM of the final committee of all raters (ST3, ST2 and ST1) was 58.9 out of 100. There was no significant difference between the PM of the final committee and those of the different categories of raters' committees. The PM developed by Angoff's method, the FPM and the NR PM were not significantly different in the present work. The ST3 committee gave the highest PM and the ST1 committee the lowest of all the PMs. The present result is in accordance with previous work by Norcini and Shea, [20] who indicated that different groups of experts set the same standard for the same test material, and with the finding that a committee of expert raters sets an unrealistically high PM. [43] The ST3 and ST1 committees produced a medium PM. The PM correlated positively with both the academic degree and the experience of the raters.


  Conclusion


The present study showed that agreement can be reached by selecting raters with the same or similar qualifications and experience. Moreover, the percentage of HEs and the PM increased with increasing academic degree: a committee of raters with high academic degrees and long experience produced high PMs, and vice versa. Thus, the mode of committee selection can alter the resulting PM.

The selection of raters for Angoff's method should therefore include raters with different academic degrees and experience so that agreement can be reached. This method of selection will produce a reasonable PM by means of a satisfactory agreement.


  Acknowledgments


The author acknowledges the efforts of the raters who participated in the study. Great appreciation goes to Dr. S. Bashir, Dr. O. Elfaki, Prof. J. Haidera and Prof. M. Habieb for their comments, and great thanks to Mr. Abid MK for the statistical analysis and his helpful comments. The comments of Dr. El. Mekki A are highly appreciated, as is the support of Prof. M. Atiff. The College Dean and the Administration of the College of Medicine (KKU, KSA) are thanked for their help and for allowing the use of facilities.

 
  References

1. Biddle RE. How to set cutoff scores for knowledge tests used in promotion, training, certification, and licensing. Public Pers Manag 1993;22:63-80.
2. Cascio WF, Alexander RA, Barrett GV. Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Pers Psychol 1988;41:1-24.
3. Cizek GJ. Reconsidering standards and criteria. J Educ Meas 1993;30:93-106.
4. Kane M. Validating the performance standards associated with passing scores. Rev Educ Res 1994;64:425-61.
5. Maurer TJ, Alexander RA. Methods of improving employment test critical scores derived by judging test content: A review and critique. Pers Psychol 1992;45:727-62.
6. Ahn DS, Ahn S. Reconsidering the cut score of Korean National Medical Licensing Examination. J Educ Eval Health Prof 2007;4:1.
7. Angoff W. Scales, norms, and equivalent scores. Educational Measurement: Theories and Applications. Vol. 2. Princeton, NJ: Educational Testing Service; 1996. p. 121.
8. Berk RA. A consumer's guide to setting performance standards on criterion-referenced tests. Rev Educ Res 1986;56:137-72.
9. Impara JC, Plake BS. Teachers' ability to estimate item difficulty: A test of the assumptions in the Angoff standard setting method. J Educ Meas 1998;35:69-81.
10. Kaufman DM, Mann KV, Muijtjens AM, van der Vleuten CP. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med 2000;75:267-71.
11. Boursicot KA, Roberts TE, Pell G. Using borderline methods to compare passing standards for OSCEs at graduation across three medical schools. Med Educ 2007;41:1024-31.
12. Senthong V, Chindaprasirt J, Sawanyawisuth K, Aekphachaisawat N, Chaowattanapanit S, Limpawattana P, et al. Group versus modified individual standard-setting on multiple-choice questions with the Angoff method for fourth-year medical students in the internal medicine clerkship. Adv Med Educ Pract 2013;4:195-200.
13. Siriwardena AN, Dixon H, Blow C, Irish B, Milne P. Performance and views of examiners in the Applied Knowledge Test for the nMRCGP licensing examination. Br J Gen Pract 2009;59:e38-43.
14. Reilly RR, Zink DL, Israelski EW. Comparison of direct and indirect methods for setting minimum passing scores. Appl Psychol Meas 1984;8:421-9.
15. Hurtz GM, Auerbach MA. A meta-analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educ Psychol Meas 2003;63:584-601.
16. Ricker KL. Setting cut-scores: A critical review of the Angoff and modified Angoff methods. Alberta J Educ Res 2006;52:53-64.
17. Cizek GJ, Bunch MB. Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Thousand Oaks, CA: SAGE Publications; 2007.
18. Fehrmann ML, Woehr DJ, Arthur W. The Angoff cutoff score method: The impact of frame-of-reference rater training. Educ Psychol Meas 1991;51:857-72.
19. Norcini JJ. Research on standards for professional licensure and certification examinations. Eval Health Prof 1994;17:160-77.
20. Norcini J, Shea J. The reproducibility of standards over groups and occasions. Appl Meas Educ 1992;5:63-72.
21. Hambleton RK. Setting performance standards on educational assessments and criteria for evaluating the process. In: Setting Performance Standards: Concepts, Methods, and Perspectives. Mahwah, NJ: Lawrence Erlbaum; 2001. p. 89-116.
22. Yudkowsky R, Downing SM, Popescu M. Setting standards for performance tests: A pilot study of a three-level Angoff method. Acad Med 2008;83:S13-6.
23. Busch JC, Jaeger RM. Influence of type of judge, normative information, and discussion on standards recommended for the National Teacher Examinations. J Educ Meas 1990;27:145-63.
24. Hambleton RK, Plake BS. Using an extended Angoff procedure to set standards on complex performance assessments. Appl Meas Educ 1995;8:41-55.
25. Truxillo DM, Donahue LM, Sulzer JL. Setting cutoff scores for personnel selection tests: Issues, illustrations, and recommendations. Hum Perform 1996;9:275-95.
26. Plake BS, Impara JC, Irwin PM. Consistency of Angoff-based predictions of item performance: Evidence of technical quality of results from the Angoff standard setting method. J Educ Meas 2000;37:347-55.
27. Plake BS, Impara JC. Ability of panelists to estimate item performance for a target group of candidates: An issue in judgmental standard setting. Educ Assess 2001;7:87-97.
28. Chang L. Judgmental item analysis of the Nedelsky and Angoff standard-setting methods. Appl Meas Educ 1999;12:151-65.
29. Wheaton A, Parry J, editors. Using the Angoff Method to Set Cut Scores. Users Conference; 2012. Available from: https://www.questionmark.com/us/seminars/Documents/webinar_anoff_handout_may_2012.pdf. [Last accessed 2012 Apr 23].
30. George S, Haque MS, Oyebode F. Standard setting: Comparison of two methods. BMC Med Educ 2006;6:46.
31. Bhandary S. Standard setting in health professions education. Kathmandu Univ Med J 2012;9:3-4.
32. Norcini JJ. Setting standards on educational tests. Med Educ 2003;37:464-9.
33. Verhoeven BH, Verwijnen GM, Muijtjens AM, Scherpbier AJ, van der Vleuten CP. Panel expertise for an Angoff standard setting procedure in progress testing: Item writers compared to recently graduated students. Med Educ 2002;36:860-7.
34. Verhoeven BH, van der Steeg AF, Scherpbier AJ, Muijtjens AM, Verwijnen GM, van der Vleuten CP. Reliability and credibility of an Angoff standard setting procedure in progress testing using recent graduates as judges. Med Educ 1999;33:832-7.
35. Hurtz GM, Hertz NR. How many raters should be used for establishing cutoff scores with the Angoff method? A generalizability theory study. Educ Psychol Meas 1999;59:885-97.
36. Zieky M, Livingston SA. Manual for Setting Standards on the Basic Skills Assessment Tests. Princeton, NJ: Educational Testing Service; 1977. p. 235.
37. Cizek GJ. An NCME instructional module on: Setting passing scores. Educ Meas Issues Pract 1996;15:20-31.
38. Fowell SL, Fewtrell R, McLaughlin PJ. Estimating the minimum number of judges required for test-centred standard setting on written assessments: Do discussion and iteration have an influence? Adv Health Sci Educ Theory Pract 2008;13:11-24.
39. Verheggen MM, Muijtjens AM, Van Os J, Schuwirth LW. Is an Angoff standard an indication of minimal competence of examinees or of judges? Adv Health Sci Educ Theory Pract 2008;13:203-11.
40. Jaeger RM. Selection of judges for standard setting. Educ Meas Issues Pract 1991;10:3-14.
41. Newble DI, Jaeger K. The effect of assessments and examinations on the learning of medical students. Med Educ 1983;17:165-71.
42. Frederiksen N. The real test bias: Influences of testing on teaching and learning. Am Psychol 1984;39:193.
43. Schoon CG, Gullion CM, Ferrara P. Bayesian statistics, credentialing examinations, and the determination of passing points. Eval Health Prof 1979;2:181-201.
44. Wayne DB, Fudala MJ, Butter J, Siddall VJ, Feinglass J, Wade LD, et al. Comparison of two standard-setting methods for advanced cardiac life support training. Acad Med 2005;80:S63-6.





 
