Inter- and Intra-Rater Reliability of a Comprehensive Practical Evaluation Tool
Purpose:
Competency-based education includes assessing whether students' skills and knowledge are at a level that will allow them to work as competent professionals. Per CAPTE Required Element 4N, "The collective core faculty are responsible for assuring that students are safe and ready to progress to clinical education." One mode of assessing this is a comprehensive practical examination, which raises the question of whether core faculty grade such examinations consistently. Therefore, the purpose of this study was to determine the inter- and intra-rater reliability of faculty grading a comprehensive practical examination using a visual analog scale with anchors and descriptors.
Methods/Description:
Six core Doctor of Physical Therapy faculty members, four senior and two junior, from the same academic institution participated in the study. Each faculty member viewed the same video of a student completing a comprehensive practical examination and independently scored the examination using a rubric established by the department. The student was scored on a continuous scale of 0–10, with 0 being "beginning performance" and 10 being "beyond entry level," for each of the six evaluative items (professionalism, safety, subjective examination, evaluation, intervention, and time management). Each anchor had descriptors/required elements. A minimum of four weeks later, the same faculty members viewed the same video and conducted another evaluation using the same tool. Using SAS, Cronbach's alpha was calculated for inter-rater reliability and a Pearson correlation was calculated for intra-rater reliability.
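The analyses in the study were run in SAS. As an illustration only, the sketch below shows how the same two statistics could be computed in Python with NumPy and SciPy; the scores and the data layout (the six evaluative items as rows, the six raters as columns) are hypothetical placeholders, not the study's data or code.

    import numpy as np
    from scipy import stats

    # Hypothetical scores: rows = the six evaluative items, columns = the six raters
    session1 = np.array([
        [8.2, 8.1, 8.3, 8.2, 8.0, 8.1],   # professionalism
        [9.0, 9.1, 9.0, 8.9, 9.0, 9.1],   # safety
        [7.5, 7.4, 7.6, 7.5, 7.5, 7.4],   # subjective examination
        [7.8, 7.9, 7.8, 7.7, 7.8, 7.9],   # evaluation
        [8.0, 8.1, 8.0, 8.0, 7.9, 8.0],   # intervention
        [8.5, 8.4, 8.5, 8.6, 8.5, 8.4],   # time management
    ])
    # Second viewing at least four weeks later (hypothetical, slightly perturbed)
    session2 = session1 + np.random.default_rng(0).normal(0, 0.05, session1.shape)

    def cronbach_alpha(scores):
        """Cronbach's alpha treating each rater (column) as an 'item'."""
        k = scores.shape[1]                          # number of raters
        item_vars = scores.var(axis=0, ddof=1)       # variance of each rater's scores
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Inter-rater reliability within one scoring session
    alpha = cronbach_alpha(session1)

    # Intra-rater reliability: Pearson r between sessions, pooled over items and raters
    r, p = stats.pearsonr(session1.ravel(), session2.ravel())

    print(f"Cronbach's alpha = {alpha:.4f}")
    print(f"Pearson r = {r:.4f} (p = {p:.4g})")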
Results/Outcomes:
The Cronbach's alpha score was .9999, representing an excellent level of internal consistency amongst raters. The Pearson correlation coefficient was .9989 (p < .0001), representing strong intra-rater consistency.
Conclusions/Relevance to the conference theme:
The reliability of assessment tools used for practical examinations should be formally measured. This practical evaluative instrument, which utilized anchors and descriptors, was found to have high levels of inter- and intra-rater reliability amongst all core faculty at this Midwestern Doctor of Physical Therapy program. This tool, with its descriptors, should continue to be utilized for comprehensive practical examinations to help core faculty assess and determine whether students are safe and ready to enter the clinical environment.