Interrater Reliability of a Tool Developed to Facilitate and Assess Clinical Reasoning in Musculoskeletal Cases
Purpose: Currently, no universally accepted standards exist for the assessment of clinical reasoning skills in PT education. The Clinical Reasoning Appraisal For Thinking Effectively (CRAFTe) tool was developed to facilitate and assess clinical reasoning in musculoskeletal cases in an entry-level, problem-based curriculum. To date, no psychometric testing has been done on the CRAFTe. The purpose of this pilot study was to assess the interrater reliability of the CRAFTe tool as an assessment of clinical reasoning skills in entry-level DPT students during the didactic portion of their musculoskeletal content.
Methods/Description: Interrater reliability of the CRAFTe tool was investigated using quantitative methods in a survey design. For the purposes of this study, two CRAFTe worksheets were selected from historical OSCE records by the faculty sponsor of this study. The selection was purposeful, to include a range of performance on the CRAFTe worksheets based on the students’ grades. The two worksheets were de-identified and copied into the Qualtrics Survey Analysis Software. The survey participants were given detailed case information on the standardized patient the students had evaluated before completing the CRAFTe. Each participant then answered 11 questions for each CRAFTe worksheet. The 11 questions replicate the rubric for the CRAFTe. Each question was written in the form of a statement with an accompanying Likert scale, which was then mapped to a numerical scale (scored 0-5, representing strongly disagree to strongly agree, respectively) to quantify the agreement between the statement and the grader's interpretation of the students' reasoning. All surveys collected basic demographic information (age, sex, years as a licensed PT, specialty areas, etc.). The 15 graders were given 2 months to complete the survey before it was closed. Descriptive statistics including mean, median, mode, and standard deviation were captured, and interrater reliability was calculated using a two-way random, consistency, multiple raters/measurements intraclass correlation coefficient [ICC (2,k)].
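As an illustration of the reliability statistic named above, the following is a minimal sketch of a two-way, consistency-type ICC computed from ANOVA mean squares. It is not the study's analysis code; the function name `icc_consistency` and the example ratings are hypothetical, and the layout (one row per rated item, one column per rater) is an assumption.

```python
def icc_consistency(ratings):
    """Two-way consistency ICC from a complete ratings table.

    ratings: list of rows, one row per rated item, one column per rater
    (layout is an assumption for this sketch).
    Returns (single_measure_icc, average_measure_icc).
    """
    n = len(ratings)          # number of rated items
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into items, raters, and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Consistency definitions: rater mean differences are not penalized.
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


# Hypothetical 0-5 scores: 4 rubric items rated by 3 graders.
example = [[4, 5, 3], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
single_icc, average_icc = icc_consistency(example)
print(single_icc, average_icc)
```

The single-measure value estimates the reliability of one grader's score, while the average-measure value estimates the reliability of the mean score across all graders, which is why the second is always at least as large as the first when the single-measure value is positive.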
Results/Outcomes: Statistical analysis was performed on the eight of 15 subjects who completed the survey before it was closed. Seven subjects either declined to participate or did not complete the entire survey and were excluded from analysis. On reviewing the demographic data and comparing aggregate scores between raters, we noticed that 2 of the raters showed high levels of discrepancy or variance. Demographic data indicated that one of these raters worked in pediatrics and the other in an acute care hospital setting. Initially, the ICC (2,k) was calculated using all 8 raters regardless of specialty or work setting. This resulted in a single-measure ICC value of .429, which indicates poor reliability as a single measure. When the average measure is used, the ICC value increases to .857, which indicates good reliability, although with a wide 95% confidence interval of .225, 1.00. Interrater reliability using Krippendorff's alpha for all 8 raters indicated poor reliability, with an alpha of .28.
The same statistical analysis was run again excluding subjects without adult orthopedic experience, leaving 6 raters. The 6 raters with adult orthopedic experience produced a single-measure ICC value of .774 and an average-measure ICC value of .954 with a 95% confidence interval of .698, 1.00, which shows excellent interrater reliability. Using the same exclusion criteria, we ran Krippendorff's alpha with the 6 raters and found moderate reliability, with alpha equal to .65.
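The single-measure and average-measure ICC values reported here are linked by the Spearman-Brown relationship, which steps a single-rater reliability up to the reliability of the mean of k raters. The sketch below is a consistency check on the reported figures, assuming they were rounded to three decimals; the function name `spearman_brown` is illustrative and not part of the study.

```python
def spearman_brown(single_icc, k):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * single_icc / (1 + (k - 1) * single_icc)


# All 8 raters: single-measure ICC of .429
print(round(spearman_brown(0.429, 8), 3))  # 0.857

# 6 raters with adult orthopedic experience: single-measure ICC of .774
print(round(spearman_brown(0.774, 6), 3))  # 0.954
```

Both results match the reported average-measure values, which illustrates why the average-measure ICC can look strong even when agreement between any two individual graders is modest.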
Conclusions/Relevance to the conference theme: These preliminary results provide initial support for the use of the CRAFTe as a reliable assessment tool when graded by faculty with prior adult orthopedic experience. An important outcome of this study is the discrepancy in grading by raters whose specialties lie outside the orthopedic setting: reliability was poor to good when all raters were included and improved to excellent on the average-measure ICC once raters without adult orthopedic experience were excluded. Further study of the reliability and validity of the CRAFTe tool is warranted.
Moving forward, there is a significant need to develop a body of literature exploring the validity of tools intended to facilitate and assess clinical reasoning. As we set sail on this scholarly exploration, it is important to recognize that no single tool can encompass all types of clinical reasoning. Further tool development should consider the specialty or setting as well as the clinical reasoning strategies (diagnostic, procedural, ethical, etc.) that the tool is intended to capture. An important limitation of the CRAFTe is that it is specific to facilitating and assessing clinical reasoning in cases presenting with musculoskeletal pain problems; it is not suitable for other types of cases. Additional reliability and validity testing in a full-spectrum study including both PT students and clinicians is warranted.