Friday, August 10, 2012

Clinical assessment variability - what is really causing it?

There was a recent article in Academic Medicine by Dr. Alexander and colleagues from Brigham and Women's Hospital describing the variability in clerkship grading among US medical schools.  They found that, unsurprisingly, the grading systems for the clinical years had really no consistency at all.  The grading systems themselves differed (traditional ABCDF, honors/pass/fail, or pass/fail; table 1), and even among schools that used a similar scale, the percentage of students receiving the highest grade was all over the place (table 2).  So, the question is what do we do with this information?  I think no one really expected different findings, but now the answer is out there, in print (or on digital reader screens).

I think part of the answer to where we go from here is to decide whether this article was really asking the right question.  The authors do start to talk about this in the discussion section, but I'll try to lay out my thoughts with a slightly different spin than they gave theirs.  I think the real question is what are we using the assessment of clerkship performance for?  What is the essence of what we are trying to measure?  Only when there is broad consensus, not only between schools but within the individual courses of each school, will there be any semblance of uniformity in the grading of students.  I see at least two competing interests which influence how a clerkship director decides on a grading system.  The first is the idea that students should be measured on how competent they are in the area the clerkship is grading.  In other words, when they are on call as a first-year resident or as a 50-year-old physician, do they have the knowledge and skills to assess a patient with a given problem?  Second, the clerkship director also wants to be sure that the students at their school have a fair chance to compete for selective residency programs.  Thus, there also needs to be a system to distinguish high-achieving from low-achieving students.  The first system is more about the individual student, and with this system, by definition, everyone should be able to achieve the highest score with enough effort and work.  The second system is more about evaluation of the program and the group.  In this system, it cannot be possible for everyone to achieve the highest score.  However, either system can be manipulated to aid students or to make it more hazardous for them.  There are benefits and risks of each system - as with anything in medicine.

I don't think these interests are necessarily incompatible, but they create a tension which I've seen in national meetings and in local curricular meetings.  I also think most clerkship directors are not aware of how this tension affects the grading system they have developed.  I think they're not aware because the debates I've heard are usually about tools for assessment or the number of honors grades.  Rarely does the debate get to the level of what our ultimate purpose for the assessment is.  The answer to that question must shape how grades are assigned.  Only when we all become very clear about what our goals are for the assessment will we truly be able to come to a place where we can have a national dialogue about how to unify the system.