Wednesday, June 6, 2012

EBM evaluation tools applied to medical student assessment tools

I remember back to the days when I was a fresh medical student taking those first classes in biochem, anatomy, and cell biology.  I learned a ton, and honestly I draw on this knowledge base daily when I'm taking care of patients.  I also remember that the assessment methods used during my first year of medical school were not the greatest (in the opinion of a person who was teaching high school physics and chemistry 3 months before entering med school).  The number of assessment options used in med schools has risen over the 15 years since I was an M1.  However, with a rise in the number of choices comes the responsibility to pick the right one.  Another way to look at this from a pedagogical standpoint: are the assessments really measuring the outcomes you think they are measuring?  To help the medical educator with this dilemma, I came up with the idea that you can apply a well-known paradigm for evaluating evidence-based medicine (EBM) to evaluate a student assessment.  The EBM evaluation method I'm most familiar with is outlined by Straus and colleagues in their book, Evidence-Based Medicine: How to Practice and Teach EBM, copyright 2005.

Here's my proposed way to assess assessment:

1)  Is the assessment tool valid?  By this we mean that our measurement tool is reliable and accurate in measuring what we want it to measure.  The standardized (high-stakes) examinations like the MCAT, USMLE, and board certification examinations are expensive not because these companies are rolling in cash, but because it takes people LOTS of time to validate a test.  Hence, most home-grown tools are not completely validated (although some have been).  To be validated, an assessment has to give similar results each time the same learner takes it (reliability).  It also has to accurately categorize the learner's level of proficiency at the task you are measuring (accuracy).

For example, let's say I have an OSCE to assess whether a learner can counsel a young woman of child-bearing age on her options for migraine prophylactic medications.  For my OSCE to be valid, I need to look for reliability and accuracy.  Does the OSCE accurately identify learners who do not understand that valproate has teratogenic potential and don't discuss this with a standardized patient?  You also want to know if it is reliable; in other words, does your scoring method give similar results when multiple faculty who have been trained on how to use the tool score the same student interaction?  To truly answer these questions about an assessment, it takes multiple data points for both raters and learners, which is why it takes time and money, and also why most assessments are not truly validated.
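To make the reliability piece a little more concrete, here's a rough sketch (not from the Straus book, just my own illustration with made-up ratings) of one common way to check inter-rater agreement on a single OSCE checklist item, using Cohen's kappa for two trained faculty raters scoring the same set of encounters:

```python
# A minimal sketch of inter-rater reliability on one OSCE checklist item:
# Cohen's kappa between two trained faculty raters scoring the same
# student encounters. The ratings below are hypothetical, for illustration.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving categorical scores."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)

    # Observed agreement: proportion of encounters where the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail calls on "discussed teratogenic risk of valproate"
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

A kappa close to 1 means the trained raters agree far more often than chance; a much lower value suggests the checklist or the rater training needs work before you trust the scores.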

The best way to validate is to measure the assessment against another 'gold standard' assessment: how well does your assessment perform compared with known validated scales?  Unfortunately, there aren't many 'gold standard' assessments outside of the clinical knowledge domain in medical education (although it is getting better).
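When a gold standard does exist, the comparison itself can be fairly simple. Here's another hedged sketch with invented numbers (again, my own illustration, not something from the book): correlate your learners' scores on the home-grown tool with their scores on the established measure.

```python
# A minimal sketch: comparing a home-grown assessment against an
# established "gold standard" score by correlating the two sets of
# scores for the same learners. All numbers are made up.
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores: home-grown OSCE total vs. a validated benchmark.
osce_scores      = [72, 85, 64, 90, 78, 69, 88, 75]
benchmark_scores = [70, 80, 60, 92, 75, 72, 85, 74]

print(f"Correlation with gold standard: {pearson_r(osce_scores, benchmark_scores):.2f}")
```

A strong correlation doesn't prove validity by itself, but a weak one is a pretty clear warning that the new tool isn't tracking the gold standard.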

2)  Is the valid assessment tool important?  Here we need to talk about whether the difference seen on the assessment is actually a real difference.  How big is the gap between those who passed without trouble, those who just barely passed, and those who failed to meet the expected mark?  Medical students are all very bright, and sometimes the difference between the very top and the middle is not that great a margin (even if it looks like it on the measures we are using).  I think the place where we sometimes trip up here is in assuming that Likert scale numbers have a linear relationship.  Is a step-wise difference from 3 to 4 to 5 on the scale set up on the clinical evaluations a reasonable assumption, and is the difference between a 4 and a 5 really important?  It might very well be true, but it will be different for every scale that we set up.  I've never been a big fan of using Likert rating scores to directly come up with a percentage-point score unless you can prove to me through your distribution numbers that it is working.
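Here's what I mean by proving it through your distribution numbers. This is just an illustrative sketch with made-up clerkship ratings, not real data:

```python
# A minimal sketch: before converting Likert ratings straight into a
# percentage grade, look at how the ratings are actually distributed.
# The clerkship ratings below (1-5 scale) are hypothetical.
from collections import Counter

ratings = [4, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 3, 4, 5, 4]

counts = Counter(ratings)
n = len(ratings)

print("Distribution of ratings:")
for value in sorted(counts):
    bar = "#" * counts[value]
    print(f"  {value}: {counts[value]:2d} ({counts[value] / n:.0%}) {bar}")

# A direct linear conversion (rating / 5 * 100) treats the 3->4 gap the
# same as the 4->5 gap, and squeezes nearly everyone into 60-100%.
percent_grades = [r / 5 * 100 for r in ratings]
print(f"\nMean 'percentage' grade: {sum(percent_grades) / n:.1f}%")
print(f"Range: {min(percent_grades):.0f}% to {max(percent_grades):.0f}%")
```

When nearly every rating is a 4 or a 5, a direct linear conversion crams the whole class into a narrow band, and the apparent precision of those percentage grades is mostly an artifact of the scale.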

3)  Is this valid, important tool able to be applied to my learners?  I think this question has several parts.  First, are you actually measuring what you'd like to measure?  A valid, reliable tool for measuring knowledge (the typical MCQ test) will not, unless it is very artfully crafted, likely assess clinical reasoning or problem-solving skills.  So, if your objective is to teach the learner how to identify 'red flags' in a headache patient's history, is that validated MCQ the best assessment tool to use?  Is it OK that the learner can pick out 'red flags' from a list of distractors, or is it a different skill set to identify them in a clinical setting?  I'm not saying MCQs can never be used in this situation; you just have to think about it first.

Second, if you are utilizing a tool from another source and it was not designed for your particular curriculum, is the tool still useful for your unique objectives?  Most of the time this is OK, and cross-fertilization of educational tools is necessary given the time and effort involved in building them.  But you have to think about what you are actually doing.  In our example of the headache OSCE, let's say you found a colleague at another institution who has an OSCE set up to assess communication of the differential diagnosis and evaluation plan to a person with migraine who is worried they have a brain tumor.  You then apply that to your clerkship, but you are more interested in the above scenario about choice of therapy.  Will the tool still work when you tweak it?  It may or may not, and you just need to be careful.

Hopefully you've survived to read through to the end of this post, learned something about assessment in medical education, and found the EBM-esque approach to assessment evaluation useful.  My concern is that, in general, not enough time is spent considering these questions, and more time is spent on developing content than on assessment.  I'm guilty of this as well, but I'm trying to get better.  Thanks for reading, and feel free to post comments/thoughts below.
