Evidence-based medicine: how to critically appraise the literature (Proceedings)
A preceding presentation introduced the topic of evidence-based medicine (EBM). In practicing EBM, critically appraising the evidence retrieved from a literature search is arguably the most important step. This presentation will introduce the principles for critically appraising patient-based reports.
Critical appraisal of reports entails 3 fundamental steps: 1) determining if study results are valid; 2) assessing the clinical importance of study findings; and, 3) assessing if the results of valid, clinically important studies are relevant to our patients. A foundation for applying these 3 steps is a hierarchy of study types for EBM, which places a premium on those that are patient-based. Clinical importance has many interpretations, but in terms of quantification it is best assessed in reference to the magnitude of the observed association(s) in a study.
Assessing validity
In studies of patients, one generally estimates a measure of association (such as the odds ratio [OR] or relative risk [RR]) or another parameter (such as the cumulative incidence of disease). Results of a study are valid when the observed or estimated parameter is the same as the true/actual value. The term bias refers to a systematic error in the study relating to its design, data collection methods, or data analysis.1 Such systematic error is distinct from the random error that results from the imprecision of the device(s) used for collecting data. Biases in epidemiological/patient-based studies fall into 3 categories: selection bias, information bias, and confounding bias.1
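As a brief illustration of the measures named above, the following sketch computes the cumulative incidence, RR, and OR from a 2x2 table. The counts, the class name `TwoByTwo`, and the function names are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a hypothetical cohort study (illustrative values only)."""
    exposed_diseased: int
    exposed_healthy: int
    unexposed_diseased: int
    unexposed_healthy: int


def cumulative_incidence(t: TwoByTwo) -> float:
    """Proportion of all animals that developed disease during follow-up."""
    diseased = t.exposed_diseased + t.unexposed_diseased
    total = diseased + t.exposed_healthy + t.unexposed_healthy
    return diseased / total


def relative_risk(t: TwoByTwo) -> float:
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = t.exposed_diseased / (t.exposed_diseased + t.exposed_healthy)
    risk_unexposed = t.unexposed_diseased / (t.unexposed_diseased + t.unexposed_healthy)
    return risk_exposed / risk_unexposed


def odds_ratio(t: TwoByTwo) -> float:
    """Cross-product ratio of the 2x2 table."""
    return (t.exposed_diseased * t.unexposed_healthy) / (
        t.exposed_healthy * t.unexposed_diseased
    )


# Assumed counts: 20 of 100 exposed and 10 of 100 unexposed animals develop disease.
table = TwoByTwo(20, 80, 10, 90)
incidence = cumulative_incidence(table)
rr = relative_risk(table)
odds = odds_ratio(table)
print(f"cumulative incidence = {incidence:.2f}, RR = {rr:.2f}, OR = {odds:.2f}")
```

With these assumed counts the OR (2.25) is somewhat larger than the RR (2.00), which is the usual pattern when the outcome is not rare.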
Criteria that are helpful for evaluating data include the study design and the type of question being addressed (diagnosis, treatment, prognosis, or harm). Because the type of question being asked is most relevant to clinicians, we will focus on appraisal of the literature based on the primary activities in which we engage in clinical practice: 1) choosing and interpreting diagnostic tests; 2) selecting treatments/interventions; and, 3) making prognoses. The types of evidence we use vary somewhat among these clinical activities.
When we appraise an article that relates to a diagnostic test, there are 3 critical aspects to evaluate: 1) the spectrum of disease represented by the patients studied; 2) whether the "gold standard" test was applied irrespective of the results of the diagnostic test being evaluated; and, 3) whether the "gold standard" was measured independently of the other test.2,3
It is common for the performance of diagnostic tests to be assessed using patients with severe forms of disease (e.g., necropsy-confirmed cases of sepsis) and horses free of signs of disease. Although such case-control studies are useful for initial evaluation of tests, this design is of limited value with respect to clinical application. Evaluation of diagnostic tests must encompass the full spectrum of disease to which the test will be applied; thus, patients must be included with milder as well as florid forms of the disease, in early as well as late stages of disease, and among both treated and untreated patients. Case-control studies are generally weak sources of evidence for evaluating diagnostic tests. Prospectively designed studies of consecutively enrolled patients who undergo pre-specified diagnostic testing, evaluated against a reference standard that is consistently applied, are the best sources of evidence for evaluating diagnostic tests. Studies of non-consecutive patients provide weaker evidence because there is potential for bias in the selection of cases that are included.
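To make the spectrum-of-disease point concrete, the sketch below shows how the apparent sensitivity of a test can depend on the mix of cases enrolled. The stratum-specific sensitivities, the case counts, and the function name `overall_sensitivity` are assumptions made up for illustration, not data from any study.

```python
# Hypothetical sensitivity of a test within each severity stratum
# (assumed values for illustration only).
SENSITIVITY_BY_STAGE = {"mild": 0.50, "moderate": 0.75, "severe": 0.95}


def overall_sensitivity(case_mix: dict[str, int]) -> float:
    """Average the stratum sensitivities, weighted by the number of cases in each stratum."""
    total_cases = sum(case_mix.values())
    return sum(
        SENSITIVITY_BY_STAGE[stage] * count / total_cases
        for stage, count in case_mix.items()
    )


# A case-control design that enrolls only florid (severe) cases.
case_control_mix = {"severe": 100}

# An assumed clinical population with mostly mild or moderate disease.
clinical_mix = {"mild": 50, "moderate": 30, "severe": 20}

sens_case_control = overall_sensitivity(case_control_mix)
sens_clinical = overall_sensitivity(clinical_mix)
print(f"severe cases only: {sens_case_control:.3f}")
print(f"full clinical spectrum: {sens_clinical:.3f}")
```

Under these assumptions the same test appears to detect 95% of cases when only severe disease is studied, but about two-thirds of cases when the full spectrum is represented.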
When a patient has a negative test, investigators may be tempted to forgo testing with the reference standard, especially when the latter is more invasive. For example, consider a study to evaluate the diagnostic sensitivity and specificity of thoracic ultrasound for detecting subclinical Rhodococcus equi pneumonia using foals at a farm with endemic R. equi pneumonia. One might not want to perform tracheobronchial aspiration to obtain a sample for microbiologic culture and cytologic evaluation in foals from the farm that appear healthy and whose thoracic ultrasound findings are normal. However, failing to apply the reference standard to test-negative patients introduces a bias (commonly termed verification bias) that is an important limitation.
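The effect of applying the reference standard to only some test-negative patients can be sketched numerically. The counts below are hypothetical (they are not data on thoracic ultrasound or R. equi), and the assumption that exactly 1 in 10 test-negative foals is verified, regardless of true disease status, is made purely for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased animals that test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-diseased animals that test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results if every foal received the reference standard.
tp, fn, fp, tn = 40, 10, 20, 130

true_sens = sensitivity(tp, fn)
true_spec = specificity(tn, fp)

# Assumed scenario: all test-positive foals are verified, but only
# 1 in 10 test-negative foals undergoes the reference standard.
fn_verified = fn // 10
tn_verified = tn // 10

naive_sens = sensitivity(tp, fn_verified)
naive_spec = specificity(tn_verified, fp)

print(f"complete verification: Se = {true_sens:.2f}, Sp = {true_spec:.2f}")
print(f"partial verification:  Se = {naive_sens:.2f}, Sp = {naive_spec:.2f}")
```

In this sketch, restricting the analysis to verified foals overstates sensitivity (about 0.98 versus 0.80) and understates specificity (about 0.39 versus 0.87), because false negatives and true negatives are selectively missing from the table.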