A preceding presentation introduced the topic of evidence-based medicine (EBM). In practicing EBM, critically appraising
the evidence retrieved by searching the literature is arguably the most important step. This presentation will introduce the
principles for critically appraising patient-based reports.
Critical appraisal of reports entails 3 fundamental steps: 1) determining if study results are valid; 2) assessing the clinical importance of study findings; and, 3) assessing if the results of valid, clinically important studies are relevant to our patients. A foundation for applying these 3 steps is a hierarchy of study types for EBM, which places a premium on those that are
patient-based. Clinical importance has many interpretations, but in terms of quantification it is best assessed in reference
to the magnitude of the observed association(s) in a study.
In studies of patients, one generally estimates a measure of association (such as the odds ratio [OR] or relative risk [RR])
or another parameter (such as the cumulative incidence of disease). Results of a study are valid when the observed or estimated
parameter is the same as the true/actual value. The term bias refers to a systematic error in the study relating to its design,
data collection methods, or data analysis.1 Such systematic error is distinct from the random error that results from the imprecision of the device(s) used for collecting
data. Biases in epidemiological/patient-based studies fall into 3 categories: selection bias, information bias, and confounding.
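To make the measures named above concrete, the short Python sketch below computes the cumulative incidence in each group, the RR, and the OR from a 2x2 table. The counts are hypothetical, chosen only for illustration and not drawn from any study; the function name and the table layout are likewise assumptions of this sketch.

```python
def measures_of_association(a, b, c, d):
    """Compute basic measures from a 2x2 table.

    a: exposed, diseased        b: exposed, not diseased
    c: unexposed, diseased      d: unexposed, not diseased
    """
    # Cumulative incidence (risk) of disease within each exposure group
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    # Relative risk: ratio of the two cumulative incidences
    relative_risk = risk_exposed / risk_unexposed
    # Odds ratio: cross-product ratio of the table
    odds_ratio = (a * d) / (b * c)
    return {
        "cumulative_incidence_exposed": risk_exposed,
        "cumulative_incidence_unexposed": risk_unexposed,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
    }


# Hypothetical counts (illustrative only): 100 exposed and 100 unexposed patients
result = measures_of_association(a=20, b=80, c=10, d=90)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these hypothetical counts the RR is 2.0 and the OR is 2.25. The OR lies farther from 1 than the RR because the disease is not rare in this made-up example.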
Criteria that are helpful for evaluating data include the type of question being addressed
(diagnosis, treatment, prognosis, or harm). Because the type of question being asked is most relevant to clinicians, we
will focus on appraisal of the literature based on the primary clinical activities with which we are engaged in clinical practice:
1) choosing and interpreting diagnostic tests; 2) selecting treatments/interventions; and, 3) making prognoses. The types of evidence we use vary somewhat among these clinical activities.
When we appraise an article that relates to a diagnostic test, there are 3 critical aspects to evaluate: 1) the spectrum of
disease represented by the patients studied; 2) whether the "gold standard" test was applied irrespective of the results of the
diagnostic test being evaluated; and, 3) whether the "gold standard" was measured independently of the other test.2,3
It is common for the performance of diagnostic tests to be assessed using patients with severe forms of disease (e.g., necropsy-confirmed
cases of sepsis) and horses free of signs of disease. Although such case-control studies are useful for initial evaluation
of tests, this design is of limited value with respect to clinical application. Evaluation of diagnostic tests must encompass
the full spectrum of disease to which the test will be applied; thus, patients must be included with milder as well as florid
forms of the disease, in early as well as late stages of disease, and among both treated and untreated patients. Case-control
studies are generally weak sources of evidence for evaluating diagnostic tests. Prospectively designed studies of consecutively
enrolled patients who undergo pre-specified diagnostic testing, with results compared against a reference standard that is consistently
applied, are the best sources of evidence for evaluating diagnostic tests. Studies of non-consecutive patients provide weaker
evidence because there is potential for bias in the selection of cases that are included.
When a patient has a negative test, investigators may be tempted to forgo testing with the reference standard, especially
when the latter is more invasive. For example, consider a study to evaluate the diagnostic sensitivity and specificity of
thoracic ultrasound for detecting subclinical Rhodococcus equi pneumonia using foals at a farm with endemic R. equi pneumonia. One might not want to perform tracheobronchial aspiration to obtain a sample for microbiologic culture and cytologic
evaluation in foals from the farm that appear healthy and whose thoracic ultrasound findings are normal. But failure to perform
such testing introduces a bias (commonly termed verification or work-up bias) that is an important limitation.
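The effect of omitting the reference standard in test-negative patients can be illustrated with simple arithmetic. The Python sketch below compares sensitivity and specificity when every patient receives the reference standard with the values obtained when all test-positive patients but only 1 in 10 test-negative patients are verified. All counts and the 1-in-10 verification fraction are hypothetical assumptions made for illustration; they are not data from any study of thoracic ultrasound or R. equi pneumonia.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the counts of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts if every patient received the reference standard
tp, fn, fp, tn = 80, 20, 90, 810
full = sensitivity_specificity(tp, fn, fp, tn)

# Assumption: all test-positive patients are verified, but only 1 in 10
# test-negative patients undergo the reference standard
fn_verified = fn // 10  # 2 of the 20 false negatives are detected
tn_verified = tn // 10  # 81 of the 810 true negatives are confirmed
partial = sensitivity_specificity(tp, fn_verified, fp, tn_verified)

print(f"Full verification:    sensitivity={full[0]:.2f}, specificity={full[1]:.2f}")
print(f"Partial verification: sensitivity={partial[0]:.2f}, specificity={partial[1]:.2f}")
```

In this made-up example, apparent sensitivity rises from 0.80 to about 0.98 and apparent specificity falls from 0.90 to about 0.47, solely because most test-negative patients were never verified.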