Screening for alcohol problems: what makes a test effective?

Alcohol Research & Health, Winter 2004, by Scott H. Stewart and Gerard J. Connors

Screening tests for AUDs can be made more specific by increasing the cutoff point used to define a positive test. For example, when the cutoff value for "hazardous use" in the AUDIT is increased from 8 points to 10 points, a greater proportion of people without a drinking problem will have negative screening results. But because a higher cutoff value also leads to more negative screening results in people who actually meet the diagnostic criteria for hazardous use, raising the cutoff score would simultaneously reduce the test's sensitivity. Therefore, it is important to balance the sensitivity and specificity of a test, as described below.
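The effect of moving the cutoff can be sketched with a short calculation. The scores and diagnoses below are invented purely for illustration (they are not AUDIT data from any study), and the function name is an assumption of this sketch; a score at or above the cutoff is treated as a positive screen.

```python
def sensitivity_specificity(scores, has_disorder, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff counts as positive."""
    tp = fn = tn = fp = 0
    for score, disorder in zip(scores, has_disorder):
        positive = score >= cutoff
        if disorder and positive:
            tp += 1          # true positive
        elif disorder:
            fn += 1          # false negative
        elif positive:
            fp += 1          # false positive
        else:
            tn += 1          # true negative
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening scores and diagnostic status (illustration only).
scores = [2, 3, 5, 7, 8, 9, 9, 11, 12, 15]
has_disorder = [False, False, False, False, False, True, False, True, True, True]

sens_8, spec_8 = sensitivity_specificity(scores, has_disorder, cutoff=8)
sens_10, spec_10 = sensitivity_specificity(scores, has_disorder, cutoff=10)
print(f"cutoff 8:  sensitivity={sens_8:.2f}, specificity={spec_8:.2f}")
print(f"cutoff 10: sensitivity={sens_10:.2f}, specificity={spec_10:.2f}")
```

In this made-up sample, raising the cutoff from 8 to 10 removes the false positives but causes one person with the disorder to screen negative, so specificity rises while sensitivity falls.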

Overall Accuracy

Accuracy is another measure of a screening test's validity but is less useful than sensitivity and specificity. Accuracy is defined as the proportion of people correctly classified by the test. In other words, it is the ratio of the sum of true positives and true negatives to the entire study population (see figure 1). The usefulness of accuracy in characterizing a test is limited by the fact that it is not an inherent characteristic of the test but varies with the prevalence of a disorder in a population (i.e., accuracy is an average of sensitivity and specificity, weighted by the proportions of people with and without the disorder). In most populations, the prevalence of AUDs is substantially less than 50 percent. At such low prevalence, most people screened do not have the disorder, so overall accuracy is almost equal to specificity and does not provide additional value in estimating the validity of a screening test (Alberg et al. 2004). Therefore, it is preferable to use sensitivity and specificity to determine a test's validity.
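The dependence of accuracy on prevalence follows from its definition: true positives make up prevalence times sensitivity of the population, and true negatives make up (1 minus prevalence) times specificity. A minimal sketch, using invented values for sensitivity, specificity, and prevalence (the function name is also an assumption of this example):

```python
def accuracy(sensitivity, specificity, prevalence):
    """Proportion correctly classified: true positives plus true negatives."""
    true_positives = prevalence * sensitivity
    true_negatives = (1 - prevalence) * specificity
    return true_positives + true_negatives


# Hypothetical test characteristics (illustration only).
sens, spec = 0.60, 0.90

for prevalence in (0.05, 0.10, 0.50):
    print(f"prevalence={prevalence:.2f}  accuracy={accuracy(sens, spec, prevalence):.3f}")
```

With these assumed values, accuracy at 5 or 10 percent prevalence stays within a few points of the specificity of 0.90, even though the test misses 40 percent of people with the disorder.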

BALANCING SENSITIVITY AND SPECIFICITY

As the discussion in the previous section indicated, for an ideal screening test both sensitivity and specificity would be close to 1, so that most people are classified correctly and only a few would have a misleading test result. In practice, however, this rarely is the case, and striking a balance between sensitivity and specificity is necessary. For example, whereas raising the cutoff score increases specificity, as mentioned earlier, lowering the cutoff score for a positive test result on the AUDIT from 8 to 4 points can increase the test's sensitivity--that is, the proportion of people with a drinking problem who have a positive test result would go up. But because the increase in positive tests would include not only people who actually meet the criteria for hazardous alcohol use (i.e., are true positives) but also some who do not meet those criteria (i.e., are false positives), it also would mean a decrease in the test's specificity (see figure 2).

So how is it possible to choose an appropriate cutoff score for differentiating a positive from a negative result on a screening test? The answer depends on the relative consequences of false positive versus false negative tests--that is, is it more harmful to the individual or to society as a whole if a person is wrongly classified as having a drinking problem, or if the person is wrongly classified as not having a drinking problem?
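One way to make this weighing explicit is to assign a relative cost to each kind of error and compare candidate cutoffs by their expected cost per person screened. The sketch below is only an illustration of that idea, not a method taken from the article: the sensitivity and specificity values, the prevalence, and the cost weights are all hypothetical.

```python
def expected_cost(sensitivity, specificity, prevalence, cost_fn, cost_fp):
    """Expected misclassification cost per person screened."""
    false_negative_rate = prevalence * (1 - sensitivity)
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return cost_fn * false_negative_rate + cost_fp * false_positive_rate


def best_cutoff(candidates, prevalence, cost_fn, cost_fp):
    """Return the cutoff whose (sensitivity, specificity) pair has the lowest expected cost."""
    return min(
        candidates,
        key=lambda c: expected_cost(*candidates[c], prevalence, cost_fn, cost_fp),
    )


# Hypothetical (sensitivity, specificity) at two cutoffs (illustration only).
candidates = {4: (0.90, 0.70), 8: (0.70, 0.90)}

# If a missed case is judged ten times as costly as a false alarm:
print(best_cutoff(candidates, prevalence=0.2, cost_fn=10, cost_fp=1))
# If both errors are judged equally costly:
print(best_cutoff(candidates, prevalence=0.2, cost_fn=1, cost_fp=1))
```

Under these assumptions, the lower cutoff is preferred when false negatives are weighted heavily, and the higher cutoff is preferred when the two errors are weighted equally.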

The trade-off between sensitivity and specificity often is illustrated using a type of graphic called a receiver operator characteristic (ROC) curve (see the sidebar "Receiver Operator Characteristic [ROC] Curves"). ROC curves plot the true-positive rate (i.e., the sensitivity of a test) on the y-axis against the false-positive rate (i.e., 1 minus the specificity of the test) on the x-axis at different cutoff scores. The resulting graph can help clinicians and researchers identify the cutoff value with the best possible combination of specificity and sensitivity for a given test. For example, researchers have used an ROC curve to identify an optimal cutoff score for the AUDIT when screening for "at-risk" drinking (2) in a primary care setting (Volk et al. 1997). When screening for at-risk drinking in this study population, a cutoff score of 4 provided roughly equal sensitivity and specificity (i.e., balanced false positives and false negatives) and maximized accuracy. It is important to note, however, that studies designed to validate the AUDIT in other populations and for other drinking behavior categories typically have selected higher cutoff scores as optimal for their conditions. Accordingly, it is essential to validate screening tests for a specific disorder or group of disorders in populations that are similar to the populations that will be screened using those tests. Whether a test's validity has been adequately established for a specific population is often a matter of judgment.
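The points on an ROC curve can be computed directly from screening scores and diagnostic status. The following sketch uses invented data (not the Volk et al. sample) and picks the cutoff at which sensitivity and specificity are closest to equal, which is one of several possible criteria for an "optimal" cutoff; the function names are assumptions of this example.

```python
def roc_points(scores, has_disorder, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) for each cutoff."""
    n_pos = sum(has_disorder)
    n_neg = len(has_disorder) - n_pos
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for s, d in zip(scores, has_disorder) if d and s >= cutoff)
        fp = sum(1 for s, d in zip(scores, has_disorder) if not d and s >= cutoff)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points


def most_balanced_cutoff(points):
    """Cutoff where sensitivity (TPR) and specificity (1 - FPR) are closest."""
    return min(points, key=lambda p: abs(p[2] - (1 - p[1])))[0]


# Hypothetical scores: six people without and six with the disorder.
scores = [1, 2, 3, 3, 4, 6, 4, 5, 6, 7, 8, 9]
has_disorder = [False] * 6 + [True] * 6

points = roc_points(scores, has_disorder, cutoffs=range(1, 10))
for cutoff, fpr, tpr in points:
    print(f"cutoff={cutoff}  1-specificity={fpr:.2f}  sensitivity={tpr:.2f}")
print("most balanced cutoff:", most_balanced_cutoff(points))
```

Plotting the second value of each point on the x-axis and the third on the y-axis gives the ROC curve; a different sample or a different selection criterion could yield a different cutoff.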

 


Content provided in partnership with Thomson Gale