Introduction
It is widely acknowledged that the purpose of diagnostic testing is to
reduce diagnostic uncertainty (e.g. by 0%, if the test is useless, or
by up to 100%, if the test is perfect)1. However, the
current metrics of diagnostic performance [i.e. sensitivity (S),
specificity (C), positive and negative likelihood ratios (LR+; LR-),
diagnostic odds ratio (DOR), and area under curve (AUC)] cannot
provide a direct assessment of the amount by which diagnostic
uncertainty is reduced. Despite lacking this crucial element of
clinical usefulness, these “traditional” diagnostic metrics remain the
preferred evidence-based medicine (EBM) measures of diagnostic test
performance2,3.
Meanwhile, there is a long tradition of quantifying diagnostic test
performance in the field of information theory4.
Although, conceptually speaking, the problems associated with medical
diagnostic testing are similar to those faced in communication and
information theory, for some reason the field of EBM diagnostics has
not embraced measures typically found in information theory.
One such measure, mutual information (MI)5, which is used to evaluate
the association between two random variables, is considered the best
metric for quantifying diagnostic uncertainty and therefore test
performance6. It has been used in a number of studies in medicine to
explain the relationship between test results and disease states7-14.
Yet it has been surprisingly absent from the EBM literature.
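To make the concept concrete, MI can be computed directly from the joint probability table of disease state and test result. The sketch below uses Python with a hypothetical 2 × 2 joint distribution; the numbers and the function name are ours, purely for illustration, and are not taken from the studies cited above:

```python
import math

def mutual_information(joint):
    """Mutual information (in bits) between two discrete random
    variables, given their joint probability table joint[i][j]."""
    px = [sum(row) for row in joint]        # marginal of disease state
    py = [sum(col) for col in zip(*joint)]  # marginal of test result
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Hypothetical joint distribution of (disease state, test result):
# rows = disease present/absent, columns = test positive/negative
joint = [[0.09, 0.01],   # diseased: mostly test positive
         [0.09, 0.81]]   # healthy: mostly test negative
print(round(mutual_information(joint), 3))  # about 0.211 bits
```

When the test result is statistically independent of the disease state, every joint probability equals the product of the marginals and MI is exactly 0; perfect agreement yields the full entropy of the disease state.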
The most significant properties that establish MI as superior to
traditional measures of diagnostic performance can be summarized as
follows:
- MI quantifies the average amount of information obtained about the
value of one random variable (i.e. the disease state, with its
pre-test probability) once the value of another random variable (i.e.
the test result, which yields the post-test probability of disease) is
available15;
- MI quantifies the expected value of the amount of information a
diagnostic test provides about the disease state, i.e. it accounts for
all possible states that can be associated with the test results,
weighted by the likelihood of disease16,17.
This number is particularly useful when comparing different diagnostic
tests;
- MI summarizes test performance with a single meaningful number that
corresponds to the average amount of information obtained by the
diagnostic test and, unlike the ROC curve, it does not require a
specified diagnostic cut-off point (threshold). The larger the MI
value, the greater the amount of diagnostic uncertainty reduced by the
diagnostic test;
- MI can be applied to situations in which different test results are
associated with different probabilities of disease6,16;
- Unlike ROC and AUC, MI can be applied to a broad spectrum of testing
situations ranging from the simple binary case (two test results and
two disease states) to much more complicated situations in which a
large number of test results (or a continuum of test results) are
associated with multiple possible disease states7-14;
- The maximal value of MI, formally referred to as channel capacity, can
be used to identify the range of disease prevalence at which a
diagnostic test is most useful;
- One way MI expresses information is in bits, which range from 0 to
infinity. In the simplest, binary case, where we are concerned only
with whether disease is present or absent, the maximum value of MI is
1 bit6;
- Finally, the relative expression of MI indicates the percentage of
diagnostic uncertainty that can be removed by a diagnostic test, and
it can range from 0% (a useless test) to 100% (a perfect test).
In this paper, we promote the notion that MI is a better measure for
the evaluation of diagnostic performance8, on both theoretical and
practical grounds. We extend the current work by explaining how MI can
be meta-analyzed, and we provide two illustrative examples of
diagnostic test meta-analysis using MI.