Discussion
Many authors have outlined problems with the use of “traditional” measures of diagnostic performance6,16,25-27. These problems relate to the biases that plague studies evaluating diagnostic tests, and to the metrics themselves28. In this paper, we focus on the latter. In particular, we focus on the measurement of diagnostic accuracy rather than on the impact of diagnostic tests on health outcomes; the latter depends on downstream effects of testing, such as the choice of treatment, and is not considered here.
With regard to diagnostic accuracy, it has been argued6,8,29,30 that information theory, and particularly MI, has theoretical and practical advantages over traditional measures for assessing the performance of a diagnostic test. Notably, MI and RMI explicitly quantify the amount of diagnostic uncertainty a test reduces. Such a direct measure can be used to evaluate test performance not only by trained researchers but also by any EBM-literate practitioner. Here, we summarized the advantages of MI over traditional measures and demonstrated how MI can be meta-analyzed using two cases from the literature.
The MI meta-analysis results presented in both cases show the superiority of MI and RMI over other metrics in conveying arguably the most useful clinical indicator of diagnostic test performance, namely the amount of diagnostic uncertainty reduced by the test. Clearly, other ethical and personal considerations are also involved in the decision to administer a diagnostic test. However, for the EBM community and for evidence synthesis practitioners, reduction of uncertainty is of utmost importance. In terms of derivation, MI is easily computed and meta-analyzed. In addition, although we have not emphasized it here, MI has particular advantages over other metrics in the analysis of tests with continuous results, such as PSA or blood pressure. Analysis of such tests with traditional metrics requires dichotomization of the test results, which discards useful information31. In contrast, MI can be computed for both discrete and continuous variables32.
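As an illustration of how study-level MI estimates can be pooled, the following minimal Python sketch applies standard fixed-effect (inverse-variance) meta-analysis to hypothetical MI values and their variances; it is not the exact pipeline used in our analyses, and in practice the variances would come from the Var(I(D,T)) expression given in the Appendix.
\begin{verbatim}
import numpy as np

# Hypothetical per-study MI estimates (bits) and their variances,
# e.g. obtained from the Var(I(D,T)) expression in the Appendix.
mi = np.array([0.21, 0.35, 0.28, 0.19])
var = np.array([0.0012, 0.0030, 0.0021, 0.0009])

# Fixed-effect (inverse-variance) pooling of the study-level MI values.
w = 1.0 / var                          # inverse-variance weights
mi_pooled = np.sum(w * mi) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

# 95% confidence interval for the pooled MI (normal approximation).
ci = (mi_pooled - 1.96 * se_pooled, mi_pooled + 1.96 * se_pooled)

# Cochran's Q and I^2 as simple heterogeneity diagnostics.
q = np.sum(w * (mi - mi_pooled) ** 2)
i2 = max(0.0, (q - (len(mi) - 1)) / q) if q > 0 else 0.0

print(f"Pooled MI = {mi_pooled:.3f} bits, "
      f"95% CI {ci[0]:.3f}-{ci[1]:.3f}, I^2 = {100 * i2:.0f}%")
\end{verbatim}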
One limitation of MI is its dependence on prevalence: although this dependence has theoretical advantages, it introduces heterogeneity in meta-analysis. To address this problem, we propose meta-analyzing RMI instead of MI, but at this time we know of no derivation of a standard error for RMI. Further development in the field of research synthesis of diagnostic test performance may therefore lie in developing robust meta-analytic techniques for RMI.
In summary, we believe that MI is the most meaningful measure for both decision makers and EBM researchers, as it provides an intuitive, easy-to-understand metric that quantifies the information content of diagnostic tests. We therefore argue that the field of evidence-based diagnostics should adopt MI as its most useful metric.
References
1. Sox HC, Blatt MA, Higgins MC, Marton MC. Medical Decision Making. Boston: Butterworths; 1988.
2. Leeflang MM, Deeks JJ, Takwoingi Y, Macaskill P. Cochrane diagnostic test accuracy reviews. Syst Rev 2013;2:82.
3. Leeflang MM, Deeks JJ, Gatsonis C, Bossuyt PM, Cochrane Diagnostic Test Accuracy Working Group. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008;149:889-97.
4. Shannon CE, Weaver W. The mathematical theory of communication. Urbana: The University of Illinois Press; 1962.
5. Shannon CE. A mathematical theory of communication. Bell System Technical Journal 1948;27:379-423, 623-56.
6. Benish WA. Intuitive and axiomatic arguments for quantifying diagnostic test performance in units of information. Methods Inf Med 2009;48:552-7.
7. Somoza E, Mossman D. Comparing and optimizing diagnostic tests: an information-theoretical approach. Med Decis Making 1992;12:179-88.
8. Benish W. Mutual information as an index of diagnostic test performance. Methods Inf Med 2003;42:260-4.
9. Mossman D, Somoza E. Diagnostic tests and information theory. J Neuropsychiatry Clin Neurosci 1992;4:95-8.
10. Somoza E, Soutullo-Esperon L, Mossman D. Evaluation and optimization of diagnostic tests using receiver operating characteristic analysis and information theory. International journal of bio-medical computing 1989;24:153-89.
11. Benish W. The use of information graphs to evaluate and compare diagnostic tests. Methods Inf Med 2002;41:114-8.
12. Nelson GW, O’Brien SJ. Using mutual information to measure the impact of multiple genetic factors on AIDS. JAIDS Journal of Acquired Immune Deficiency Syndromes 2006;42:347-54.
13. Meyer CR, Boes JL, Kim B, et al. Demonstration of accuracy and clinical versatility of mutual information for automatic multimodality image fusion using affine and thin-plate spline warped geometric deformations. Medical image analysis 1997;1:195-206.
14. Diamond GA, Hirsch M, Forrester JS, et al. Application of information theory to clinical diagnostic testing. The electrocardiographic stress test. Circulation 1981;63:915-21.
15. Cover TM, Thomas JA. Elements of information theory: John Wiley & Sons; 2012.
16. Hughes G. Application of Information Theory to Epidemiology: American Phytopathological Society; 2012.
17. Hughes G, McRoberts N. The structure of diagnostic information. Australasian Plant Pathology 2014:1-20.
18. Djulbegovic B, Hozo I, Abdomerovic I, Hozo S. Diagnostic entropy as a function of therapeutic benefit/risk ratio. Med Hypotheses 1995;45:503-9.
19. Djulbegovic B, Glasziou P, Chalmers I. The importance of randomised vs non-randomised trials. The Lancet 2019;394:634-5.
20. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta‐analysis. Systematic Reviews in Health Care: Meta-Analysis in Context, Second Edition 2001:285-312.
21. Roulston MS. Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena 1999;125:285-94.
22. Deeks JJ. Systematic reviews in health care: Systematic reviews of evaluations of diagnostic and screening tests. BMJ 2001;323:157-62.
23. Smith-Bindman R, Kerlikowske K, Feldstein VA, et al. Endovaginal ultrasound to exclude endometrial cancer and other endometrial abnormalities. JAMA 1998;280:1510-7.
24. Menke J, Larsen J. Meta-analysis: Accuracy of contrast-enhanced magnetic resonance angiography for assessing steno-occlusions in peripheral arterial disease. Ann Intern Med 2010;153:325-34.
25. Knottnerus JA. The evidence base of clinical diagnosis. London: BMJ Books; 2002.
26. Hilden J. The area under the ROC curve and its competitors. Med Decis Making 1991;11:95-101.
27. Lee WC, Hsiao CK. Alternative summary indices for the receiver operating characteristic curve. Epidemiology 1996;7:605-11.
28. Bossuyt PM, Reitsma JB, Bruns DE, et al. The STARD Statement for reporting of studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7-18.
29. Benish WA. Relative entropy as a measure of diagnostic information. Med Decis Making 1999;19:202-6.
30. Wu Y, Alagoz O, Ayvaci MU, et al. A comprehensive methodology for determining the most informative mammographic features. Journal of digital imaging 2013;26:941-7.
31. Shapiro DE. The interpretation of diagnostic tests. Stat Methods Med Res 1999;8:113-34.
32. Ross BC. Mutual information between discrete and continuous data sets. PLoS ONE 2014;9:e87357.
Appendix - Unabridged derivations of MI, RMI and Var(MI)
Entropy is expressed as:
\begin{equation} H\left(D\right)=-\left(P(D+)\log_{2}{P(D+)}+\left(1-P(D+)\right)\log_{2}\left(1-P(D+)\right)\right)\nonumber \end{equation}
The uncertainty about the disease status that remains after a positive test result is:
\begin{equation} H\left(D\middle|T+\right)=-\left(P(D+|T+)\log_{2}{P(D+|T+)}+\left(1-P(D+|T+)\right)\log_{2}\left(1-P(D+|T+)\right)\right)\nonumber \end{equation}
The mutual information is computed from the marginal entropies H(D) and H(T) of the disease status and the test result and their joint entropy H(D,T), or equivalently from the conditional entropy H(D|T):
\begin{equation} I\left(D,T\right)=H\left(D\right)+H\left(T\right)-H\left(D,T\right)=H\left(D\right)-H\left(D\middle|T\right)\nonumber \end{equation}
The relative mutual information is computed as:
\begin{equation} I_{R}\left(D,T\right)=\frac{I\left(D,T\right)}{H\left(D\right)}=1-\frac{H(D|T)}{H(D)}\nonumber \end{equation}
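For illustration, the following minimal Python sketch computes I(D,T) and the relative mutual information directly from the 2x2 counts of a single study, using the entropy definitions above; the counts shown are hypothetical.
\begin{verbatim}
import numpy as np

def mi_rmi_from_counts(tp, fn, fp, tn):
    """I(D,T) in bits and relative MI I(D,T)/H(D) from 2x2 counts."""
    counts = np.array([[tp, fn], [fp, tn]], dtype=float)  # rows: D+, D-; cols: T+, T-
    p = counts / counts.sum()      # joint distribution P(D, T)
    p_d = p.sum(axis=1)            # marginal P(D)
    p_t = p.sum(axis=0)            # marginal P(T)

    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    h_d = entropy(p_d)                        # pre-test uncertainty H(D)
    i = h_d + entropy(p_t) - entropy(p.ravel())  # I = H(D) + H(T) - H(D,T)
    return i, i / h_d                         # (MI in bits, relative MI)

# Hypothetical study: 90 true positives, 10 false negatives,
# 30 false positives, 870 true negatives.
print(mi_rmi_from_counts(90, 10, 30, 870))
\end{verbatim}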
In terms of sensitivity P(T+|D+), specificity P(T-|D-) and prevalence P(D+), it is convenient to first write the marginal probabilities of a positive and a negative test result:
\begin{equation} P(T+)=P\left(T+\middle|D+\right)P(D+)+\left(1-P\left(T-\middle|D-\right)\right)\left(1-P(D+)\right),\qquad P(T-)=1-P(T+)\nonumber \end{equation}
The mutual information then becomes:
\begin{equation} \begin{split} I\left(D,T\right)=&\ P\left(T+\middle|D+\right)P(D+)\log_{2}\left(\frac{P\left(T+\middle|D+\right)}{P(T+)}\right)+\left(1-P\left(T+\middle|D+\right)\right)P(D+)\log_{2}\left(\frac{1-P\left(T+\middle|D+\right)}{P(T-)}\right)\\ &+\left(1-P\left(T-\middle|D-\right)\right)\left(1-P(D+)\right)\log_{2}\left(\frac{1-P\left(T-\middle|D-\right)}{P(T+)}\right)+P\left(T-\middle|D-\right)\left(1-P(D+)\right)\log_{2}\left(\frac{P\left(T-\middle|D-\right)}{P(T-)}\right) \end{split}\nonumber \end{equation}
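A minimal Python sketch of this parameterized form is given below; the sensitivity, specificity and prevalence values are hypothetical, and all four cell probabilities are assumed to be strictly between 0 and 1.
\begin{verbatim}
import numpy as np

def mi_from_se_sp_prev(se, sp, prev):
    """I(D,T) in bits from sensitivity, specificity and prevalence,
    following the parameterized expression above.
    Assumes 0 < se, sp, prev < 1 so that all logs are finite."""
    p_tpos = se * prev + (1 - sp) * (1 - prev)   # P(T+)
    p_tneg = 1 - p_tpos                          # P(T-)
    terms = [
        se * prev * np.log2(se / p_tpos),                    # cell (D+, T+)
        (1 - se) * prev * np.log2((1 - se) / p_tneg),        # cell (D+, T-)
        (1 - sp) * (1 - prev) * np.log2((1 - sp) / p_tpos),  # cell (D-, T+)
        sp * (1 - prev) * np.log2(sp / p_tneg),              # cell (D-, T-)
    ]
    return sum(terms)

# Hypothetical test: sensitivity 0.90, specificity 0.80, prevalence 0.10.
print(mi_from_se_sp_prev(0.90, 0.80, 0.10))
\end{verbatim}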
The variance of the entropy H(D) is approximated as:
\begin{equation} \text{Var}\left(H\left(D\right)\right)=\left[\left(\log_{2}{P(D+)}+H\left(D\right)\right)^{2}+\left(\log_{2}\left(1-P(D+)\right)+H\left(D\right)\right)^{2}\right]\frac{P(D+)\left(1-P(D+)\right)}{N}\nonumber \end{equation}
and the variance of the mutual information as:
\begin{equation} \begin{split} \text{Var}\left(I\left(D,T\right)\right)=&\ \left(\log_{2}{P(D+)}+\log_{2}{P(T+)}-\log_{2}\left(P\left(T+\middle|D+\right)P(D+)\right)+I\left(D,T\right)\right)^{2}\frac{P\left(T+\middle|D+\right)P(D+)\left(1-P\left(T+\middle|D+\right)P(D+)\right)}{N}\\ &+\left(\log_{2}{P(D+)}+\log_{2}{P(T-)}-\log_{2}\left(\left(1-P\left(T+\middle|D+\right)\right)P(D+)\right)+I\left(D,T\right)\right)^{2}\frac{\left(1-P\left(T+\middle|D+\right)\right)P(D+)\left(1-\left(1-P\left(T+\middle|D+\right)\right)P(D+)\right)}{N}\\ &+\left(\log_{2}\left(1-P(D+)\right)+\log_{2}{P(T+)}-\log_{2}\left(\left(1-P\left(T-\middle|D-\right)\right)\left(1-P(D+)\right)\right)+I\left(D,T\right)\right)^{2}\frac{\left(1-P\left(T-\middle|D-\right)\right)\left(1-P(D+)\right)\left(1-\left(1-P\left(T-\middle|D-\right)\right)\left(1-P(D+)\right)\right)}{N}\\ &+\left(\log_{2}\left(1-P(D+)\right)+\log_{2}{P(T-)}-\log_{2}\left(P\left(T-\middle|D-\right)\left(1-P(D+)\right)\right)+I\left(D,T\right)\right)^{2}\frac{P\left(T-\middle|D-\right)\left(1-P(D+)\right)\left(1-P\left(T-\middle|D-\right)\left(1-P(D+)\right)\right)}{N} \end{split}\nonumber \end{equation}
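The following minimal Python sketch evaluates this cell-wise expression for Var(I(D,T)) from sensitivity, specificity, prevalence and total sample size N; the input values are hypothetical and all cell probabilities are assumed to be strictly positive.
\begin{verbatim}
import numpy as np

def mi_and_var(se, sp, prev, n):
    """I(D,T) in bits and its approximate variance, following the
    cell-wise expression above. Assumes 0 < se, sp, prev < 1."""
    # Joint cell probabilities of the 2x2 (disease x test) table.
    cells = {
        ("D+", "T+"): se * prev,
        ("D+", "T-"): (1 - se) * prev,
        ("D-", "T+"): (1 - sp) * (1 - prev),
        ("D-", "T-"): sp * (1 - prev),
    }
    p_d = {"D+": prev, "D-": 1 - prev}
    p_t = {"T+": se * prev + (1 - sp) * (1 - prev)}
    p_t["T-"] = 1 - p_t["T+"]

    # Mutual information I(D,T) in bits.
    i = sum(p * np.log2(p / (p_d[d] * p_t[t])) for (d, t), p in cells.items())

    # Each cell contributes
    # (log2 P(d) + log2 P(t) - log2 P(d,t) + I)^2 * P(d,t)(1 - P(d,t)) / n.
    var = sum(
        (np.log2(p_d[d]) + np.log2(p_t[t]) - np.log2(p) + i) ** 2 * p * (1 - p) / n
        for (d, t), p in cells.items()
    )
    return i, var

# Hypothetical study: Se = 0.90, Sp = 0.80, prevalence = 0.10, N = 1000.
i, v = mi_and_var(0.90, 0.80, 0.10, 1000)
print(f"I = {i:.3f} bits, Var(I) = {v:.5f}, SE = {v ** 0.5:.3f}")
\end{verbatim}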