How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals.
In any nascent field, the development of best practices is critical to ensure the reliability, reproducibility and safety of new devices. A recent analysis by Stanford researchers reviewed the evaluation process for all 130 FDA-approved medical AI devices. The authors found that 126 of the 130 devices underwent only retrospective studies, and that none of the 54 “high-risk” devices were tested in prospective studies. Furthermore, most of the computer-aided diagnostic devices did not include a side-by-side comparison of doctor performance with and without AI, which, as the authors note, is a critical factor given the devices’ intended use. Most devices (93/130) did not undergo multi-site assessment, and 59/130 did not report the sample size of the studies used to evaluate them. In a case study of pneumothorax detection software, the authors go on to demonstrate that device performance varies substantially when tested across multiple clinical sites. The authors call for more prospective studies of AI-enabled devices that are measured against the standard of care, as well as increased post-market surveillance in this new field.
Wu E, et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med. 2021 Apr;27(4):582-584.