NIH researchers develop AI system to improve cervical cancer screening

Dive Brief:

Researchers have created an automated image analysis system designed to improve cervical cancer screening in countries that cannot afford Pap tests and other diagnostic tools.
The system was more accurate than other screening methods, although the finding comes with caveats including a high rate of false positives and limitations of the dataset used in the study.
Despite the concerns, the NIH and Global Good researchers behind the study see potential in the technique and are working to make it suitable for point-of-care screening using digital cameras and smartphones.

Dive Insight:

Cervical cancer is a major problem for healthcare systems in low- and middle-income countries. Around 90% of the 250,000 deaths caused by cervical cancer globally each year occur in such countries, in part because they lack the resources and infrastructure to implement the kind of screening programs that catch the disease early in the West. Earlier this week, Mayo Clinic released a study suggesting cervical cancer screening rates are not as high as previous surveys have indicated.

Authorities including the World Health Organization have backed visual inspection with acetic acid (VIA) as a way for countries to spot potential cases of cervical cancer early. However, while the test is simple and cheap, it struggles to distinguish precancerous lesions from other abnormalities.

Identifying a need for a better test, researchers at NIH and the Bill Gates-backed fund Global Good sought to improve on early attempts to develop a machine learning-based approach to cervical cancer screening. The result is an automated image analysis system that achieved 97.7% sensitivity in a key age-based subgroup of the test set.

The system was trained on images taken in an NCI-funded study of more than 9,000 women in Costa Rica in the 1990s. Around 70% of the selected images, which were captured by cervicography, a now-discontinued visual screening technique, were used to train the system to identify precancerous lesions.

When applied to the remaining 30% of images, the system statistically outperformed cervicography and other screening tests used in the original Costa Rican study. The system was particularly sensitive in women aged 25 to 49, a key screening population. In that subgroup, it achieved a sensitivity of 97.7%, although reaching that level compromised other aspects of the screen.

"To achieve nearly perfect sensitivity for cases occurring up to 7 years after examination generated a large number of false positives among screened noncases. More balanced cutpoints for positivity might be chosen to limit excessive treatments, although sensitivity would drop," the researchers wrote in the Journal of the National Cancer Institute.
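
The tradeoff the researchers describe can be illustrated with a minimal sketch. The scores and labels below are made up for illustration and are not data from the study, and the function name `sensitivity_and_fpr` is hypothetical. The sketch assumes the system outputs a score per image and that any score at or above a chosen cutpoint counts as a positive screen.

```python
def sensitivity_and_fpr(scores, labels, cutpoint):
    """Return (sensitivity, false positive rate) for a given cutpoint.

    A score at or above the cutpoint is treated as a positive screen.
    labels: 1 for a case (precancer), 0 for a noncase.
    """
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        positive = score >= cutpoint
        if label == 1:
            if positive:
                tp += 1  # case correctly flagged
            else:
                fn += 1  # case missed
        else:
            if positive:
                fp += 1  # noncase flagged (false positive)
            else:
                tn += 1  # noncase correctly cleared
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical scores for 4 cases and 6 noncases (NOT study data).
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for cutpoint in (0.3, 0.55):
    sens, fpr = sensitivity_and_fpr(scores, labels, cutpoint)
    print(f"cutpoint={cutpoint}: sensitivity={sens:.2f}, false positive rate={fpr:.2f}")
```

With these invented numbers, the lower cutpoint catches every case but flags most noncases, while the higher cutpoint flags far fewer noncases at the cost of missing a case. That is the same direction of tradeoff the quote describes, not an estimate of its size.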

The false positive rate is one of several reasons to question the likely impact of the system in the real world. Another is that the images were captured by a small team of highly trained nurses in a single cohort study. Given that performance was affected by image quality and obstructions, the system's sensitivity and specificity may drop when applied to pictures captured by a variety of health workers in a range of settings. The limited transferability of AI image analysis is a common issue in the field.

While these issues may temper optimism, the researchers think the system can be improved and used in the real world. In theory, it should be easier to train people to capture consistent images than to perform the currently used VIA approach, suggesting one barrier to adoption is surmountable. The next step is to adapt the system for use with images from current digital cameras rather than cervicography.