In a study recently published in Scientific Reports, researchers evaluate and discuss the limitations of AI-based diagnosis of chest x-rays (CXR) shared via smartphone applications.
Against the backdrop of the coronavirus disease 2019 (COVID-19) pandemic, researchers highlight the advantages of automated clinical diagnostic tools developed using artificial intelligence (AI) models and also explain their shortcomings, especially when analyzing highly compressed images. Multi-task learning (MTL) was also introduced as a method to overcome the current challenges associated with AI models.
Study: Challenges of AI-driven diagnosis of chest x-rays sent via smart phone: a case study on COVID-19. Image Credit: ShutterOk / Shutterstock.com
AI in diagnosing COVID-19
Before the development of clinical diagnostic COVID-19 test kits, CXR was the first-line triage assessment of the disease. However, due to the unprecedented spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the limited number of radiologists available worldwide soon became overwhelmed, especially in low- to middle-income countries (LMICs) and rural areas.
To address this burden on radiologists, centralized AI-based systems were conceived to automate the diagnosis of COVID-19 from CXR images. Advances in smartphone hardware capabilities and increasing smartphone penetration, even in LMICs, have made smartphones the ideal medium for implementing these AI models. Recent smartphones have high-resolution and color-sensitive cameras, which studies have shown to be sufficient for accurate COVID-19 diagnosis by trained radiologists.
Another advantage of smartphones is their support for media-enabled messaging applications such as WhatsApp and Telegram. These platforms allow images to be shared remotely, so that diagnosis is not limited to local radiologists. With these features in mind, several AI systems for COVID-19 diagnosis were launched under the label ‘AI-aided diagnosis of X-ray images through messaging application’ (AIDXA).
Although AIDXA systems were designed for the low-bandwidth conditions common in rural areas, with some systems such as India's XraySetu able to interface directly with WhatsApp, a limitation of these applications is data loss due to image compression. Compression has no noticeable effect on diagnoses made by expert radiologists, but limited evidence indicates that it can significantly alter AI diagnostic performance.
About the study
In the current study, the researchers first present a case study to define and illustrate two major limitations of current AIDXA systems for diagnosing COVID-19. They then develop an in-house COVID-19 image database to quantitatively evaluate the effects of image compression on the performance of AIDXA models. Finally, they describe, design, and train a novel multi-task learning model aimed at accurate COVID-19 diagnosis, even under image compression.
Despite the benefits of AIDXA systems in automatically diagnosing COVID-19, which have partially addressed the global shortage of expert radiologists, the current study identified the ‘prediction instability problem’ (PIP) and ‘out-of-lung saliency’ (OLS) as serious limitations of these AI systems.
To evaluate the performance of current models, a novel CXR image dataset named ‘WhatsApp CXR’ (WaCXR) was developed. The dataset consists of 6,562 JPEG CXR images from the COVID-Net database, each passed through WhatsApp compression, resulting in 6,562 pairs of visually near-identical uncompressed and compressed images.
Prediction instability is the lack of agreement between a model's predictions on the compressed and uncompressed versions of the same CXR image. A model may identify a patient as COVID-19-positive from the uncompressed CXR image, yet classify the same patient as COVID-19-negative once that image has been subjected to WhatsApp compression. This inconsistency makes predictions unreliable and represents a potentially fatal flaw in clinical applications.
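The idea can be made concrete with a minimal sketch. It assumes that instability is measured as the percentage of uncompressed/compressed image pairs for which the predicted label differs; the study's own PI score may be defined differently. The function name and the example labels are hypothetical.

```python
from typing import Sequence


def prediction_instability(preds_original: Sequence[int],
                           preds_compressed: Sequence[int]) -> float:
    """Percentage of image pairs whose predicted label changes after compression.

    Assumed definition for illustration only, not the paper's formula.
    """
    if len(preds_original) != len(preds_compressed):
        raise ValueError("each original image needs exactly one compressed twin")
    if not preds_original:
        raise ValueError("at least one image pair is required")
    # Count the pairs where the two predictions disagree.
    flipped = sum(1 for a, b in zip(preds_original, preds_compressed) if a != b)
    return 100.0 * flipped / len(preds_original)


# Hypothetical predictions: 1 = COVID-19-positive, 0 = COVID-19-negative.
original = [1, 0, 1, 1, 0, 0, 1, 0]
compressed = [1, 0, 0, 1, 0, 1, 1, 0]
pi_percent = prediction_instability(original, compressed)
print(f"{pi_percent:.1f}% of image pairs changed label after compression")
```

Under this assumed definition, a model that gives identical predictions for every pair scores 0%, and any label flip caused purely by compression raises the score.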
Machine learning research suggests that the high predictive performance of deep learning models can be partially attributed to the unintended learning of shortcuts. While shortcuts may be effective in some AI applications, they present a significant challenge in the medical field, where interpretable and reproducible predictions are essential.
The current study used saliency maps, which highlight the regions of an image that contribute most to a model's prediction, to evaluate the pathology predictions of current AIDXA models. The saliency map results suggest that the COVID-19 predictions of several state-of-the-art AIDXA models are based on CXR image regions outside the lungs. This OLS is observed in both uncompressed and compressed images, with its magnitude increasing in the latter.
Although previous studies have identified PIP and OLS as challenges, no metrics have been applied to quantify their effects. To address this need, the researchers introduce the ‘PI score’ and the ‘OLS score’ as quantitative measures of the performance of state-of-the-art AIDXA models.
Given the alarmingly high prediction instability and out-of-lung saliency in current AIDXA models, a novel multi-task learning (MTL) model called COVIDMT was developed.
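A rough sketch of how saliency outside the lungs might be quantified is given here. It assumes a non-negative saliency map and a binary lung mask of the same shape, and measures the share of total saliency that falls outside the mask. This is an illustrative definition only and may not match the study's metric; the function name and the toy arrays are hypothetical.

```python
from typing import Sequence


def out_of_lung_saliency(saliency: Sequence[Sequence[float]],
                         lung_mask: Sequence[Sequence[int]]) -> float:
    """Percentage of total saliency that lies outside the lung mask.

    Assumed definition for illustration only, not the paper's formula.
    """
    if len(saliency) != len(lung_mask):
        raise ValueError("saliency map and lung mask must have the same height")
    total = 0.0
    outside = 0.0
    for sal_row, mask_row in zip(saliency, lung_mask):
        if len(sal_row) != len(mask_row):
            raise ValueError("saliency map and lung mask must have the same width")
        for value, in_lung in zip(sal_row, mask_row):
            total += value
            if not in_lung:
                outside += value
    if total == 0:
        raise ValueError("saliency map is all zeros")
    return 100.0 * outside / total


# Toy 3x4 example: mask value 1 marks lung pixels, 0 marks everything else.
saliency = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
lung_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
ols_percent = out_of_lung_saliency(saliency, lung_mask)
print(f"{ols_percent:.1f}% of saliency lies outside the lungs")
```

In practice, the lung mask would have to come from a segmentation model or manual annotation, which this sketch does not cover.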
COVIDMT is built on top of a state-of-the-art deep learning network known as the base network. “The base network is initialized with ImageNet weights to enable transfer learning, thereby maximizing the performance of the COVIDMT model in the target domain.”
PI and OLS scores were used to evaluate the performance of current generation AI COVID-19 diagnostic models versus COVIDMT.
Study results
Currently, the deep neural networks most widely used in AIDXA systems for automated COVID-19 diagnosis are ResNet-50, ResNeXt-50, VGG-19, XceptionNet, and COVID-Net. Each of these models was evaluated for PIP and OLS; however, COVID-Net is particularly relevant, as it uses the same training dataset as COVIDMT.
Preparing the WaCXR dataset reduced the total file size from 6.7 GB to 351 MB, a reduction of about 95%. Although the compressed images are visually almost indistinguishable from the originals, compression introduces significant pixel-level variation and, consequently, inconsistencies in the input data seen by AI models.
PI score results indicate instability between 4.36% and 11.71% for current state-of-the-art models. OLS scores were similarly poor, averaging 66% for original images and 70% for compressed images. Notably, COVID-Net presented an out-of-lung saliency of 70%, even for compressed images, thus highlighting that current AI models are vulnerable to both prediction instability and out-of-lung saliency.
COVIDMT showed a 40% improvement in PI score compared to ResNet-50 and ResNeXt-50. The OLS score similarly improved by 35% compared to the corresponding base model.
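The reported figure of roughly 95% can be checked with simple arithmetic. The snippet below assumes 1 GB = 1,000 MB (using 1,024 MB gives nearly the same result); the function name is hypothetical.

```python
def size_reduction_percent(original_mb: float, compressed_mb: float) -> float:
    """Percentage by which the total file size shrank."""
    return 100.0 * (1.0 - compressed_mb / original_mb)


# 6.7 GB before compression, 351 MB after, as reported for WaCXR.
reduction = size_reduction_percent(6.7 * 1000, 351)
print(f"Size reduction: {reduction:.1f}%")
```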
“In future studies, it would be interesting to explore the challenges of PIP and OLS in relation to different abnormalities and imaging modalities. Additionally, investigating the potential of a multi-task learning framework to address these issues may be a promising direction for further exploration.”
Journal Reference:
- Antony, M., Kakileti, S. T., Shah, R., et al. (2023). Challenges of AI-driven diagnosis of chest x-rays sent via smart phone: a case study on COVID-19. Scientific Reports 13(1); 1-16. doi:10.1038/s41598-023-44653-y