In a recent study published in the journal Nature Medicine, researchers tested the ability of specialist and general practitioners to diagnose skin disorders across skin tones in a simulated teledermatology scenario.
Deep learning-based approaches to image-based diagnosis can improve clinical decisions, but their real-world efficacy remains uncertain because of methodological shortcomings in prior evaluations, especially for underrepresented groups. The future of machine learning in medicine may feature physician-machine collaboration, with domain-specific interfaces built on machine learning models augmenting clinical knowledge to generate more accurate diagnoses. Clinical expertise is important for recognizing when to override automated recommendations. Early research on store-and-forward teledermatology suggests that deep learning systems can increase the accuracy of generalists' diagnoses, but uncertainties remain about how physician expertise and performance vary across underrepresented groups.
About the study
In the current study, researchers conducted a digital experiment with 389 board-certified dermatologists (BCDs) and 459 primary-care physicians (PCPs) from 39 countries to assess the diagnostic accuracy of general and specialist physicians in a teledermatology simulation.
The study involved 364 images of 46 dermatological diseases, and participants were asked to submit up to four differential diagnoses. Most of the images represented eight relatively common skin conditions. The team recruited a large pool of physician participants and incorporated gamification techniques such as feedback, rewards, contests, and individualized rules into the study design. They explored a replicable design space spanning different skin tones, skin diseases, levels of physician knowledge, physician-machine collaboration, clinical decision support accuracy, and user interface design.
The researchers measured diagnostic accuracy without the aid of artificial intelligence, across dark and light skin tones, following an algorithmic auditing technique. The team selected skin diseases based on three criteria: (i) three practicing board-certified dermatologists identified these diseases as those on which accuracy differences between patients' skin tones were most likely to be found; (ii) these diseases are relatively common; and (iii) these diseases appeared frequently enough in dermatology textbooks and dermatological image atlases that the team could select at least five images of the two darkest skin types after a quality-control review by board-certified dermatologists.
To offer computer vision-based predictions of diagnosis, the team trained a convolutional neural network to classify nine labels: the eight skin diseases of interest and one "other" category. They fine-tuned the VGG-16 architecture on 31,219 diverse clinical dermatology images drawn from the Fitzpatrick 17k dataset and additional images obtained from textbooks, dermatology atlases, and online search engines, and then compared this deep learning system (DLS) to clinicians' performance in diagnosing skin diseases.
General practitioners and specialists achieved diagnostic accuracies of 19% and 38%, respectively, with accuracy four percentage points lower for images of dark skin than for light skin. Deep learning-based decision support increased clinicians' diagnostic accuracy by more than 33% but widened the gap in general clinicians' diagnostic accuracy across skin tones.
General practitioners, primary care physicians, dermatologists, and board-certified dermatologists had top-3 accuracies of 18%, 19%, 36%, and 38%, respectively, across images (excluding attention-check images), and 16%, 17%, 35%, and 37%, respectively, for photographs depicting the eight primary dermatoses investigated. The top clinical diagnosis submitted by PCPs and BCDs was correct in 33% and 48% of observations, respectively.
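The "top-1" and "top-3" accuracies reported above can be illustrated with a short sketch of how such metrics are typically computed over ranked differential diagnoses. The diagnoses below are invented for demonstration and do not come from the study's data.

```python
# Illustrative sketch: computing top-k accuracy over clinicians'
# ranked differential diagnoses. All example diagnoses are invented.
def top_k_accuracy(differentials, reference_labels, k):
    """Fraction of cases whose reference label appears among the
    first k entries of the clinician's ranked differential."""
    hits = sum(
        ref in diff[:k]
        for diff, ref in zip(differentials, reference_labels)
    )
    return hits / len(reference_labels)

differentials = [
    ["psoriasis", "eczema", "tinea"],      # correct at rank 1
    ["lichen planus", "psoriasis"],        # correct at rank 2
    ["acne", "rosacea", "folliculitis"],   # reference not listed
]
reference = ["psoriasis", "psoriasis", "melanoma"]

print(round(top_k_accuracy(differentials, reference, k=1), 2))  # 0.33
print(round(top_k_accuracy(differentials, reference, k=3), 2))  # 0.67
```

Top-3 accuracy is necessarily at least as high as top-1 accuracy, which is why the "top clinical diagnosis" figures (33% and 48%) sit below the corresponding top-3 figures.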
In 77.0% of photos, at least one BCD included the reference label in the differential diagnosis, whereas at least one PCP did so in 58%. After seeing a correct DLS prediction, at least one BCD included the reference label in the differential diagnosis in 98.0% of photos. Across all photos, participants detected disorders in darker skin (estimated Fitzpatrick skin types (FST) 5 and 6) with lower accuracy than in lighter skin.
When the physician categories were examined separately, the top-3 accuracy of board-certified dermatologists, dermatologists, primary care physicians, and other doctors was five, five, three, and five percentage points lower, respectively, for dark skin photos than for light skin photos. Similarly, the accuracy of the top diagnosis by board-certified dermatologists, dermatologists, primary care physicians, and other physicians was three, five, four, and four percentage points lower, respectively, for dark skin versus light skin photos. Patients with dark skin were 4.4 percentage points more likely to be referred by BCDs to a dermatologist for a second opinion.
The study's results show that deep learning-based decision support can increase the accuracy of human diagnosis in teledermatology settings. BCDs achieved a top-3 diagnostic accuracy of 38%, compared with 19% for PCPs. These results are consistent with previous studies indicating that specialists outperform generalists in diagnosing skin diseases, although accuracy was lower than previously reported. Both specialists and generalists showed poorer diagnostic accuracy on dark skin images than on light skin images, with BCDs and PCPs performing four percentage points better on light-skin photos. DLS-based decision support improved top-1 diagnostic accuracy by 33% for BCDs and 69% for PCPs, resulting in greater sensitivity in detecting specific skin diseases.