A recent Scientific Reports study investigated the ability of ChatGPT to answer questions related to colorectal cancer (CRC).
Study: AI Evaluation in Medicine: A Comparative Analysis of Expert and ChatGPT Responses to Colorectal Cancer Questions.
Globally, CRC is one of the leading causes of death due to cancer. Despite advances in medicine, CRC patients have low survival rates.
To reduce mortality associated with CRC, early diagnosis and comprehensive treatment and care are essential.
One of the main factors hindering early detection of CRC is a lack of awareness of its symptoms. Without accessible information, many patients ignore serious symptoms and delay seeking help from their doctors.
It should also be noted that much online information regarding CRC is misleading. Most peer-reviewed resources, such as UpToDate and the MSD Manual, are designed for healthcare professionals, not the general public.
It is important to create a trusted platform that patients can easily navigate for reliable information about the disease. Such a platform must present medical information in easy-to-understand terms and offer guidance on when to seek treatment.
ChatGPT is a free AI system based on deep learning and large language models (LLMs). Because ChatGPT responds to a wide range of prompts, it can be applied to many fields, including healthcare.
Members of the general public could use this technology to learn about various diseases. An AI-based platform of this kind could therefore empower patients to make informed health decisions, which in turn could lead to earlier diagnosis and better treatment outcomes.
About the study
Many studies have indicated that ChatGPT can be effective in medicine, but more are needed to evaluate its accuracy. To this end, the present study investigated the performance of ChatGPT on questions about CRC diagnosis and treatment.
The book “Colorectal Cancer: Answers to Your Questions”, published in China, served as the reference for evaluating the accuracy of ChatGPT (GPT-3.5 version) in answering questions about CRC.
A total of 131 questions were posed to ChatGPT, covering various aspects of CRC such as surgical management, radiation therapy, internal medicine treatment, ostomy care, interventional treatment, pain control and deep vein care.
The test questions had already been answered by experts, which provided a reference for judging accuracy. ChatGPT’s answer to each question was evaluated and scored by clinicians specializing in CRC.
The reproducibility of ChatGPT’s results was tested and showed a high level of consistency in both accuracy and comprehensiveness. This reproducibility suggests that the system is consistently reliable in providing accurate medical information.
Although ChatGPT showed a promising degree of accuracy, it fell short in terms of comprehensiveness. This shortcoming can be attributed to the model having been trained on broad, non-specific data.
Updating AI models like ChatGPT with specialized data could therefore improve the depth and breadth of their responses.
ChatGPT’s overall score indicates exceptionally good performance, especially for radiation therapy, ostomy care, pain control and deep vein care.
Although ChatGPT performed well in providing valid answers to CRC-related questions, it fell short of expert-level knowledge, especially in surgical management, basic information, and internal medicine. This shortfall is likely to hinder the deployment of AI models such as ChatGPT in clinical practice.
The current study has limitations, including a limited number of valid CRC-related questions. In addition, the study used a public health book on CRC as its data source, which restricted the types of questions that could be asked.
Because the questions were selected by the authors, they may not cover everything that real-life patients and their families would ask. Another limitation is the use of the book’s answers as the standard for scoring ChatGPT’s answers.
The present study highlights the potential and limitations of ChatGPT with respect to CRC queries.
The insights presented here may form the basis for improved future versions of ChatGPT that could help diagnose CRC accurately and promote early treatment.
Future studies should use larger sample sizes to investigate the real-world efficacy of ChatGPT. Researchers should also explore ways to integrate personal health data with AI models to provide personalized information to patients.