Cancer Res Treat : Cancer Research and Treatment

Original Article
A Machine Learning Risk Prediction Model for Gastric Cancer with SHapley Additive exPlanations
Bomi Park1, Chung Ho Kim1, Jae Kwan Jun2, Mina Suh2, Kui Son Choi2, Il Ju Choi3, Hyun Jin Oh4

DOI: https://doi.org/10.4143/crt.2024.843 [Accepted]
Published online: December 16, 2024
1Department of Preventive Medicine, College of Medicine, Chung-Ang University, Seoul, Korea
2National Cancer Control Institute, National Cancer Center, Goyang, Korea
3Division of Gastroenterology, Department of Internal Medicine, Center for Gastric Cancer, National Cancer Center, Goyang, Korea
4Division of Gastroenterology, Department of Internal Medicine, Center for Cancer Prevention and Detection, National Cancer Center, Goyang, Korea
Corresponding author:  Hyun Jin Oh
Tel: 82-31-290-1759 Email: hyun.jin.8411@gmail.com
Received: 29 August 2024   • Accepted: 15 December 2024

Purpose
Gastric cancer (GC) prediction models hold potential for enhancing early detection by enabling the identification of high-risk individuals, facilitating personalized risk-based screening, and optimizing the allocation of healthcare resources.
Materials and Methods
In this study, we developed a machine learning-based GC prediction model utilizing data from the Korean National Health Insurance Service, encompassing 10,515,949 adults who had not been diagnosed with GC and underwent GC screening during 2013–2014, with a follow-up period of at least five years. The cohort was divided into training and test datasets at an 8:2 ratio, and class imbalance was mitigated through random oversampling.
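The split and resampling steps described above can be sketched as follows. This is a minimal illustration on toy data, not the authors' pipeline: the function names, the toy cohort, and the choice to oversample only the training portion are assumptions, since the abstract does not specify these details.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle and split records into (train, test); 8:2 by default."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def random_oversample(records, label_of, seed=0):
    """Duplicate randomly chosen minority-class records until every
    class has as many records as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap (k may be 0).
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy cohort: (person_id, label), with 5% positive cases.
cohort = [(i, 1 if i % 20 == 0 else 0) for i in range(1000)]

train, test = split_train_test(cohort)
# Assumption: only the training set is resampled; the test set keeps
# its natural class ratio so that evaluation reflects real prevalence.
train_balanced = random_oversample(train, label_of=lambda r: r[1])
```

Leaving the test set at its natural class ratio is a common design choice because performance metrics computed on resampled data would not reflect the prevalence seen in practice.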
Results
Among the models evaluated, logistic regression demonstrated the highest predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.708, which was consistent with the AUC obtained in external validation (0.669). The outcomes were robust to missing data imputation and variable selection. The SHapley Additive exPlanations (SHAP) algorithm enhanced the explainability of the model, identifying advancing age, male sex, Helicobacter pylori infection, current smoking, and a family history of GC as key predictors of elevated risk.
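To make the two quantities in this paragraph concrete, the sketch below computes an AUC from predicted scores and SHAP values for a linear model. It assumes independent features, in which case the SHAP value of feature j for a linear log-odds model reduces to w_j * (x_j - mean_j). All coefficients, feature means, and the example individual are invented for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(w * xj for w, xj in zip(weights, x))

def linear_shap(weights, x, background_means):
    """SHAP values for a linear model under feature independence:
    each feature contributes w_j * (x_j - mean_j) to the log-odds."""
    return [w * (xj - m) for w, xj, m in zip(weights, x, background_means)]

# Hypothetical model: names follow the predictors listed in the abstract,
# but every number below is made up for this example.
features = ["age", "male", "h_pylori", "current_smoker", "family_history"]
weights = [0.05, 0.6, 0.5, 0.3, 0.4]
intercept = -8.0
means = [55.0, 0.45, 0.5, 0.2, 0.1]      # assumed cohort averages
person = [65.0, 1.0, 1.0, 1.0, 0.0]      # assumed individual

phi = linear_shap(weights, person, means)
for name, value in zip(features, phi):
    print(f"{name}: {value:+.2f}")

# The attributions sum to the gap between this person's log-odds
# and the log-odds at the cohort average.
gap = log_odds(weights, intercept, person) - log_odds(weights, intercept, means)

# Toy AUC check on four predictions: 3 of 4 positive/negative pairs
# are ranked correctly.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

A positive SHAP value means the feature pushes this individual's predicted risk above the cohort average, and a negative value pushes it below; in this toy example the absence of a family history yields a small negative contribution.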
Conclusion
This predictive model could significantly contribute to the early identification of individuals at elevated risk for gastric cancer, thereby enabling the implementation of targeted preventive strategies. Furthermore, the integration of noninvasive and cost-effective predictors enhances the clinical utility of the model, supporting its potential application in routine healthcare settings.
