QI Ping, LI Jinhua, ZHAO Jinsheng, FU Caihong, ZHANG Longxia, QIAO Hui. Development of Machine Learning-Driven Diagnostic and Prognostic Models for Non-Small Cell Lung Cancer-Associated Malignant Pleural Effusion[J]. Cancer Research on Prevention and Treatment, 2025, 52(12): 988-996. DOI: 10.3971/j.issn.1000-8578.2025.25.0599

Development of Machine Learning-Driven Diagnostic and Prognostic Models for Non-Small Cell Lung Cancer-Associated Malignant Pleural Effusion

  • Objective To construct diagnostic and prognostic models for malignant pleural effusion (MPE) in patients with non-M1b stage (AJCC 7th edition) non-small cell lung cancer (NSCLC) using machine learning.
    Methods A retrospective analysis was conducted on patients diagnosed with NSCLC between 2010 and 2015 in the Surveillance, Epidemiology, and End Results (SEER) database, excluding those with M1b stage disease. Two datasets were assembled: data 1 (patients with non-M1b stage NSCLC, n=47 392) was used to construct the MPE diagnostic model, and data 2 (patients with M1a stage NSCLC and MPE, n=2 422) was used to construct the prognostic model. Least absolute shrinkage and selection operator (LASSO) regression was used to screen feature variables, and the data were split into training and validation sets at a ratio of 7:3. Models were built with eight machine learning algorithms and evaluated by accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (ROC-AUC), decision curves, calibration curves, and precision-recall (PR) curves, with ROC-AUC as the main evaluation metric.
    Results The incidence of MPE in patients with non-M1b stage NSCLC was 5.12%, and the 1-year survival rate of patients with MPE was 32.5%. LASSO regression identified 9 diagnosis-related variables and 12 prognosis-related variables. The AUC values of the models built with all eight machine learning algorithms exceeded 0.70. Among the diagnostic models, the random forest model performed best (training set AUC=0.908, validation set AUC=0.897); among the prognostic models, the XGBoost model performed best (training set AUC=0.905, validation set AUC=0.875). The other evaluation metrics were also good and well balanced. SHapley Additive exPlanations (SHAP) feature importance analysis showed that tumor size, lymph node metastasis, and histological type were important factors influencing the occurrence of MPE, and that chemotherapy was the most important prognostic factor.
    Conclusion The random forest diagnostic model constructed in this study can effectively predict the risk of MPE in patients with non-M1b stage NSCLC, and the XGBoost prognostic model can predict the prognosis of patients with M1a stage NSCLC and concurrent MPE.
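
The threshold-based metrics and the ROC-AUC named in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. ROC-AUC is computed here through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at a fixed threshold (assumed 0.5)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    # A tie between a positive and a negative score contributes 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data invented for this example; 1 = MPE present, 0 = absent.
    labels = [1, 0, 1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1, 0.3, 0.8]
    print(classification_metrics(labels, scores))
    print(roc_auc(labels, scores))
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example but not for a cohort the size of data 1; a sorted-rank implementation or a library routine would be the usual choice there. With an MPE incidence near 5%, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason to read it alongside precision, recall, and the PR curve.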