Application of Data Mining Technology in Risk Prediction Model for Lung Cancer

高孜博; 李迪; 段书音; 周晓蕾; 刘红; 王静; 王威; 吴拥军

doi:10.3971/j.issn.1000-8578.2021.20.0829

Abstract:
Objective To build lung cancer risk prediction models with data mining techniques, compare the performance of the C5.0 decision tree and an artificial neural network (ANN), and assess their value for lung cancer risk prediction.
Methods Data were collected from 180 patients with lung cancer and 240 patients with benign lung disease, covering 17 independent variables drawn from lung cancer risk factors and clinical symptoms. C5.0 decision tree and ANN models were built, and their predictive performance was compared.
Results A total of 420 medical records were collected, and all samples were randomly split 7:3 into a training set and a test set. On the test set, the ANN model achieved an accuracy of 65.3%, sensitivity of 61.7%, specificity of 73.3%, Youden index of 0.350, positive predictive value of 54.9%, negative predictive value of 73.1%, and AUC of 0.675 (95% CI: 0.628-0.720). The C5.0 decision tree model achieved an accuracy of 61.0%, sensitivity of 47.8%, specificity of 80.4%, Youden index of 0.282, positive predictive value of 35.3%, negative predictive value of 80.6%, and AUC of 0.641 (95% CI: 0.593-0.687).
Conclusion The ANN model outperformed the C5.0 decision tree model overall and has potential value for predicting lung cancer risk.
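The evaluation measures listed in the Results can be derived from a 2x2 confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code: the names `ConfusionCounts`, `diagnostic_metrics` and `youden_index` are hypothetical, and the example counts are invented for illustration only. The last two lines recompute the Youden index from the sensitivity and specificity reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for a 2x2 confusion matrix (lung cancer = positive)."""
    tp: int  # lung cancer cases predicted as lung cancer
    fn: int  # lung cancer cases predicted as benign
    tn: int  # benign cases predicted as benign
    fp: int  # benign cases predicted as lung cancer


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard measures named in the abstract from raw counts."""
    total = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_index": youden_index(sensitivity, specificity),
        "ppv": c.tp / (c.tp + c.fp),  # positive predictive value
        "npv": c.tn / (c.tn + c.fn),  # negative predictive value
    }


# Invented counts for illustration; these are NOT taken from the study.
example = ConfusionCounts(tp=30, fn=20, tn=60, fp=16)
metrics = diagnostic_metrics(example)

# Youden index recomputed from the rates reported in the abstract.
ann_youden = youden_index(0.617, 0.733)   # ANN model, test set
tree_youden = youden_index(0.478, 0.804)  # C5.0 decision tree, test set
```

Rounded to three decimals, `ann_youden` and `tree_youden` agree with the reported values of 0.350 and 0.282. The AUC and its confidence interval need the models' predicted probabilities and are therefore not covered by this sketch.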
