
Anton Kuznetsov

Moscow Aviation Institute, Russian Federation

Title: Prediction of cervical cancer metastases in regional lymph nodes using machine learning and hematologic features

Abstract

Pre-treatment detection of metastases in regional lymph nodes is the main source of error in determining the exact stage of cervical cancer. The diagnostic accuracy of MRI and CT for detecting metastases is 75%. The minimum detectable lymph node size is 3 mm with MRI and 10 mm with CT. Objective. To improve the accuracy of detecting metastases by applying machine learning methods to hematology data. Materials and methods. The study involved 495 patients of the P.A. Herzen Moscow Research Institute of Oncology. The patients came from different regions of Russia, were diagnosed with cervical cancer, and were aged 18 to 82 years (median age 42 [35; 51] years). All patients were divided into two groups according to whether they had metastases in regional lymph nodes (confirmed by biopsy). The blood data for each patient comprised 21 features. The following 7 indicators were identified as being of primary importance: ESR (erythrocyte sedimentation rate), erythrocytes, hemoglobin, fibrinogen, D-dimer, platelet aggregation with adenosine diphosphate, and SFMC (soluble fibrin monomer complex). The following machine learning algorithms were investigated: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), AdaBoost, and XGBoost. The models were built in Python 3.9. Results. On the test set (30% of the data), the models achieved the following accuracy: SVM (98%), AdaBoost (98%), XGBoost (98%), KNN (97%), RF (96%), DT (92%). The explained variance (R2) of SVM (83%), however, exceeds that of the other models: XGBoost (81%), KNN (81%), RF (80%), AdaBoost (66%), DT (45%). Conclusions. The accuracy of diagnosing metastases in regional lymph nodes can be improved relative to CT and MRI by using a model based on the SVM machine learning algorithm. With 7 hematologic features, this model achieved an accuracy of 98% on test data, with an explained variance (R2) of 83%.
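
The abstract reports two metrics per model, accuracy and R2, without stating how R2 was computed for a classifier. The sketch below shows one plausible reading, under the assumption that R2 is the coefficient of determination taken between the true 0/1 group labels and the predicted 0/1 labels. The function names and the toy labels are hypothetical illustrations and are not the study's code or data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot.

    Assumption: the 0/1 class labels are treated as plain numbers.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared differences between truth and prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true labels around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true labels are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy labels (1 = metastases, 0 = no metastases); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"accuracy = {acc:.3f}, R2 = {r2:.3f}")
```

With hard 0/1 predictions, SS_res is simply the number of misclassified samples, so under this assumption R2 depends on both the error count and the class balance of the test set.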

Biography

Born in 1997 in Moscow. 2016-2020: Moscow Aviation Institute, Faculty of Control Systems, Informatics and Electric Power Industry. He has 13 publications and 1 patent.