Optimization of machine learning algorithms and data minimization for breast cancer detection



Garriga Guitart, Joan                                           



Breast cancer is one of the most common cancers in women and has a high mortality rate. Early diagnosis and prognosis of breast cancer are key to reducing mortality. Machine learning, an application of artificial intelligence, enables computers to identify patterns in large, noisy, or complex datasets. Because these techniques are well suited to medical applications, they are used in the diagnosis, classification, and prediction of cancer.
This project analyzes machine learning classification methods for the prediction of breast cancer. The dataset used contains records from 569 patients, each described by 30 parameters. Four algorithms were evaluated: Logistic Regression, K Nearest Neighbors, Random Forests, and Neural Networks. A possible reduction in the number of parameters needed for cancer prediction was also analyzed.
The K Nearest Neighbors algorithm showed the best overall performance, obtaining the highest precision, F1 score, and ROC-AUC value. Parameter reduction showed promising results: the input data can be reduced by more than 50% while still giving satisfactory results. This could have a major impact on the healthcare system by reducing the number of medical tests, saving time and expense and sparing patients inconvenience.
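
The comparison described above can be illustrated with a minimal sketch in Python using scikit-learn. This is not the project's actual code. It assumes the data match the Wisconsin Diagnostic Breast Cancer dataset bundled with scikit-learn (569 records, 30 parameters); the abstract does not name the dataset. The hyperparameters, the use of univariate feature selection (`SelectKBest`), the choice of 14 retained parameters, and the helper `mean_auc` are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the project's data.
X, y = load_breast_cancer(return_X_y=True)  # 569 records, 30 parameters

# The four algorithm families named in the abstract; settings are illustrative.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def mean_auc(estimator, n_features=None):
    """Hypothetical helper: mean cross-validated ROC-AUC of a scaled pipeline.

    If n_features is given, only the n_features parameters with the highest
    ANOVA F-score are kept (selected inside each fold to avoid leakage).
    """
    steps = [StandardScaler()]
    if n_features is not None:
        steps.append(SelectKBest(f_classif, k=n_features))
    steps.append(estimator)
    pipeline = make_pipeline(*steps)
    return cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc").mean()


# Keeping 14 of 30 parameters is a reduction of just over 50% (assumed value).
results = {}
for name, model in models.items():
    results[name] = (mean_auc(model), mean_auc(model, n_features=14))
    full, reduced = results[name]
    print(f"{name}: ROC-AUC {full:.3f} (30 parameters), {reduced:.3f} (14 parameters)")
```

Feature selection is placed inside the pipeline so that it is refit on each training fold, which keeps the reduced-parameter scores from being optimistically biased. Other reduction strategies, such as model-based feature importance or recursive elimination, would fit the same structure.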

Keywords: machine learning, breast cancer, Python.








Fernández Esmerats, Joan                                              



IQS SE - Undergraduate Program in Biotechnology