Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences


Proceedings of the 5th International Conference on Computing and Data Science

Series Vol. 17, 23 October 2023


Open Access | Article

Applying machine learning models to breast cancer prediction problem

Ziqi Mai * 1
1 Xi’an University of Architecture and Technology

* Author to whom correspondence should be addressed.

Applied and Computational Engineering, Vol. 17, 126-138
Published 23 October 2023. © 2023 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Ziqi Mai. Applying machine learning models to breast cancer prediction problem. ACE (2023) Vol. 17: 126-138. DOI: 10.54254/2755-2721/17/20230925.

Abstract

Cancer has become a leading threat to human life and health. A model that predicts cancer can help doctors decide whether a patient has the disease, improving both the accuracy and the efficiency of diagnosis and reducing the chance of misdiagnosis. This paper focuses on breast cancer prediction and applies three machine learning methods (logistic regression, K-Nearest Neighbor, and decision tree) to build automatic classifiers and to investigate which model best suits this simple prediction problem. The features, data collection, and pre-processing steps are described in detail to aid understanding of the medical data. In the experiments, logistic regression, K-Nearest Neighbor, and decision tree reach accuracy scores of 97.08%, 94.89%, and 93.43%, respectively. The comparison shows that logistic regression performs best on the breast cancer prediction task.
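The three-model comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: it uses scikit-learn's bundled Wisconsin Diagnostic dataset as a stand-in (the abstract does not name the dataset), and the 80/20 stratified split, the feature standardization, `n_neighbors=5`, and the random seed are all illustrative choices. The helper name `compare_models` is hypothetical. The resulting scores should not be expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(test_size=0.2, seed=0):
    """Return {model name: test accuracy} for the three classifier families."""
    # Assumption: stand-in data, not necessarily the dataset used in the paper.
    X, y = load_breast_cancer(return_X_y=True)

    # Assumption: a stratified hold-out split; the paper's protocol may differ.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    models = {
        # Scaling is fitted on the training data only, inside each pipeline.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "knn": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        # Trees are insensitive to feature scale, so no scaler is used here.
        "decision_tree": DecisionTreeClassifier(random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.2%}")
```

Accuracy here is the fraction of held-out samples classified correctly, which is the metric the abstract reports. A single split gives a noisy estimate, so repeating the comparison over several seeds or with cross-validation would make the ranking of the three models more reliable.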

Keywords

classification problems, breast cancer predictions, logistic regression

Data Availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Volume Title: Proceedings of the 5th International Conference on Computing and Data Science
ISBN (Print): 978-1-83558-025-7
ISBN (Online): 978-1-83558-026-4
Published Date: 23 October 2023
Series: Applied and Computational Engineering
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
DOI: 10.54254/2755-2721/17/20230925
Copyright: © 2023 The Author(s)