Comparing Evolutionary-Inspired Algorithms and Neural Networks on Breast Cancer Prediction

Kevin Lopez Sepulveda

📋 Agenda

Introduction to the problem
Project Goal
Random Forest
MLP Neural Network
Methods
Results
Conclusion

🩺 Why Predicting Breast Cancer Matters

Early detection increases survival rates
Machine learning can support diagnostic tools
Automation helps reduce human error

🌲 How Random Forest Works

An ensemble of decision trees
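Trains each tree on a bootstrapped sample, considering a random subset of features at each split
Final prediction is the majority vote across all trees
Averaging many trees reduces overfitting compared with a single decision tree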
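The bootstrap-and-vote idea can be sketched by hand in a few lines. This is only an illustration, not the code behind the reported results: it assumes scikit-learn's copy of the dataset (`load_breast_cancer`), an 80/20 split, and hypothetical helper names (`fit_bagged_trees`, `predict_majority`).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap row indices
        tree = DecisionTreeClassifier(
            max_features="sqrt",  # random feature subset at each split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Return the label most trees vote for (binary labels 0/1)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Assumed data source and split; the slides do not state either.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

trees = fit_bagged_trees(X_train, y_train)
y_pred = predict_majority(trees, X_test)
bagged_accuracy = float((y_pred == y_test).mean())
print(f"Bagged trees accuracy: {bagged_accuracy:.4f}")
```

An odd number of trees avoids tied votes. In practice `sklearn.ensemble.RandomForestClassifier` packages the same steps.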

🧠 How MLP Neural Network Works

MLP = Multi-layer Perceptron
Consists of an input layer, one or more hidden layers, and an output layer
Learns via backpropagation and gradient descent
Effective at capturing complex, non-linear patterns
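To make backpropagation and gradient descent concrete, here is a minimal NumPy sketch of a one-hidden-layer MLP trained on the XOR toy problem. The layer size, learning rate, epoch count, and the name `train_mlp` are illustrative assumptions, not the settings used for the breast cancer results.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=8, lr=0.5, epochs=3000, seed=0):
    """Train a one-hidden-layer MLP with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    eps = 1e-12  # guards against log(0)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> tanh hidden layer -> sigmoid output
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))

        # Backward pass: chain rule from the output back to the input
        dz2 = (p - y) / len(X)       # gradient at the output pre-activation
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dh = dz2 @ W2.T
        dz1 = dh * (1.0 - h ** 2)    # derivative of tanh
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)

        # Gradient descent step
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict_proba(X_new):
        return sigmoid(np.tanh(X_new @ W1 + b1) @ W2 + b2)

    return predict_proba, losses


# XOR is not linearly separable, so it needs the hidden layer.
X_xor = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_xor = np.array([[0.0], [1.0], [1.0], [0.0]])

predict_proba, losses = train_mlp(X_xor, y_xor)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(predict_proba(X_xor).round(3).ravel())
```

The loss printed at the end should be lower than at the start; how close the outputs get to 0/1 depends on the seed and settings.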

🔧 Methods

Dataset: Breast Cancer Wisconsin (Diagnostic) dataset; results are reported on a held-out test set of 114 samples
Preprocessing: normalization, handling missing values
Algorithms: Random Forest, MLP
Evaluation metrics: Accuracy, Precision, Recall, F1 Score
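The preprocessing step could look like the sketch below. It assumes "normalization" means standardization (zero mean, unit variance) and that missing values are filled with the column median; both choices, and the made-up toy values, are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (made-up values) with one missing entry.
X_toy = np.array([
    [14.2, 20.1],
    [np.nan, 18.4],
    [11.8, 25.0],
    [17.5, 22.3],
])

preprocess = make_pipeline(
    SimpleImputer(strategy="median"),  # fill gaps with the column median
    StandardScaler(),                  # zero mean, unit variance per column
)
X_clean = preprocess.fit_transform(X_toy)
print(X_clean.round(3))
```

On real data the pipeline should be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.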

🐍 Running Models and Showing Results

Python run: both models are trained on the breast cancer dataset and compared on the same test set. The console output of that run is reproduced in this section.
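A minimal sketch of a script that would produce output of this shape is given here. It is not the original code: scikit-learn's `load_breast_cancer`, the 80/20 split, the seeds, and every hyperparameter are assumptions, so the exact numbers it prints may differ from the reports in this section.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed data source and split (569 samples -> 114 test samples at 20%).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training Random Forest Classifier...")
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("Training Neural Network Classifier...")
# MLPs are sensitive to feature scale, so the scaler is part of the model.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42),
)
mlp.fit(X_train, y_train)

rf_pred = rf.predict(X_test)
mlp_pred = mlp.predict(X_test)

print("--- Random Forest Classification Report ---")
print(classification_report(y_test, rf_pred))
print("--- Neural Network Classification Report ---")
print(classification_report(y_test, mlp_pred))

rf_accuracy = accuracy_score(y_test, rf_pred)
mlp_accuracy = accuracy_score(y_test, mlp_pred)
print("--- Accuracy Summary ---")
print(f"Random Forest Accuracy: {rf_accuracy:.4f}")
print(f"MLP Accuracy:           {mlp_accuracy:.4f}")
```

Both models see the same train/test split, which keeps the comparison fair.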

Training Random Forest Classifier (baseline for evolutionary approach)...

Training Neural Network Classifier...

--- Random Forest Classification Report ---
              precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114


--- Neural Network Classification Report ---
              precision    recall  f1-score   support

           0       0.98      0.98      0.98        43
           1       0.99      0.99      0.99        71

    accuracy                           0.98       114
   macro avg       0.98      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114


--- Accuracy Summary ---
Random Forest Accuracy: 0.9649
MLP Accuracy:          0.9825

📊 Results

Precision: fraction of predicted positive cases that are actually positive
Recall: fraction of actual positive cases that are correctly identified
F1 score: harmonic mean of precision and recall
Accuracy: overall fraction of correct predictions
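As a worked example, the four metrics can be computed from confusion-matrix counts. The counts used here are not printed in the reports; they are inferred from the reported recall, support, and accuracy, taking class 1 as the positive class. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts inferred from the reports (class 1 = positive, 114 test samples).
rf_metrics = binary_metrics(tp=70, fp=3, fn=1, tn=40)
mlp_metrics = binary_metrics(tp=70, fp=1, fn=1, tn=42)

for name, metrics in [("Random Forest", rf_metrics), ("MLP", mlp_metrics)]:
    summary = ", ".join(f"{k}={v:.4f}" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

With these counts the Random Forest gets 110 of 114 test samples right (0.9649) and the MLP 112 of 114 (0.9825), matching the accuracy summary.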

✅ Conclusion

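Random Forest is the easier model to interpret (for example through feature importances), but scored slightly lower: 0.9649 accuracy
MLP achieved the highest test accuracy (0.9825, i.e. 112 vs 110 of 114 test samples), which may reflect better handling of non-linear relationships
Combining both models might improve overall performance
Future work: include other algorithms, tune hyperparameters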
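One possible way to combine the two models is a soft-voting ensemble that averages their predicted class probabilities. This is a sketch of the idea only; it was not part of the reported experiments, and the dataset loader, split, and hyperparameters are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed data source and split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        (
            "mlp",
            make_pipeline(
                StandardScaler(),
                MLPClassifier(
                    hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42
                ),
            ),
        ),
    ],
    voting="soft",  # average predicted probabilities instead of hard labels
)
ensemble.fit(X_train, y_train)
ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy: {ensemble_accuracy:.4f}")
```

Whether the ensemble beats either single model would need to be checked, ideally with cross-validation, since a 114-sample test set separates the models by only a couple of predictions.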

🙏 Thank You

Kevin Lopez Sepulveda
klopezs@bu.edu