Research
August 2025

Diabetes Prediction Algorithm Comparison

A research study comparing machine learning algorithms for predicting hospital readmission of diabetic patients.


About the Project

This project compares various machine learning algorithms for predicting hospital readmission within 30 days for diabetic patients, using the UCI Diabetes 130-US hospitals dataset (1999–2008). The workflow includes data collection, preprocessing, exploratory analysis, supervised learning (Logistic Regression, Decision Tree, Random Forest, KNN, Gradient Boosting, MLP, XGBoost, LightGBM), ensemble learning (Voting, Stacking), unsupervised learning (K-Means clustering), model evaluation, and visualization.
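The 30-day target and two of the engineered features can be sketched with pandas. This is a minimal sketch, not the project's code. It assumes the raw UCI columns `readmitted` (values `<30`, `>30`, `NO`), `number_outpatient`, `number_emergency`, `number_inpatient` and `A1Cresult`, with missing values coded as `?`. The `preprocess` helper and the formulas for `prior_utilization` and `hba1c_attention` are hypothetical illustrations. The comorbidity score is omitted because its definition isn't given here.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing sketch for the UCI Diabetes 130-US hospitals data."""
    # The raw file codes missing values as '?'; turn them into real NaNs.
    out = df.replace("?", np.nan)
    # Binary target: 1 only for readmission within 30 days ('>30' and 'NO' -> 0).
    out["readmitted_30"] = (out["readmitted"] == "<30").astype(int)
    # Hypothetical prior-utilization feature: total visits in the preceding year.
    out["prior_utilization"] = (
        out["number_outpatient"] + out["number_emergency"] + out["number_inpatient"]
    )
    # Hypothetical hba1c_attention flag: 1 if any A1C result was recorded.
    # 'None' means "not measured"; newer pandas may already read it as NaN.
    out["hba1c_attention"] = out["A1Cresult"].fillna("None").ne("None").astype(int)
    # Drop the raw label so it cannot leak into the features.
    return out.drop(columns=["readmitted"])


# Tiny hand-made example rows (not real patient data).
raw = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican"],
    "readmitted": ["<30", ">30", "NO"],
    "number_outpatient": [0, 2, 1],
    "number_emergency": [1, 0, 0],
    "number_inpatient": [2, 1, 0],
    "A1Cresult": ["None", ">8", "Norm"],
})
clean = preprocess(raw)
print(clean[["race", "readmitted_30", "prior_utilization", "hba1c_attention"]])
```

Collapsing `>30` into the negative class matches the "within 30 days" framing. Dropping `readmitted` afterwards keeps the raw label out of the feature matrix.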

Key Features

  • Comparison of 8 supervised learning algorithms for readmission prediction
  • Ensemble learning models (Voting and Stacking)
  • K-Means clustering for patient profile analysis
  • SMOTE oversampling to handle class imbalance
  • Feature engineering (comorbidity score, prior utilization, hba1c_attention)
  • Comprehensive model evaluation with ROC curves and confusion matrices
  • Feature importance analysis and visualization
  • Patient cluster profiling that reveals groups with distinct readmission risk
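Cluster profiling of this kind can be sketched with scikit-learn's `KMeans` on standardized features, followed by a per-cluster readmission rate. The `profile_clusters` helper, the chosen feature columns, `k=3`, and the toy data are assumptions for illustration. The toy label is random, so the printed clusters carry no clinical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def profile_clusters(df, feature_cols, target_col="readmitted_30", k=3, seed=42):
    """Hypothetical helper: cluster patients on standardized features and summarize.

    Returns the cluster labels and a table with cluster size, observed
    readmission rate and the mean of each feature, riskiest cluster first.
    """
    X = StandardScaler().fit_transform(df[feature_cols])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    profile = (
        df.assign(cluster=labels)
        .groupby("cluster")
        .agg(
            n_patients=(target_col, "count"),
            readmit_rate=(target_col, "mean"),
            **{f"mean_{c}": (c, "mean") for c in feature_cols},
        )
        .sort_values("readmit_rate", ascending=False)
    )
    return labels, profile


# Toy data: real UCI column names, random values and a random ~10% label.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "prior_utilization": rng.poisson(1.0, n),
    "num_medications": rng.normal(15, 5, n),
    "time_in_hospital": rng.integers(1, 15, n),
})
demo["readmitted_30"] = (rng.random(n) < 0.1).astype(int)

labels, profile = profile_clusters(
    demo, ["prior_utilization", "num_medications", "time_in_hospital"]
)
print(profile)
```

Standardizing first matters because K-Means uses Euclidean distance. Without scaling, a wide-range column such as `num_medications` would dominate the clustering.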

Challenges & Solutions

  • Handling an imbalanced dataset with a low readmission rate (addressed with SMOTE oversampling)
  • Feature engineering from complex medical data
  • Optimizing hyperparameters across multiple algorithms
  • Managing missing and invalid values in healthcare data
  • Balancing model complexity with interpretability for clinical use

Outcomes & Impact

  • Ensemble models (Voting, Stacking) outperformed individual models
  • Identified key predictors: prior utilization, comorbidity score, A1C result
  • Successfully clustered patients into distinct risk profiles
  • Comprehensive evaluation metrics across 8 different algorithms
  • Generated actionable insights for hospital readmission prevention
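Voting and stacking ensembles can be compared against their base learners with scikit-learn's `VotingClassifier` and `StackingClassifier`, scored by ROC AUC and a confusion matrix on a held-out split. This is a sketch on synthetic data with illustrative hyperparameters, and it does not reproduce the project's results. The three base learners and the logistic-regression meta-learner are assumptions. XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` follow the scikit-learn estimator interface and could be added to `base` the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the readmission data.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Base learners (a subset of the eight, with illustrative settings).
base = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("gb", GradientBoostingClassifier(random_state=1)),
]
models = dict(base)
# Soft voting averages the base learners' predicted probabilities.
models["voting"] = VotingClassifier(estimators=base, voting="soft")
# Stacking trains a meta-learner on out-of-fold base-learner predictions.
models["stacking"] = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(max_iter=1000), cv=5
)

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # ensembles fit their own clones of the base learners
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, proba)
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    print(f"{name:>8}  AUC={results[name]:.3f}  TN={tn} FP={fp} FN={fn} TP={tp}")
```

Soft voting requires every base learner to implement `predict_proba`. The stacking meta-learner is fit on out-of-fold predictions (`cv=5`), which keeps it from simply trusting base learners that overfit the training data.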

Technologies

Python, Scikit-learn, XGBoost, LightGBM, Pandas, Matplotlib, Imbalanced-learn (SMOTE), Jupyter Notebook

Tags

Machine Learning, Data Mining, Healthcare Analytics, Research