
Machine Learning · Healthcare

Skin Cancer Diagnosis Classifier with Decision Trees

A supervised learning project using Decision Tree and Random Forest classifiers to distinguish between various skin disorders based on clinical features.

Overview

This project uses a dermatology dataset to build classification models that predict which type of erythemato-squamous skin disease a patient has. The focus is on understanding the full ML pipeline: preprocessing and exploratory data analysis, model training, performance evaluation, and feature importance analysis.

The project demonstrates how classification models can support clinical decision-making and help healthcare professionals diagnose certain diseases.

Data & Methods

Dataset

  • Dataset containing clinical features of various skin lesions.
  • Target label indicating one of six possible disorder diagnoses.

Preprocessing

  • Loaded the CSV into pandas and performed exploratory data analysis, checking for missing values and inspecting feature distributions.
  • Dropped rows with missing values or whole features as needed. The Age and Family History features were dropped to focus on physical traits.
  • Examined each feature's correlation with the target before training to get an early indication of likely feature importance.
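
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the small DataFrame stands in for the real CSV, and every column name in it (erythema, scaling, itching, family_history, age, diagnosis) is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dermatology CSV. The real project would load
# its own file with pd.read_csv(...); all column names here are assumptions.
df = pd.DataFrame({
    "erythema": [2, 3, 1, 2, np.nan, 3, 0, 2],
    "scaling": [2, 2, 1, 3, 2, 1, 0, 2],
    "itching": [0, 3, 1, 0, 2, 2, 1, 3],
    "family_history": [0, 1, 0, 0, 1, 0, 0, 1],
    "age": [55, 8, 26, 40, np.nan, 41, 18, 57],
    "diagnosis": [2, 1, 3, 1, 3, 2, 5, 1],
})

# Basic EDA: missing values per column and the class distribution.
missing_per_column = df.isna().sum()
class_counts = df["diagnosis"].value_counts()

# Drop the non-physical features, then drop rows that still contain NaNs.
clean = df.drop(columns=["age", "family_history"]).dropna()

# Absolute correlation of each remaining feature with the target.
# The target is a class label, so this is only a rough first look.
target_corr = (
    clean.corr()["diagnosis"]
    .drop("diagnosis")
    .abs()
    .sort_values(ascending=False)
)

print(missing_per_column)
print(class_counts)
print(target_corr)
```

Because the diagnosis is categorical, a linear correlation with its numeric code is a heuristic at best; the tree-based feature importances computed after training are the more reliable signal.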

Modeling

  • Split the data into train and test sets and created dummy variables.
  • Trained a default DecisionTreeClassifier using scikit-learn.
  • Tuned hyperparameters with GridSearchCV and trained a second DecisionTreeClassifier using the best values found.
  • For comparison, tuned the hyperparameters of a Random Forest model as well.
  • Evaluated all models using accuracy, precision, recall, and confusion matrices.
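
A minimal sketch of the modeling steps above, under stated assumptions: the data is a synthetic six-class set from make_classification rather than the dermatology dataset, the "severity" column is a hypothetical categorical feature added only to show dummy encoding, and the parameter grids are illustrative choices, not the ones used in the project. Dummy variables are created before the split here so that train and test share the same columns.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 6-class stand-in for the real dataset (assumption).
X_raw, y = make_classification(
    n_samples=360, n_features=12, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(12)])

# Hypothetical categorical column, one-hot encoded into dummy variables.
X["severity"] = pd.cut(X["feature_0"], bins=3, labels=["low", "mid", "high"])
X = pd.get_dummies(X, columns=["severity"], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: decision tree with default hyperparameters.
default_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Grid search over illustrative grids; refit=True (the default) retrains
# the best configuration on the full training set.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    cv=3,
).fit(X_train, y_train)

forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [25, 50], "max_depth": [5, None]},
    cv=3,
).fit(X_train, y_train)


def evaluate(model):
    """Return accuracy, macro precision/recall, and the confusion matrix."""
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, pred, average="macro", zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred, labels=sorted(set(y))),
    }


results = {
    "default_tree": evaluate(default_tree),
    "tuned_tree": evaluate(tree_search),
    "tuned_forest": evaluate(forest_search),
}

for name, scores in results.items():
    print(name, round(scores["accuracy"], 3))
```

Macro averaging is used for precision and recall so that each of the six classes counts equally; a weighted average would be the alternative if class sizes differ a lot.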

Tech Stack

Python, scikit-learn, pandas, NumPy, matplotlib, Jupyter

Metrics

Default parameters DecisionTree Classifier:


Tuned parameters DecisionTree Classifier:


Tuned parameters Random Forest Classifier:

Key Charts

Confusion matrix for skin cancer classifier
Confusion matrix for the default Decision Tree Classifier
Feature importance bar plot
Confusion matrix for the tuned Decision Tree Classifier
Feature importance bar plot
Confusion matrix for the tuned Random Forest Classifier
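
Charts like the ones listed above could be produced along these lines. This is a sketch under assumptions: it trains a small decision tree on synthetic data rather than the project's models, the feature names and output filenames are hypothetical, and the images are written to a temporary directory instead of the site's own folder.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 6-class data and placeholder feature names (assumptions).
X, y = make_classification(
    n_samples=360, n_features=12, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

out_dir = Path(tempfile.mkdtemp())  # swap for the real output folder

# Confusion matrix plot on the held-out test set.
cm = confusion_matrix(y_test, model.predict(X_test), labels=model.classes_)
fig_cm, ax_cm = plt.subplots(figsize=(5, 5))
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_).plot(
    ax=ax_cm, colorbar=False
)
ax_cm.set_title("Confusion matrix (decision tree)")
cm_path = out_dir / "confusion_matrix_tree.png"
fig_cm.savefig(cm_path, bbox_inches="tight")

# Horizontal bar plot of impurity-based feature importances, sorted ascending
# so the most important feature ends up at the top of the chart.
importances = model.feature_importances_
order = np.argsort(importances)
fig_fi, ax_fi = plt.subplots(figsize=(6, 4))
ax_fi.barh(feature_names[order], importances[order])
ax_fi.set_xlabel("Impurity-based importance")
ax_fi.set_title("Feature importance (decision tree)")
fi_path = out_dir / "feature_importance_tree.png"
fig_fi.savefig(fi_path, bbox_inches="tight")

plt.close("all")
```

The same two plotting steps apply unchanged to a fitted RandomForestClassifier, since it also exposes classes_ and feature_importances_.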


Analysis

Challenges & Learnings

Project Links