
Mental Health Prediction Using CatBoost

  • Writer: Kyuta Yasuda
  • Jan 28
  • 2 min read

Project Description

This project focuses on predicting mental health outcomes (specifically depression) from demographic, behavioral, and lifestyle factors. Using the CatBoost algorithm, the model classifies individuals as depressed or not depressed based on a dataset that includes features such as age, gender, work pressure, sleep duration, and dietary habits.


Key Highlights

  1. Objective: Predict depression in individuals based on a dataset containing both categorical and numerical features.

  2. Dataset: Mental health survey data containing demographic, professional, and health-related attributes.

  3. Model: CatBoostClassifier, a gradient boosting algorithm optimized for categorical data.

  4. Tools and Libraries:

    • Python

    • Libraries: pandas, sklearn, catboost

    • Jupyter Notebook for implementation (a short setup sketch follows this list)
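
For reference, the environment can be set up roughly as shown below. The pip command and package names are my assumption about how the listed libraries are installed; they are not taken from the original project files.

# pip install pandas scikit-learn catboost jupyter

import pandas as pd                                   # data loading and preprocessing
from sklearn.model_selection import train_test_split  # train/validation split
from sklearn.metrics import accuracy_score            # evaluation metric
from catboost import CatBoostClassifier               # gradient boosting model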


Workflow

  1. Data Preprocessing:

    • Identified categorical features (e.g., gender, city, dietary habits) and numerical features (e.g., age, work pressure, financial stress).

    • Handled missing values by:

      • Filling missing categorical values with the placeholder string "missing".

      • Filling missing numerical values with the median.

    • Split the data into training and validation sets for model training.

  2. Model Implementation:

    • Used CatBoostClassifier, which handles categorical features natively without one-hot or label encoding.

    • Tuned hyperparameters, including iterations, learning rate, and depth, for optimal performance.

  3. Training:

    • Trained the model on the preprocessed training dataset.

    • Validated the model on the validation set and monitored training and validation accuracy.

  4. Evaluation:

    • Evaluated the model with accuracy on the held-out validation set.

    • Achieved validation accuracy: 87%

  5. Prediction and Deployment:

    • Predicted mental health outcomes on a test set.

    • Saved the predictions to a CSV file for submission and further analysis (an end-to-end sketch of these steps follows this list).

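The workflow above can be summarized in a short end-to-end sketch. The file names, column names (e.g. "Gender", "City", "Depression", "id"), and hyperparameter values below are illustrative assumptions, not the exact ones used in the project.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from catboost import CatBoostClassifier

# Load the survey data (file names are assumed).
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

target = "Depression"  # assumed name of the binary label column
cat_features = ["Gender", "City", "Dietary Habits", "Sleep Duration"]  # assumed categorical columns
num_features = ["Age", "Work Pressure", "Financial Stress"]            # assumed numerical columns

# Missing values: placeholder string for categoricals, median for numericals.
medians = train_df[num_features].median()
for df in (train_df, test_df):
    df[cat_features] = df[cat_features].fillna("missing")
    df[num_features] = df[num_features].fillna(medians)

X = train_df[cat_features + num_features]
y = train_df[target]

# Hold out a validation set to monitor accuracy during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# CatBoost consumes the categorical columns directly via cat_features.
model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.05,
    depth=6,
    eval_metric="Accuracy",
    random_seed=42,
    verbose=100,
)
model.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_val, y_val))

# Evaluate on the validation set.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")

# Predict on the test set and save the results for submission.
test_df[target] = model.predict(test_df[cat_features + num_features])
test_df[["id", target]].to_csv("submission.csv", index=False)  # "id" column is assumed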

Results

  • Model Performance:

    • Validation Accuracy: 87%


Learning Outcomes

  1. Learned to preprocess complex datasets with a mix of categorical and numerical data.

  2. Successfully implemented CatBoost, leveraging its ability to handle categorical data without additional encoding.

  3. Improved model evaluation and optimization skills using metrics and validation techniques.

  4. Gained experience in exporting results for further analysis or deployment.

Next Steps

  • Perform hyperparameter tuning using techniques like grid search or Bayesian optimization.

  • Explore other boosting algorithms (e.g., LightGBM or XGBoost) for comparison.

  • Incorporate additional features or external data to improve prediction accuracy.

  • Use feature importance metrics to analyze the key factors contributing to the predicted mental health outcomes (a brief sketch of this and of grid search follows below).
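
As a starting point for the feature-importance and tuning ideas above, here is a minimal sketch. It reuses model, X_train, y_train, and cat_features from the workflow sketch, and the grid values are arbitrary choices for illustration.

from sklearn.model_selection import GridSearchCV
from catboost import CatBoostClassifier

# Which features drive the predictions? prettified=True returns a sorted DataFrame.
print(model.get_feature_importance(prettified=True).head(10))

# A small, illustrative grid search over learning rate and depth.
param_grid = {"learning_rate": [0.03, 0.05, 0.1], "depth": [4, 6, 8]}
search = GridSearchCV(
    CatBoostClassifier(iterations=500, cat_features=cat_features, verbose=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)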
