Mental Health Prediction Using CatBoost
- Kyuta Yasuda
- Jan 28
- 2 min read
Project Description
This project focuses on predicting mental health outcomes (depression) based on various demographic, behavioral, and lifestyle factors. Using the CatBoost algorithm, the model classifies individuals into depression or no-depression categories by analyzing a comprehensive dataset that includes features like age, gender, work pressure, sleep duration, and dietary habits.
Key Highlights
Objective: Predict depression in individuals based on a dataset containing both categorical and numerical features.
Dataset: Mental health survey data containing demographic, professional, and health-related attributes.
Model: CatBoostClassifier, a gradient-boosting model with native support for categorical features.
Tools and Libraries:
Python
Libraries: pandas, sklearn, catboost
Jupyter Notebook for implementation
Workflow
Data Preprocessing:
Identified categorical features (e.g., gender, city, dietary habits) and numerical features (e.g., age, work pressure, financial stress).
Handled missing values by:
Filling missing categorical values with the placeholder string "missing".
Filling missing numerical values with the median.
Split the data into training and validation sets for model training, as shown in the sketch below.
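A minimal sketch of this preprocessing step. The file name, target column, and exact feature lists are assumptions for illustration, not taken from the original notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the survey data (file name is a placeholder).
df = pd.read_csv("train.csv")

# Hypothetical split of columns into categorical and numerical features.
cat_cols = ["Gender", "City", "Dietary Habits"]
num_cols = ["Age", "Work Pressure", "Financial Stress"]
target = "Depression"

# Fill missing categorical values with a "missing" placeholder,
# and missing numerical values with the column median.
df[cat_cols] = df[cat_cols].fillna("missing")
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Hold out a validation set for monitoring during training.
X = df[cat_cols + num_cols]
y = df[target]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```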
Model Implementation:
Used CatBoostClassifier, which handles categorical features natively, so no separate encoding step is needed.
Tuned hyperparameters, including iterations, learning rate, and depth, for optimal performance.
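A sketch of the model setup, continuing from the preprocessing snippet above. The hyperparameter values here are illustrative rather than the tuned ones; the key point is that the categorical columns are passed by name so CatBoost can encode them internally:

```python
from catboost import CatBoostClassifier

# Illustrative hyperparameters; cat_cols comes from the preprocessing step.
model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    eval_metric="Accuracy",   # report accuracy on the eval set during training
    cat_features=cat_cols,    # let CatBoost encode categoricals internally
    random_seed=42,
    verbose=100,
)
```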
Training:
Trained the model on the preprocessed training dataset.
Validated the model on the validation set and monitored training and validation accuracy.
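Training with an eval_set lets CatBoost report training and validation accuracy as boosting proceeds. A sketch, assuming the X_train/X_val split and model from the snippets above:

```python
# Train on the preprocessed training split and monitor the validation split.
# With eval_metric="Accuracy", CatBoost logs train/validation accuracy every
# `verbose` iterations and keeps track of the best iteration.
model.fit(
    X_train, y_train,
    eval_set=(X_val, y_val),
    use_best_model=True,
)
```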
Evaluation:
Evaluated the model using accuracy on the held-out validation set.
Achieved validation accuracy: 87%
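A sketch of the accuracy check on the held-out split, using the names carried over from the earlier snippets:

```python
from sklearn.metrics import accuracy_score

# Accuracy on the held-out validation split.
val_preds = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_preds)
print(f"Validation accuracy: {val_accuracy:.2%}")
```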
Prediction and Deployment:
Predicted mental health outcomes on a test set.
Saved predictions in a CSV file for submission and analysis.
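A sketch of this final step. The test file and id column names are placeholders; the test set gets the same missing-value treatment as the training data, reusing the training medians:

```python
# Load and preprocess the test set the same way as the training data
# (file and column names are placeholders).
test_df = pd.read_csv("test.csv")
test_df[cat_cols] = test_df[cat_cols].fillna("missing")
test_df[num_cols] = test_df[num_cols].fillna(df[num_cols].median())  # training medians

# Predict and write a submission-style CSV.
test_preds = model.predict(test_df[cat_cols + num_cols])
submission = pd.DataFrame({"id": test_df["id"], "Depression": test_preds})
submission.to_csv("submission.csv", index=False)
```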
Results
Model Performance:
Validation Accuracy: 87%
Learning Outcomes
Learned to preprocess complex datasets with a mix of categorical and numerical data.
Successfully implemented CatBoost, leveraging its ability to handle categorical data without additional encoding.
Improved model evaluation and optimization skills using metrics and validation techniques.
Gained experience in exporting results for further analysis or deployment.
Next Steps
Perform hyperparameter tuning using techniques like grid search or Bayesian optimization.
Explore other boosting algorithms (e.g., LightGBM or XGBoost) for comparison.
Incorporate additional features or external data to improve prediction accuracy.
Use feature importance metrics to analyze key factors contributing to mental health outcomes.
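As a starting point for that last item, CatBoost exposes per-feature importance scores directly from the trained model. A minimal sketch, reusing the model and training frame from the earlier snippets:

```python
import pandas as pd

# Rank features by CatBoost's built-in importance scores.
importances = pd.Series(
    model.get_feature_importance(),
    index=X_train.columns,
).sort_values(ascending=False)
print(importances)
```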