Titanic Survival Prediction: Kaggle
- Kyuta Yasuda
- Jan 28
- 2 min read
Project Description
This project focuses on predicting the survival of passengers aboard the Titanic using a machine learning approach. A Decision Tree Classifier was trained and tested to predict whether a passenger survived based on features such as age, passenger class, and gender.
Key Highlights
Dataset: The Titanic dataset was utilized, containing demographic, ticket, and survival details for passengers.
Objective: Predict passenger survival with a focus on model performance and accuracy.
Tools and Libraries:
Python
Libraries: pandas, scikit-learn, and Matplotlib for data analysis, model training, and evaluation.
Jupyter Notebook for implementation.
Project Workflow
Data Preprocessing:
Handled missing values for critical features such as Age and Cabin.
Converted categorical variables (e.g., Sex, Embarked) into numerical representations for machine learning compatibility.
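The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the toy rows stand in for the Kaggle train.csv (same column names), and the median/mode imputation and has-cabin flag are assumed choices.

```python
import pandas as pd

# Toy rows standing in for the Kaggle train.csv (same column names).
df = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, None, 26.0, 35.0],
    "Fare":     [7.25, 71.28, 7.92, 53.1],
    "Cabin":    [None, "C85", None, "C123"],
    "Embarked": ["S", "C", None, "S"],
})

# Fill missing Age with the median and Embarked with the mode.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Cabin is mostly missing, so collapse it to a has-cabin indicator.
df["HasCabin"] = df["Cabin"].notna().astype(int)

# Map categorical strings to integers for scikit-learn compatibility.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
```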
Feature Engineering:
Selected essential features for prediction, such as Pclass, Sex, Age, and Fare.
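Selecting those columns comes down to a simple slice. The toy frame below is illustrative (the real one comes from the preprocessed train.csv), and the extra `Name` column is only there to show what gets dropped.

```python
import pandas as pd

# Toy preprocessed frame; the real one comes from train.csv.
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass":   [3, 1, 3],
    "Sex":      [0, 1, 1],
    "Age":      [22.0, 38.0, 26.0],
    "Fare":     [7.25, 71.28, 7.92],
    "Name":     ["A", "B", "C"],  # not selected as a feature
})

# Keep only the features listed above, plus the target.
features = ["Pclass", "Sex", "Age", "Fare"]
X = df[features]
y = df["Survived"]
```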
Model Implementation:
Used DecisionTreeClassifier with specific hyperparameters:
splitter: "random"
min_samples_leaf: 4
random_state: 12
Trained the model on a subset of data and made predictions on the test set.
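Putting the hyperparameters together, the training step looks roughly like this. The randomly generated features below are placeholders for the real preprocessed data; the `DecisionTreeClassifier` settings are the ones listed above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy preprocessed data standing in for the real train.csv features.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, 200),
    "Sex":    rng.integers(0, 2, 200),
    "Age":    rng.uniform(1, 70, 200),
    "Fare":   rng.uniform(5, 100, 200),
})
y = pd.Series(rng.integers(0, 2, 200), name="Survived")

# Hold out part of the data, then fit with the listed hyperparameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12
)
model = DecisionTreeClassifier(
    splitter="random",    # pick the split at each node at random
    min_samples_leaf=4,   # every leaf must hold at least 4 samples
    random_state=12,
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```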
Evaluation:
Predicted survival outcomes were saved as a CSV file for submission.
Model performance was measured using classification metrics such as accuracy and a confusion matrix (not explicitly shown in the notebook).
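A sketch of the evaluation and submission step, using toy labels in place of the real model output; the submission layout (PassengerId plus Survived, with test IDs starting at 892) follows the standard Kaggle Titanic format.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels and predictions standing in for the real model output.
y_true = pd.Series([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = pd.Series([0, 1, 0, 0, 1, 0, 1, 1])

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted
f1 = f1_score(y_true, y_pred)

# Kaggle submission format: PassengerId plus predicted Survived.
submission = pd.DataFrame({
    "PassengerId": range(892, 892 + len(y_pred)),
    "Survived": y_pred,
})
submission.to_csv("submission.csv", index=False)
```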
Data Exploration
Result
The model achieved 0.87 accuracy on the training dataset, with an F1 score of 0.84.
Learning Outcomes
Preprocessed and explored real-world data, handling missing values and categorical features.
Implemented a Decision Tree Classifier, tuning hyperparameters for optimal performance.
Exported predictions for validation, adhering to a Kaggle-style competition format.
Next Steps
Experiment with advanced models like Random Forests or Gradient Boosting for improved performance.
Perform hyperparameter tuning to optimize the Decision Tree Classifier.
Explore feature importance to understand which variables contribute most to survival predictions.
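The last two ideas can be combined: a Random Forest exposes per-feature importances out of the box. The data below is synthetic (survival made to depend mostly on Sex so the importances have something to find) and the forest settings are illustrative, not tuned values from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed Titanic features.
rng = np.random.default_rng(12)
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, 300),
    "Sex":    rng.integers(0, 2, 300),
    "Age":    rng.uniform(1, 70, 300),
    "Fare":   rng.uniform(5, 100, 300),
})
# Survival depends mostly on Sex (with 10% label noise) so the
# importances have a clear signal to pick up.
y = (X["Sex"] == 1).astype(int) ^ (rng.uniform(size=300) < 0.1).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=12)
forest.fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```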