
Titanic Survival Prediction: Kaggle

  • Writer: Kyuta Yasuda
  • Jan 28
  • 2 min read

Project Description

This project focuses on predicting the survival of passengers aboard the Titanic using a machine learning approach. By leveraging a Decision Tree Classifier, the model was trained and tested to classify whether a passenger survived based on features like age, class, gender, and more.


Key Highlights

  1. Dataset: The Titanic dataset was utilized, containing demographic, ticket, and survival details for passengers.

  2. Objective: Predict passenger survival with a focus on model performance and accuracy.

  3. Tools and Libraries:

    • Python

    • Libraries: pandas, scikit-learn, and Matplotlib for data analysis, model training, and evaluation.

    • Jupyter Notebook for implementation.


Project Workflow

  1. Data Preprocessing:

    • Handled missing values for critical features such as Age and Cabin.

    • Converted categorical variables (e.g., Sex, Embarked) into numerical representations for machine learning compatibility.

  2. Feature Engineering:

    • Selected essential features for prediction, such as Pclass, Sex, Age, and Fare.

  3. Model Implementation:

    • Used DecisionTreeClassifier with specific hyperparameters:

      • splitter='random'

      • min_samples_leaf=4

      • random_state=12

    • Trained the model on a subset of data and made predictions on the test set.

  4. Evaluation:

    • Predicted survival outcomes were saved as a CSV file for submission.

    • Model performance was measured using classification metrics such as accuracy and confusion matrix (not explicitly shown in the notebook).
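The workflow above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual notebook: the tiny DataFrame below stands in for Kaggle's train.csv (which has 891 rows and more columns), while the hyperparameters are the ones listed in step 3.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for Kaggle's train.csv.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3],
    "Sex":    ["male", "female", "female", "female", "male", "male", "female", "male"],
    "Age":    [22.0, 38.0, 26.0, None, 35.0, 27.0, 54.0, None],
    "Fare":   [7.25, 71.28, 7.92, 53.1, 8.05, 13.0, 51.86, 8.46],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Preprocessing: impute missing Age with the median, encode Sex numerically.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})

# Decision Tree with the hyperparameters listed in step 3.
features = ["Pclass", "Sex", "Age", "Fare"]
model = DecisionTreeClassifier(splitter="random", min_samples_leaf=4, random_state=12)
model.fit(train[features], train["Survived"])

# Kaggle-style submission file: PassengerId plus the predicted Survived column.
submission = pd.DataFrame({
    "PassengerId": train["PassengerId"],
    "Survived": model.predict(train[features]),
})
submission.to_csv("submission.csv", index=False)
```

In the real workflow the same preprocessing is applied to test.csv and the predictions for that set are what get submitted; the sketch predicts on the training frame only to stay self-contained.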


Result

The model achieved 0.87 accuracy on the training dataset, with an F1 score of 0.84.
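Both metrics come from scikit-learn's standard scorers. A minimal illustration of how they are computed (the labels below are placeholders, not the model's actual output):

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder true labels and predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 1, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)  # fraction of predictions that match
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```

Accuracy alone can be misleading on an imbalanced dataset like Titanic (most passengers did not survive), which is why the F1 score is reported alongside it.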


Learning Outcomes

  1. Preprocessed and explored real-world data, handling missing values and categorical features.

  2. Implemented a Decision Tree Classifier, tuning hyperparameters for optimal performance.

  3. Exported predictions for validation, adhering to a Kaggle-style competition format.


Next Steps

  • Experiment with advanced models like Random Forests or Gradient Boosting for improved performance.

  • Perform hyperparameter tuning to optimize the Decision Tree Classifier.

  • Explore feature importance to understand which variables contribute most to survival predictions.
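Each of these next steps maps onto a scikit-learn utility. A rough sketch, using synthetic data in place of the Titanic features (the grid values are illustrative, not tuned):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four Titanic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Hyperparameter tuning: cross-validated grid search over the Decision Tree.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=12),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 4, 8]},
    cv=5,
)
grid.fit(X, y)

# Advanced model: a Random Forest exposes feature_importances_,
# which ranks how much each input contributes to the predictions.
forest = RandomForestClassifier(n_estimators=100, random_state=12).fit(X, y)
importances = forest.feature_importances_
```

`grid.best_params_` then gives the best tree configuration found, and the importances can be matched back to the feature names (Pclass, Sex, Age, Fare) to see which drive survival predictions.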


