On April 15, 1912, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This tragedy has led to better safety regulations for ships.

The goal is to predict which passengers survived the tragedy, based on the data given. The approach:

1. Basic cleaning of missing values in the training and test data sets
2. 5-fold cross-validation
3. A Support Vector Machine as the model
4. Predictions for the test data set

### Importing libraries

Let’s import the libraries.

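```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
# sklearn.cross_validation was removed in scikit-learn 0.20; cross_val_score now lives in model_selection
from sklearn.model_selection import cross_val_score

import matplotlib.pyplot as plt
%matplotlib inline
```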

### Reading training and testing data set

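```python
train = pd.read_csv('C:\\Users\\Arpan\\Desktop\\titanic data set\\train.csv')
# assumed: test.csv sits in the same folder as train.csv
test = pd.read_csv('C:\\Users\\Arpan\\Desktop\\titanic data set\\test.csv')
```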

### Data Cleaning

Let’s create a function for cleaning the training and test data. It does two things:
1. Encodes the categorical variables manually
2. Imputes the missing values

```python
def data_cleaning(df):
    # Impute missing numeric values with the column median
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())
    # Impute missing ports of embarkation with "S" (Southampton)
    df["Embarked"] = df["Embarked"].fillna("S")

    # Encode Sex: male -> 0, female -> 1
    df.loc[df["Sex"] == "male", "Sex"] = 0
    df.loc[df["Sex"] == "female", "Sex"] = 1

    # Encode Embarked: S -> 0, C -> 1, Q -> 2
    df.loc[df["Embarked"] == "S", "Embarked"] = 0
    df.loc[df["Embarked"] == "C", "Embarked"] = 1
    df.loc[df["Embarked"] == "Q", "Embarked"] = 2

    return df
```
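The `.loc` assignments above can also be written with `Series.map`, which encodes a whole column in one step and produces integer columns directly (the `.loc` route typically leaves object-dtype columns that happen to hold integers). The sketch below is an alternative, not the author's code; the helper name `clean_with_map` and the three-row DataFrame are hypothetical, made up only to show the effect.

```python
import numpy as np
import pandas as pd


def clean_with_map(df):
    """Hypothetical alternative to data_cleaning() using Series.map."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Impute missing numeric values with each column's median
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())
    # Fill missing ports with "S" (Southampton), as in data_cleaning()
    df["Embarked"] = df["Embarked"].fillna("S")
    # Encode the categorical columns with the same codes as data_cleaning()
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return df


# Hypothetical rows, not taken from the Titanic data set
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, np.nan, 26.0],
    "Fare": [7.25, 71.28, np.nan],
    "Embarked": ["S", "C", np.nan],
})
cleaned = clean_with_map(toy)
print(cleaned)
print(cleaned.isnull().sum().sum())  # 0: no missing values remain
```

One design benefit: `map` returns NaN for any category missing from the dictionary, so an unexpected value shows up as a missing value instead of passing through silently.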

Let’s clean both data sets.

```python
train = data_cleaning(train)
test = data_cleaning(test)
```

### Selecting Predictor Variables

Let’s choose the predictor variables. We do not use the Cabin and PassengerId variables.

```python
predictor_Vars = ["Sex", "Age", "SibSp", "Parch", "Fare"]
```

### X & y

Let’s separate the predictors from the target: X holds the predictor variables and y the target variable (Survived). We will use both when fitting the model.

```python
X, y = train[predictor_Vars], train.Survived
```

Let’s check X

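```python
X.iloc[:5]
```

|   | Sex | Age | SibSp | Parch | Fare |
|---|-----|-----|-------|-------|------|
| 0 | 0 | 22 | 1 | 0 | 22 |
| 1 | 1 | 38 | 1 | 0 | 38 |
| 2 | 1 | 26 | 0 | 0 | 26 |
| 3 | 1 | 35 | 1 | 0 | 35 |
| 4 | 0 | 35 | 0 | 0 | 35 |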

Let’s check y

```python
y.iloc[:5]
```
``````0    0
1    1
2    1
3    1
4    0
Name: Survived, dtype: int64
``````

### Model Initialization & Fitting

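Let’s choose the Support Vector Classifier parameters and fit the model: a linear kernel with regularization parameter C=0.8.

```python
# Linear-kernel SVM with C=0.8; gamma is ignored when kernel='linear'
modelSVM = SVC(kernel='linear', C=0.8, gamma=0.01).fit(X, y)
```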
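The `gamma` argument only affects the `'rbf'`, `'poly'` and `'sigmoid'` kernels; with `kernel='linear'` it is ignored. A quick check on synthetic data (generated with `make_classification`, not the Titanic data) suggests that two linear SVCs differing only in `gamma` learn the same model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic predictors: 200 rows, 5 features
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Two linear SVCs that differ only in gamma
svc_a = SVC(kernel="linear", C=0.8, gamma=0.01).fit(X_demo, y_demo)
svc_b = SVC(kernel="linear", C=0.8, gamma=10.0).fit(X_demo, y_demo)

# Compare the learned weights and the predictions
same_coef = np.allclose(svc_a.coef_, svc_b.coef_)
same_pred = np.array_equal(svc_a.predict(X_demo), svc_b.predict(X_demo))
print(same_coef, same_pred)
```

So tuning `gamma` is only worthwhile after switching to a non-linear kernel such as `'rbf'`.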

### Cross-validation

Let’s run 5-fold cross-validation.

```python
from sklearn.model_selection import cross_val_score

# Accuracy of the SVM on each of the 5 folds
modelSVMCV = cross_val_score(modelSVM, X, y, cv=5)
```
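For a classifier and an integer `cv`, `cross_val_score` splits the data with `StratifiedKFold` (without shuffling), so each fold keeps roughly the same share of survivors. The loop below is a sketch of that procedure on synthetic data from `make_classification`: it trains a fresh copy of the model on four folds, scores its accuracy on the fifth, and compares the result with `cross_val_score`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic data, not the Titanic data set
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
model = SVC(kernel="linear", C=0.8)

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X_demo, y_demo):
    # Fit a fresh, unfitted copy on 4/5 of the rows ...
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    # ... and measure accuracy on the held-out 1/5
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(model, X_demo, y_demo, cv=5)
print(manual_scores)
print(np.allclose(manual_scores, library_scores))
```

Because every fold is fit on a clone, fitting modelSVM before calling `cross_val_score` is not required.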

Let’s check the accuracy on each of the five folds.

```python
modelSVMCV
```
``````array([ 0.80446927,  0.80446927,  0.78651685,  0.75280899,  0.78531073])
``````

Let’s plot the same per-fold accuracies.

```python
plt.plot(modelSVMCV, "p")  # "p" draws pentagon markers, one per fold
plt.xlabel("Fold index")
plt.ylabel("Accuracy")
plt.show()
```

Let’s check the mean accuracy across all five folds.

```python
print(modelSVMCV.mean())
```
``````0.786715024929
``````
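The mean is the plain average of the five fold accuracies; the standard deviation gives a rough sense of how much the estimate moves from fold to fold. Recomputing both from the scores printed above:

```python
import numpy as np

# Fold accuracies printed above
scores = np.array([0.80446927, 0.80446927, 0.78651685, 0.75280899, 0.78531073])

mean_acc = scores.mean()
std_acc = scores.std()
print(f"mean accuracy: {mean_acc:.4f} (+/- {std_acc:.4f})")
# prints: mean accuracy: 0.7867 (+/- 0.0189)
```

The spread of roughly two percentage points is worth keeping in mind when comparing this model with others on the same folds.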

### Model fitting

If the cross-validation results are satisfactory, let’s fit the model with the same parameters on the whole training set, rather than on the 4/5 of it used in each cross-validation fold.

```python
modelSVM = SVC(kernel='linear', C=0.8, gamma=0.01)
modelSVM.fit(X, y)
```
``````SVC(C=0.8, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.01, kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
``````

### Predictions on test data set

```python
predictions = modelSVM.predict(test[predictor_Vars])
predictions
```
```
array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
       0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       1, 0, 0, 0], dtype=int64)
```
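Kaggle's Titanic competition expects a CSV file with a `PassengerId` column and a `Survived` column. A minimal sketch of building one, assuming the cleaned test DataFrame still carries its `PassengerId` column; to keep the block self-contained, `test_demo` and `predictions_demo` are hypothetical three-row stand-ins for the real `test` and `predictions`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real `test` DataFrame and `predictions` array
test_demo = pd.DataFrame({"PassengerId": [892, 893, 894]})
predictions_demo = np.array([0, 1, 0])

# One row per test passenger: id plus predicted survival (0 or 1)
submission = pd.DataFrame({
    "PassengerId": test_demo["PassengerId"],
    "Survived": predictions_demo,
})

# index=False keeps pandas' row index out of the file
csv_text = submission.to_csv(index=False)
print(csv_text)
```

With the real objects, passing a file name instead, e.g. `submission.to_csv("submission.csv", index=False)`, writes the file to disk.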