deeplearning

Introduction

A concise demo of neural networks for machine learning and data analytics in R.
Open RStudio and follow along!

Importing libraries

#### importing the library MASS for "Boston" dataset
library(MASS)
library(neuralnet)
## Loading required package: grid

Reading data

#### Setting the seed so that we get the same results each time
#### we run the neural net
set.seed(123)

#### Storing the data set named "Boston" into DataFrame
DataFrame <- Boston

#### To get the Help on Boston data set uncomment the below line
####  help("Boston")

####  For looking at Structure of Boston data
str(DataFrame)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Data Exploration

#### Look at the histogram of the target or outcome variable named medv
hist(DataFrame$medv,col=colors()[100:110],breaks=10,
     main = "Histogram for medv",
     xlab = "medv")
[Figure: Histogram of medv]
####  Check the dimension of this data frame
dim(DataFrame)
## [1] 506  14
####  Check first 3 rows
head(DataFrame,3)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
#### Check the range of each variable
#### This will give the min and max value for each variable
apply(DataFrame,2,range)
##          crim  zn indus chas   nox    rm   age     dis rad tax ptratio
## [1,]  0.00632   0  0.46    0 0.385 3.561   2.9  1.1296   1 187    12.6
## [2,] 88.97620 100 27.74    1 0.871 8.780 100.0 12.1265  24 711    22.0
##       black lstat medv
## [1,]   0.32  1.73    5
## [2,] 396.90 37.97   50

Scaling or Data Transformation

#### It seems the scale of each variable is not the same

#### We need to normalize the data into the interval [0,1]
#### Normalization is necessary so that each variable is scaled properly
#### and none of the variables dominates the model
#### The scale function performs min-max scaling here

maxValue <- apply(DataFrame, 2, max) 
minValue <- apply(DataFrame, 2, min)
DataFrame<-as.data.frame(scale(DataFrame,center = minValue,
                               scale =maxValue-minValue))
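As an optional sanity check, after min-max scaling every column should now range from 0 to 1; re-checking the ranges confirms the transformation worked:

```r
#### Every column should now lie in [0,1]
apply(DataFrame, 2, range)
```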

Data Partition, Modelling & Predictions

#### Let's partition the dataset into train and test sets
ind <- sample(1:nrow(DataFrame), 400)
trainDF <- DataFrame[ind,]
testDF <- DataFrame[-ind,]

#### Let's take a configuration for the neural network, say 13-4-2-1
#### So number of hidden layers = 2
#### Input layer has 13 units = number of predictor variables
#### No. of units in first hidden layer = 4
#### No. of units in second hidden layer = 2
#### No. of units in output layer = 1, as there is only one target variable
#### we want to predict. Here the target variable is "medv"

#### We need a formula like the one below in order to use the neuralnet function
#### medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + 
#### tax + ptratio + black + lstat 

#### Below is the code for doing the same without writing the names individually
allVars <- colnames(DataFrame)
predictorVars <- allVars[!allVars %in% "medv"]
predictorVars <- paste(predictorVars, collapse = " + ")
form <- as.formula(paste("medv ~", predictorVars))
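To verify that the assembled formula matches the one written out above, you can simply print it:

```r
#### Print the assembled formula
form
```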

#### Let's fit the model now
neuralModel <- neuralnet(formula = form, hidden = c(4,2), linear.output = TRUE,
                         data = trainDF)
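If you want to inspect the fitted architecture, the neuralnet package provides a plot method that draws the network with its fitted weights (this opens a graphics window; output not shown here):

```r
#### Draw the 13-4-2-1 network with the fitted weights on each connection
plot(neuralModel)
```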

#### Let's predict for the test data set (columns 1:13 are the predictors)
predictions <- compute(neuralModel, testDF[,1:13])

#### Let's check the structure of predictions. compute() returns a list
#### with two elements: $neurons and $net.result
str(predictions)

Unscaling & MSE calculation

#### Let's unscale the predictions and actual values in order to evaluate the
#### mean squared error (MSE) on the original scale of medv.
#### Note: we must unscale with the min and max of the *original* data, which
#### we stored earlier in minValue and maxValue, not the min and max of the
#### already-scaled test set
predictions <- predictions$net.result*(maxValue["medv"]-minValue["medv"])+
                                        minValue["medv"]
actualValues <- (testDF$medv)*(maxValue["medv"]-minValue["medv"])+minValue["medv"]

#### Let's calculate the MSE
MSE <- sum((predictions - actualValues)^2)/nrow(testDF)
MSE

Real vs predicted values

#### Let's plot the actual and predicted values
plot(actualValues, predictions, col = 'blue', main = 'Real vs Predicted',
     pch = 1, cex = 0.9, type = "p", xlab = "Actual", ylab = "Predicted")
#### Add a 45-degree reference line
abline(a = 0, b = 1)

#### Points falling along the 45-degree line show that the predicted and actual
#### values on unseen data, i.e. the test data set, are almost the same


#### For better training of the model on any dirty data,
#### preprocessing and cleaning of the data are a must.
#### Cross-validation is also a must.
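To make the cross-validation point concrete, here is a minimal 5-fold cross-validation sketch reusing the same formula and 13-4-2-1 architecture from above; the fold count and seed are illustrative choices. Note that neuralnet can occasionally fail to converge on a fold, in which case raising its stepmax argument or rerunning with a different seed may be needed.

```r
#### Minimal 5-fold cross-validation sketch (illustrative, not tuned)
set.seed(123)
k <- 5
#### Randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(DataFrame)))
cvMSE <- numeric(k)
for (i in 1:k) {
  trainFold <- DataFrame[folds != i, ]
  testFold  <- DataFrame[folds == i, ]
  #### Refit the same 13-4-2-1 network on each training fold
  fit <- neuralnet(form, data = trainFold, hidden = c(4, 2),
                   linear.output = TRUE)
  preds <- compute(fit, testFold[, 1:13])$net.result
  #### MSE on the scaled [0,1] response for this fold
  cvMSE[i] <- sum((preds - testFold$medv)^2)/nrow(testFold)
}
#### Average test error across the k folds
mean(cvMSE)
```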