AIC and deviance

Introduction

This post is for you if you are looking for an explanation of deviance, AIC, degrees of freedom, p-values, coefficient estimates, odds, and logit scores in logistic regression in R, and for how to find the final probability from a logit score.

Course for Beginners:

https://www.udemy.com/machine-learning-using-r/?couponCode=DISFOR123

Importing libraries & Reading Data

Importing the required libraries. MASS is used for loading the birthwt data set.

library(MASS)

####  Storing the data set named "birthwt" into DataFrame
DataFrame <- birthwt

####  To read about the dataset use following command by uncommenting
#### help("birthwt")

####  Check first 3 rows
head(DataFrame,3)




##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557

Model Fitting & Model Summary

Now we will fit the logistic regression model using only two continuous variables, age and lwt, as independent variables.

####  Fitting the model
LogisticModel <- glm(low ~ age + lwt, data = DataFrame, family = binomial(link = "logit"))

#### Let's check the summary of the model
summary(LogisticModel)
## 
## Call:
## glm(formula = low ~ age + lwt, family = binomial(link = "logit"), 
##     data = DataFrame)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1352  -0.9088  -0.7480   1.3392   2.0595  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  1.748773   0.997097   1.754   0.0795 .
## age         -0.039788   0.032287  -1.232   0.2178  
## lwt         -0.012775   0.006211  -2.057   0.0397 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 227.12  on 186  degrees of freedom
## AIC: 233.12
## 
## Number of Fisher Scoring iterations: 4


What is Deviance?

Deviance is a measure of the goodness of fit of a generalized linear model. The higher the deviance, the poorer the model fit. Now let us discuss the summary point by point.

1. Null deviance

The summary of the model says:
Null deviance: 234.67 on 188 degrees of freedom

When the model includes only the intercept term, its performance is measured by the null deviance.
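This can be verified directly (a minimal sketch using base R and the MASS package): fitting an intercept-only model to the same data reproduces the null deviance from the summary.

```r
library(MASS)  # provides the birthwt data set
DataFrame <- birthwt

# Intercept-only model: no predictors, just the overall rate of low = 1
NullModel <- glm(low ~ 1, data = DataFrame, family = binomial(link = "logit"))

# Its residual deviance is the null deviance reported in the
# two-variable model's summary
deviance(NullModel)  # approx. 234.67
```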




2. Residual deviance

The summary of the model says:
Residual deviance: 227.12 on 186 degrees of freedom

When the model includes the age and lwt variables, the deviance is the residual deviance, which is lower (227.12) than the null deviance (234.67). The lower residual deviance indicates that the model improved when the two variables (age and lwt) were included.
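Whether this drop in deviance is large enough to matter can be checked with a likelihood-ratio (chi-square) test; here is a sketch using base R's pchisq:

```r
library(MASS)  # provides the birthwt data set
DataFrame <- birthwt
LogisticModel <- glm(low ~ age + lwt, data = DataFrame,
                     family = binomial(link = "logit"))

# Drop in deviance from adding age and lwt, and the df it consumed
dev_drop <- LogisticModel$null.deviance - LogisticModel$deviance  # approx. 7.55
df_drop  <- LogisticModel$df.null - LogisticModel$df.residual     # 2

# p-value of the chi-square (likelihood-ratio) test
pchisq(dev_drop, df = df_drop, lower.tail = FALSE)  # approx. 0.023
```

A p-value below 0.05 here suggests that age and lwt jointly improve the fit over the intercept-only model.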




3. Degree of Freedom:

The summary in the output says:
Null deviance: 234.67 on 188 degrees of freedom

The degrees of freedom for the null deviance equal N−1, where N is the number of observations in the data sample. Here N = 189, therefore N−1 = 189−1 = 188.

The summary in the output says:
Residual deviance: 227.12 on 186 degrees of freedom

The degrees of freedom for the residual deviance equal N−k−1, where k is the number of predictor variables and N is the number of observations in the data sample. Here N = 189 and k = 2, therefore N−k−1 = 189−2−1 = 186.

The degrees of freedom associated with the null and residual deviance differ by only two (188−186) because the model has only two variables (age and lwt): only two additional parameters were estimated, and therefore only two additional degrees of freedom were consumed.
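Both degrees-of-freedom values are stored on the fitted model object and can be read off directly:

```r
library(MASS)  # provides the birthwt data set
DataFrame <- birthwt
LogisticModel <- glm(low ~ age + lwt, data = DataFrame,
                     family = binomial(link = "logit"))

N <- nrow(DataFrame)       # 189 observations
LogisticModel$df.null      # N - 1     = 188
LogisticModel$df.residual  # N - 2 - 1 = 186
```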

4. AIC:

The summary in the output says:
AIC: 233.12

AIC stands for Akaike Information Criterion. It is useful when we have more than one model and want to compare their goodness of fit. It is based on the maximized likelihood of the model, with a penalty on the number of parameters to discourage overfitting. It is analogous to adjusted R² in multiple linear regression, which similarly discourages including irrelevant predictor variables. A model with a lower AIC is preferred over a model with a higher AIC.
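For a binary (0/1) response, the residual deviance equals −2 times the maximized log-likelihood, so the AIC in the summary can be reproduced by hand as deviance + 2k, where k = 3 estimated parameters (intercept, age, lwt). A quick sketch:

```r
library(MASS)  # provides the birthwt data set
DataFrame <- birthwt
LogisticModel <- glm(low ~ age + lwt, data = DataFrame,
                     family = binomial(link = "logit"))

# AIC = -2*logLik + 2*k; here k = 3 (intercept, age, lwt)
deviance(LogisticModel) + 2 * 3  # approx. 233.12
AIC(LogisticModel)               # same value
```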

5. Fisher Scoring

The summary in the output says:
Number of Fisher Scoring iterations: 4

Closed-form equations can be used to solve for linear model parameters, but not for logistic regression.

Instead, an iterative approach is used. Fisher's scoring algorithm is a variant of the Newton-Raphson method for solving maximum likelihood problems numerically.

It tells how the model was estimated. The algorithm checks whether the fit would improve by using different estimates; if it does, it moves in that direction and fits the model again. The algorithm stops when no significant additional improvement is possible. "Number of Fisher Scoring iterations" tells how many iterations the algorithm ran before it stopped; here it is 4.
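The iteration count (and, if needed, the convergence settings) is accessible on the fitted object; a small sketch:

```r
library(MASS)  # provides the birthwt data set
DataFrame <- birthwt
LogisticModel <- glm(low ~ age + lwt, data = DataFrame,
                     family = binomial(link = "logit"))

LogisticModel$iter  # 4: Fisher scoring iterations until convergence

# The tolerance and iteration cap can be tuned via glm.control if a model
# struggles to converge
glm(low ~ age + lwt, data = DataFrame, family = binomial,
    control = glm.control(epsilon = 1e-10, maxit = 50))$iter
```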

Basic Maths of Logistic Regression

Let's review the basic terms used in logistic regression and then find the probability of getting low = 1 (i.e., the probability of success).

Formula for odds:

Odds = probability of success (p) / probability of failure
     = probability of (target variable = 1) / probability of (target variable = 0)
     = p / (1 - p)

(Strictly speaking, p/(1-p) is the odds; an odds ratio is the ratio of two such odds, e.g., the exp() of a coefficient.)

Formula for logit score:

logit(p) = log(p/(1-p))= b0 + b1*x1 + … + bk*xk
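In base R, qlogis() and plogis() implement exactly these two transformations (the logit and its inverse), so the arithmetic below can always be cross-checked:

```r
# qlogis(p) computes log(p / (1 - p)); plogis() inverts it
p <- 0.513
qlogis(p)          # approx. 0.052
plogis(qlogis(p))  # 0.513 again
```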

Now let's follow the steps below to find the final probability that the target variable = 1 (i.e., low = 1) from the logit score:

1. Intercept coefficient (b0) = 1.748773

2. lwt coefficient (b1) = -0.012775
Interpretation: The change in logit score per unit increase in weight (lwt) is -0.012775.
age coefficient (b2) = -0.039788
Interpretation: The change in logit score per unit increase in age is -0.039788.

3. p-value for the lwt variable = 0.0397
Interpretation: According to the z-test, the p-value is 0.0397, which is comparatively low. This implies it is unlikely that there is "no relation" between lwt and the target variable low. The star (*) next to the p-value in the summary shows that lwt is a significant variable in predicting low.




4. p-value for age = 0.2178
Interpretation: According to the z-test, the p-value is 0.2178, which is comparatively high. This means we cannot reject the null hypothesis of "no relation" between age and the target variable low; the data provide no evidence that age is a significant predictor here.

5. Let's consider a random person with age = 25 and lwt = 55, and find the logit score for this person:
logit = b0 + b2*age + b1*lwt = 1.748773 - 0.039788*25 - 0.012775*55 = 0.05144 (approx.)

6. So logit score for this observation=0.05144




7. Now let's find the probability that birth weight < 2.5 kg (i.e., low = 1). See the help page for the birthwt data set (type ?birthwt in the console).

8. Odds value = exp(0.05144) = 1.052786
probability (p) = odds value / (odds value + 1)
p = 1.052786 / 2.052786 = 0.513 (approx.)

9. p = 0.513
Interpretation: 0.513, or 51.3%, is the probability of a birth weight less than 2.5 kg when the mother's age is 25 and the mother's weight is 55 pounds.
