**Introduction**

This post is for you if you are looking for an explanation of deviance, AIC, degrees of freedom, p-values, coefficient estimates, the odds ratio, the logit score, and how to get the final probability from the logit score in logistic regression in R.

**Course for Beginners:**

**https://www.udemy.com/machine-learning-using-r/?couponCode=DISFOR123**

**Importing libraries & Reading Data**

Import the required libraries. MASS is loaded because it provides the birthwt dataset.

```
library(MASS)
#### Storing the data set named "birthwt" into DataFrame
DataFrame <- birthwt
#### To read about the dataset use following command by uncommenting
#### help("birthwt")
#### Check first 3 rows
head(DataFrame,3)
```

```
##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
```

**Model Fitting & Model Summary**

Now we will fit a logistic regression model using only two continuous variables, age and lwt, as independent variables.

```
#### Fitting the model
LogisticModel <- glm(low ~ age + lwt, data = DataFrame, family = binomial(link = "logit"))
#### Let's check the summary of the model
summary(LogisticModel)
```

```
##
## Call:
## glm(formula = low ~ age + lwt, family = binomial(link = "logit"),
##     data = DataFrame)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.1352  -0.9088  -0.7480   1.3392   2.0595
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.748773   0.997097   1.754   0.0795 .
## age         -0.039788   0.032287  -1.232   0.2178
## lwt         -0.012775   0.006211  -2.057   0.0397 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 227.12 on 186 degrees of freedom
## AIC: 233.12
##
## Number of Fisher Scoring iterations: 4
```
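The `z value` and `Pr(>|z|)` columns of the coefficient table can be reproduced by hand, which may help when reading the summary. The sketch below refits the same model so that it runs on its own; the names `coef_table`, `z_by_hand` and `p_by_hand` are made up for this illustration.

```r
library(MASS)

# Refit the model from the post so this block is self-contained
LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

# Coefficient table as a numeric matrix
coef_table <- coef(summary(LogisticModel))

# z value = estimate divided by its standard error
z_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the standard normal distribution
p_by_hand <- 2 * pnorm(-abs(z_by_hand))

cbind(z_by_hand, p_by_hand)
```

For lwt this is -0.012775 / 0.006211, which gives the z value of about -2.057 shown in the summary.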

**What is Deviance?**

Deviance is a measure of goodness of fit of a generalized linear model. The higher the deviance, the poorer the model fit. We will now go through the summary point by point.
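To make the definition concrete, here is a minimal sketch that computes the deviance in three ways. It relies on the fact that for a 0/1 response the deviance equals -2 times the log-likelihood; the model is refitted so the block runs on its own, and the variable names are illustrative.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

# 1. Deviance as reported by R
dev_reported <- deviance(LogisticModel)

# 2. For binary (0/1) data, deviance = -2 * log-likelihood
dev_from_loglik <- -2 * as.numeric(logLik(LogisticModel))

# 3. Sum of squared deviance residuals
dev_from_resid <- sum(residuals(LogisticModel, type = "deviance")^2)

c(dev_reported, dev_from_loglik, dev_from_resid)
```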

**1. Null deviance**

The summary of the model says:

Null deviance: 234.67 on 188 degrees of freedom

The null deviance is the deviance of a model that includes only the intercept term. It is the baseline against which models with predictors are compared.
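One way to see this is to fit the intercept-only model explicitly. The sketch below assumes the same data and model as in the post; `NullModel` is a name introduced here for illustration.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

# Intercept-only model: low ~ 1
NullModel <- glm(low ~ 1, data = birthwt, family = binomial(link = "logit"))

# Its deviance matches the null deviance stored in the full model
deviance(NullModel)
LogisticModel$null.deviance

# The intercept-only model predicts the overall share of low = 1 for everyone
null_probability <- unname(plogis(coef(NullModel)))
c(null_probability, mean(birthwt$low))
```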

**2. Residual deviance**

The summary of the model says:

Residual deviance: 227.12 on 186 degrees of freedom

When the model includes the age and lwt variables, its deviance is the residual deviance, which is lower (227.12) than the null deviance (234.67). The lower residual deviance indicates that the model fits better once the two variables (age and lwt) are included. Deviance can never increase when predictors are added, so the drop of 7.55 has to be judged against the number of parameters added.
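A common way to judge whether such a drop in deviance is larger than chance is a likelihood ratio (chi-square) test. This test is not part of the walkthrough above; the sketch below is one possible way to run it, with `NullModel`, `deviance_drop`, `df_used` and `p_value` as illustrative names.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))
NullModel <- glm(low ~ 1, data = birthwt, family = binomial(link = "logit"))

# Built-in comparison of the two nested models
anova(NullModel, LogisticModel, test = "Chisq")

# The same test by hand: drop in deviance vs. chi-square distribution
deviance_drop <- LogisticModel$null.deviance - deviance(LogisticModel)
df_used       <- LogisticModel$df.null - LogisticModel$df.residual
p_value       <- pchisq(deviance_drop, df = df_used, lower.tail = FALSE)

c(deviance_drop = deviance_drop, df_used = df_used, p_value = p_value)
```

A small p-value here suggests that age and lwt together improve on the intercept-only model.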

**3. Degree of Freedom:**

The summary in the output says:

Null deviance: 234.67 on 188 degrees of freedom

The degrees of freedom for the null deviance equal N−1, where N is the number of observations in the data sample. Here N=189, therefore N−1 = 189−1 = 188.

The summary in the output says:

Residual deviance: 227.12 on 186 degrees of freedom

The degrees of freedom for the residual deviance equal N−k−1, where k is the number of variables and N is the number of observations in the data sample. Here N=189 and k=2, therefore N−k−1 = 189−2−1 = 186.

The degrees of freedom of the null and residual deviance differ by only two (188−186): the model has two variables (age and lwt), so two additional parameters have been estimated and two additional degrees of freedom have been consumed.
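The same arithmetic can be checked against the fitted model object. This is a small sketch with the model refitted so it runs on its own; `N` and `k` follow the notation used above.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

N <- nrow(birthwt)                     # number of observations
k <- length(coef(LogisticModel)) - 1   # number of variables (intercept excluded)

# Degrees of freedom by hand next to the values stored in the model
c(by_hand = N - 1,     from_model = LogisticModel$df.null)
c(by_hand = N - k - 1, from_model = LogisticModel$df.residual)
```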

**4. AIC:**

The summary in the output says:

AIC: 233.12

AIC stands for Akaike Information Criterion. It is useful for comparing the goodness of fit of two or more models fitted to the same data. It is computed from the maximized log-likelihood plus a penalty for every estimated parameter, which discourages overfitting: AIC = −2*(log-likelihood) + 2*(number of estimated parameters). Here three parameters are estimated (intercept, age and lwt) and −2*(log-likelihood) equals the residual deviance, so AIC = 227.12 + 2*3 = 233.12. It is analogous to adjusted R2 in multiple linear regression in that it discourages you from including irrelevant predictor variables. A model with a lower AIC is preferred over a model with a higher AIC.
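As a quick check, the AIC can be rebuilt from the log-likelihood and the number of estimated parameters. The sketch below refits the model from the post; `n_params` and `aic_by_hand` are illustrative names.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

# Number of estimated parameters: intercept, age, lwt
n_params <- length(coef(LogisticModel))

# AIC = -2 * log-likelihood + 2 * number of parameters
aic_by_hand <- -2 * as.numeric(logLik(LogisticModel)) + 2 * n_params

c(by_hand = aic_by_hand, from_R = AIC(LogisticModel))
```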

**5. Fisher Scoring**

The summary in the output says:

Number of Fisher Scoring iterations: 4

Closed-form equations can be used to solve for the parameters of a linear model, but not for those of a logistic regression.

Instead, the maximum likelihood estimates are found iteratively. Fisher scoring is a variant of the Newton-Raphson method that uses the expected rather than the observed second derivatives of the log-likelihood; for the logit link used here the two coincide.

This line tells how the model was estimated. Starting from initial estimates, the algorithm checks whether the fit would be improved by different estimates. If so, it moves in that direction and fits the model again. It stops when no significant further improvement can be made. “Number of Fisher Scoring iterations” is the number of iterations the algorithm ran before it stopped. Here it is 4.
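To illustrate the idea, here is a hand-written Fisher scoring loop for this model. It is only a sketch and not how `glm()` is implemented internally: the starting values of zero, the stopping rule and all variable names are assumptions of this example, so the number of iterations need not equal 4.

```r
library(MASS)

# Design matrix with an intercept column, and the 0/1 response
X <- cbind(1, birthwt$age, birthwt$lwt)
y <- birthwt$low

beta <- rep(0, ncol(X))   # assumed starting values (glm() starts differently)

for (iteration in 1:25) {
  eta   <- as.vector(X %*% beta)   # logit score for every observation
  p     <- 1 / (1 + exp(-eta))     # fitted probabilities
  w     <- p * (1 - p)             # weights from the binomial variance
  score <- t(X) %*% (y - p)        # gradient of the log-likelihood
  info  <- t(X) %*% (X * w)        # Fisher information matrix
  step  <- as.vector(solve(info, score))
  beta  <- beta + step             # move towards a better fit
  if (max(abs(step)) < 1e-8) break # stop when nothing changes any more
}

# Compare with the estimates from glm()
LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))
cbind(by_hand = beta, from_glm = coef(LogisticModel))
```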

**Basic Maths of Logistic Regression**

Let’s check the basic terms used in logistic regression and then find the probability of getting “low=1” (i.e. the probability of success).

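**Formula for odds:**

Odds = probability of success (p) / probability of failure

= probability of (target variable=1) / probability of (target variable=0)

= p/(1-p)

An odds ratio is the ratio of two such odds. In logistic regression, exp(b) is the odds ratio for a one-unit increase in the corresponding variable.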

**Formula for logit score:**

logit(p) = log(p/(1-p))= b0 + b1*x1 + … + bk*xk
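The relation between probability, odds and logit score can be tried out on a single number. In this sketch the value p = 0.25 is an arbitrary example; `qlogis()` and `plogis()` are base R functions for the logit and its inverse.

```r
p <- 0.25                       # an arbitrary example probability

odds  <- p / (1 - p)            # odds of success
logit <- log(odds)              # logit score

# Going back: odds from the logit score, probability from the odds
odds_back <- exp(logit)
p_back    <- odds_back / (1 + odds_back)

# Base R shortcuts for the same transformations
c(logit = logit, qlogis = qlogis(p))
c(p_back = p_back, plogis = plogis(logit))
```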

**Now let’s go through the following steps to get the final probability of (target variable=1, i.e. low=1) from the logit score:**

1. Intercept coefficient (b0) = 1.748773

2. age coefficient (b1) = -0.039788

**Interpretation:** The logit score changes by -0.039788 for each one-year increase in age, holding lwt fixed.

lwt coefficient (b2) = -0.012775

**Interpretation:** The logit score changes by -0.012775 for each one-pound increase in the mother’s weight (lwt), holding age fixed.

3. p-value for lwt = 0.0397

**Interpretation:** According to the z-test, the p-value is 0.0397, which is below the usual 0.05 threshold. If there were no relation between lwt and the target variable low, a z value this extreme would be unlikely. The star (*) next to the p-value in the summary marks lwt as a significant variable for predicting low at the 5% level.

4. p-value for age = 0.2178

**Interpretation:** According to the z-test, the p-value is 0.2178, which is comparatively high. The data therefore do not provide enough evidence of a relation between age and the target variable low. This is not the same as showing that no relation exists.

5. Let’s consider a mother with age = 25 and lwt = 55, and find the logit score for her:

b0 + b1*x1 + b2*x2 = 1.748773 - 0.039788*25 - 0.012775*55 = 0.05144 (approx.)

6. So the logit score for this observation = 0.05144

7. Now let’s find the probability that the birth weight is below 2.5 kg (i.e. low=1). See the help page of the birthwt data set (type ?birthwt in the console).

8. Odds = exp(0.05144) = 1.052786

probability (p) = odds / (1 + odds)

p = 1.052786/2.052786 = 0.513 (approx.)

9. p = 0.513

**Interpretation:** 0.513, or 51.3%, is the estimated probability of a birth weight below 2.5 kg when the mother’s age is 25 and her weight is 55 pounds. Since 55 pounds is far below the weights recorded in the data, treat this as an illustration of the arithmetic rather than a realistic prediction.
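The steps above can be cross-checked with `predict()`. The sketch below refits the model so it runs on its own; `new_mother`, `logit_score`, `probability` and `by_hand` are names chosen for this example.

```r
library(MASS)

LogisticModel <- glm(low ~ age + lwt, data = birthwt,
                     family = binomial(link = "logit"))

# The observation used in the worked example
new_mother <- data.frame(age = 25, lwt = 55)

# Logit score (type = "link") and probability (type = "response")
logit_score <- predict(LogisticModel, newdata = new_mother, type = "link")
probability <- predict(LogisticModel, newdata = new_mother, type = "response")

# The same logit score from the coefficients, as in step 5
b <- coef(LogisticModel)
by_hand <- b["(Intercept)"] + b["age"] * 25 + b["lwt"] * 55

c(logit_score = unname(logit_score),
  by_hand     = unname(by_hand),
  probability = unname(probability))

# Odds ratios for a one-unit increase in age and in lwt
exp(b[c("age", "lwt")])
```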

**Follow the link below if you are interested in a full, paid online course on data science and machine learning**

Machine Learning and Data Science best online courses
