Plotting Categorical Variables vs Continuous Variables

Let’s begin with data visualization, moving from basic to more advanced levels, by learning how to plot a categorical variable against a continuous variable, or one categorical variable against another. Start RStudio and type along 🙂

For a data science course developed by a data scientist, please follow the link below to get a discount:

https://www.udemy.com/machine-learning-using-r/?couponCode=DISFOR123

#### Let's store the Arthritis dataset in trainDF
library(vcd)
## Warning: package 'vcd' was built under R version 3.2.3
trainDF<-Arthritis

#### To read about the Arthritis dataset, uncomment the following line and run it
#### ?Arthritis

#### To get help on any function, put ? in front of its name
#### For example, to learn about the head() function, simply run ?head


#### Let's take a look at the dataset
head(trainDF)
##   ID Treatment  Sex Age Improved
## 1 57   Treated Male  27     Some
## 2 46   Treated Male  29     None
## 3 77   Treated Male  30     None
## 4 17   Treated Male  32   Marked
## 5 36   Treated Male  46   Marked
## 6 23   Treated Male  58   Marked


#### Let's check the structure of the dataset
str(trainDF)
## 'data.frame':    84 obs. of  5 variables:
##  $ ID       : int  57 46 77 17 36 23 75 39 33 55 ...
##  $ Treatment: Factor w/ 2 levels "Placebo","Treated": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Sex      : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Age      : int  27 29 30 32 46 58 59 59 63 63 ...
##  $ Improved : Ord.factor w/ 3 levels "None"<"Some"<..: 2 1 1 3 3 3 1 3 1 1 ...
#### Let's check the number of unique values in each column
sapply(trainDF,function(x) length(unique(x)))
##        ID Treatment       Sex       Age  Improved 
##        84         2         2        36         3
#### Let's check the histogram for the distribution of the ID variable
hist(trainDF$ID,xlab = "ID",breaks = 10,col = colors()[100:109],
     main = "Histogram of ID variable")
Figure: Histogram of the ID variable


#### The ID variable looks roughly uniformly distributed, which is
#### expected because these are just patient IDs

#### For data visualization, check a few things:
#1. Check whether each variable is numerical or categorical
#2. If a variable is numerical, check the number of unique values in it
#3. If the number of unique values is small, it might really be a categorical variable
#4. Convert all categorical variables into factors using as.factor()
#5. For numerical variables, use a histogram and a boxplot
#6. For a histogram, use the hist() function
#7. For a boxplot, use the boxplot() function
#8. For other plots, use the plot() function
#9. Last but not least, in R there are many ways to do the same thing
#10. This is just basic data exploration
#11. For advanced visualization, check my post on ggplot2.
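
The checklist above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `df`, its column names, and the cutoff `max_levels` are hypothetical and not part of the Arthritis example, and the right cutoff depends on your data.

```r
# Hypothetical toy data frame standing in for a real dataset
df <- data.frame(
  id     = 1:8,
  age    = c(27, 29, 30, 32, 46, 58, 59, 63),
  group  = c(1, 2, 1, 2, 1, 2, 1, 2),   # numeric codes that are really categories
  gender = c("M", "F", "M", "F", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Steps 1-2: count the unique values in each column
n_unique <- sapply(df, function(x) length(unique(x)))

# Step 3: treat columns with few unique values as categorical.
# The cutoff of 3 is an arbitrary choice for this toy example.
max_levels <- 3
categorical <- names(n_unique)[n_unique <= max_levels]

# Step 4: convert those columns to factors
df[categorical] <- lapply(df[categorical], as.factor)

# Steps 5-7: histogram and boxplot for every remaining numeric column
numeric_cols <- names(df)[sapply(df, is.numeric)]
for (col in numeric_cols) {
  hist(df[[col]], main = paste("Histogram of", col), xlab = col)
  boxplot(df[[col]], main = paste("Boxplot of", col))
}

# Step 8: plot() on a factor draws a barplot of its level counts
for (col in categorical) {
  plot(df[[col]], main = paste("Barplot of", col), xlab = col)
}
```

Note that an identifier column such as `id` still counts as numeric here, so a quick look at the column names remains part of the job.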


#### Let's check the Age variable
hist(trainDF$Age,breaks = 10,col = colors()[100:109],
     main = "Histogram of Age variable", xlab = "Age")


Figure: Histogram of the Age variable
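
When `breaks` is given as a single number, hist() treats it as a suggestion and picks "pretty" bin edges close to that count. A minimal sketch for inspecting the bins R actually chose, using a hypothetical `ages` vector rather than the Arthritis data:

```r
# Hypothetical ages, used only for illustration
ages <- c(23, 27, 29, 30, 32, 41, 46, 52, 58, 59, 63, 66, 70, 74)

# plot = FALSE returns the histogram object without drawing it
h <- hist(ages, breaks = 10, plot = FALSE)

h$breaks   # the bin edges R actually chose
h$counts   # number of observations in each bin

# Every observation falls into exactly one bin
sum(h$counts) == length(ages)
```

The same call with `trainDF$Age` in place of `ages` should show the bins behind the histogram above.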

#### Check the boxplot of the Age variable
boxplot(trainDF$Age,col = colors()[100:109],
        main = "Boxplot of Age variable",
        xlab="Age",
        ylab="Distribution of Age variable")
Figure: Boxplot of the Age variable
#### Let's draw a barplot of the Treatment variable
plot(trainDF$Treatment,main="Treatment Categorical Variable",
     col=colors()[100:102],
     xlab="Treatment")
Figure: Barplot of the Treatment categorical variable


#### The colors() function returns the names of R's built-in colours;
#### indexing it, as in colors()[100:102], picks specific colours
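
A short sketch of how colors() can be explored before using it in a plot. The variable names `all_cols` and `picked` and the bar heights are made up for illustration.

```r
all_cols <- colors()   # character vector of built-in colour names
length(all_cols)       # how many names are available
head(all_cols)         # the first few names

# The same kind of subset used in the plots in this post
picked <- colors()[100:102]
picked

# Colour names can be passed wherever a col argument is accepted
barplot(c(3, 5, 2), col = picked, main = "Three bars, three colours")
```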

#### Draw boxplots showing the difference in age between the two
#### treatment groups
plot(Age~Treatment,data = trainDF,col=colors()[100:102])
Figure: Boxplot of Age vs Treatment
#### Visualize the difference in age between the male and female groups
plot(Age~Sex,data = trainDF,col=colors()[100:102])
Figure: Boxplot of Age vs Sex
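
The boxplots above compare a continuous variable across the levels of a factor. The numbers behind such a plot can be computed directly, as in this sketch. The data frame `toy` is hypothetical; with the real data you would use trainDF instead.

```r
# Hypothetical toy data; with the real data use trainDF$Age and trainDF$Treatment
toy <- data.frame(
  Age       = c(27, 29, 30, 32, 46, 58, 59, 63),
  Treatment = factor(c("Treated", "Placebo", "Treated", "Placebo",
                       "Treated", "Placebo", "Treated", "Placebo"))
)

# Summary statistics of Age within each Treatment level
tapply(toy$Age, toy$Treatment, summary)

# Median age per group: the thick line inside each box
group_medians <- tapply(toy$Age, toy$Treatment, median)
group_medians

# boxplot() with plot = FALSE returns the statistics it would draw
stats <- boxplot(Age ~ Treatment, data = toy, plot = FALSE)
stats$stats[3, ]   # row 3 holds the medians, in factor-level order
```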
#### Visualize the difference in age between the Improved groups (this is a boxplot)
plot(Age~Improved,data = trainDF,col=colors()[100:102],
     ylab="Age",
     main="Age vs Improved")
Figure: Boxplot of Age vs Improved
#### Visualize the relation between the Sex and Improved variables
plot(Sex~Improved,data = trainDF,col=colors()[100:102],
     main="Sex vs Improved")
Figure: Spine plot (stacked bars) of Sex vs Improved
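
When both variables in the formula are factors, plot() draws stacked bars whose heights come from the cross-tabulation of the two variables. A minimal sketch of that table, using a hypothetical data frame `toy` in place of trainDF:

```r
# Hypothetical toy data; with the real data use trainDF$Sex and trainDF$Improved
toy <- data.frame(
  Sex      = factor(c("Female", "Female", "Female", "Male", "Male", "Female")),
  Improved = factor(c("None", "Some", "Marked", "None", "None", "Marked"),
                    levels = c("None", "Some", "Marked"), ordered = TRUE)
)

# Counts of each Sex within each Improved level
tab <- table(toy$Improved, toy$Sex)
tab

# Row proportions: the share of each Sex within an Improved level
prop.table(tab, margin = 1)

# The same information drawn as a plot
plot(Sex ~ Improved, data = toy, main = "Sex vs Improved (toy data)")
```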
#### Visualize how the age groups are distributed across the Improved groups
#### cut() converts a numerical variable into a factor;
#### here it splits Age into 3 equal-width intervals
plot(cut(Age,3)~Improved,data = trainDF,col=colors()[100:102],
     ylab="Age groups",
     main="Age groups vs Improved")
Figure: Spine plot (stacked bars) of Age groups vs Improved
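
To see what cut() produces before plotting it, the resulting factor can be tabulated. This sketch uses a hypothetical `ages` vector, and the labels "young", "middle" and "older" are made up for illustration.

```r
# Hypothetical ages, used only for illustration
ages <- c(23, 27, 35, 41, 48, 52, 59, 63, 70, 74)

# Split the range of ages into 3 equal-width intervals
age_groups <- cut(ages, 3)
levels(age_groups)   # interval labels generated by cut()
table(age_groups)    # how many ages fall into each interval

# Optional: readable labels instead of interval notation
age_labeled <- cut(ages, 3, labels = c("young", "middle", "older"))
table(age_labeled)
```

Because the intervals have equal width rather than equal counts, the groups can differ a lot in size when the ages are skewed.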

#### Visualize the relation between age groups and Sex
plot(cut(Age,3)~Sex,data = trainDF,col=colors()[100:102],
     ylab="Age groups",
     main="Age groups vs Sex")
Figure: Spine plot (stacked bars) of Age groups vs Sex
