##### Let's begin with data visualization, from basic to more advanced levels, and learn how to plot a categorical variable against a continuous variable and a categorical variable against another categorical variable. Let's start RStudio and begin typing 🙂

## For the best course on data science, developed by a data scientist, please follow the link below to get a discount

**https://www.udemy.com/machine-learning-using-r/?couponCode=DISFOR123**

```
#### Let's load the vcd package, which provides the Arthritis dataset
library(vcd)
```

```
## Warning: package 'vcd' was built under R version 3.2.3
```

```
trainDF<-Arthritis
#### To learn about the Arthritis dataset, uncomment the following line and run it
#### ?Arthritis
#### To open the help page of any function, put ? before its name
#### For example, to learn about the head() function, run ?head
#### Let's have a look at the dataset
head(trainDF)
```

```
##   ID Treatment  Sex Age Improved
## 1 57   Treated Male  27     Some
## 2 46   Treated Male  29     None
## 3 77   Treated Male  30     None
## 4 17   Treated Male  32   Marked
## 5 36   Treated Male  46   Marked
## 6 23   Treated Male  58   Marked
```

**For Data Science Beginners: follow the link below for the best online Machine Learning and Data Science courses**

best online Machine learning and Data science courses

```
#### Let's check the structure of the dataset with str()
str(trainDF)
```

```
## 'data.frame': 84 obs. of  5 variables:
##  $ ID       : int  57 46 77 17 36 23 75 39 33 55 ...
##  $ Treatment: Factor w/ 2 levels "Placebo","Treated": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Sex      : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Age      : int  27 29 30 32 46 58 59 59 63 63 ...
##  $ Improved : Ord.factor w/ 3 levels "None"<"Some"<..: 2 1 1 3 3 3 1 3 1 1 ...
```
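
`str()` already shows each column's type. As a minimal sketch, assuming the `vcd` package is installed (it provides `Arthritis`), the same information can be pulled out programmatically to separate categorical (factor) columns from numeric ones:

```r
library(vcd)               # provides the Arthritis dataset
trainDF <- Arthritis

# class() of each column; Improved is an ordered factor, so it has two classes
col_classes <- lapply(trainDF, class)
print(col_classes)

# is.factor() is TRUE for plain and ordered factors alike
is_categorical <- sapply(trainDF, is.factor)
print(names(trainDF)[is_categorical])   # categorical columns
print(names(trainDF)[!is_categorical])  # numeric columns
```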

```
#### Let's check the number of unique values in each column
sapply(trainDF,function(x) length(unique(x)))
```

```
##        ID Treatment       Sex       Age  Improved
##        84         2         2        36         3
```
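
Columns with only a handful of unique values are good candidates for categorical plots. A small sketch, assuming `vcd` is installed; the cut-off of 3 unique values is an arbitrary choice for this dataset, not a rule:

```r
library(vcd)
trainDF <- Arthritis

# Number of unique values per column (same as the sapply() call above)
n_unique <- sapply(trainDF, function(x) length(unique(x)))

# Assumed threshold: treat columns with <= 3 unique values as categorical
few_levels <- names(n_unique)[n_unique <= 3]

# Print how often each value occurs in those columns
for (col in few_levels) {
  cat("\n", col, "\n")
  print(table(trainDF[[col]]))
}
```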

```
#### Let's check the histogram to see the distribution of the ID variable
hist(trainDF$ID,xlab = "ID",breaks = 10,col = colors()[100:109],
main = "Histogram of ID variable")
```
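
`hist()` can also return the numbers behind the plot instead of drawing it. A short sketch, assuming `vcd` is installed; note that `breaks = 10` is only a suggestion, because R rounds the bin edges to "pretty" values:

```r
library(vcd)
trainDF <- Arthritis

# plot = FALSE returns the histogram object without drawing it
h <- hist(trainDF$ID, breaks = 10, plot = FALSE)
print(h$breaks)   # bin edges actually chosen by R
print(h$counts)   # number of patients in each bin
```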

```
#### The ID variable looks roughly uniform, as expected,
#### because its values are just patient IDs
#### Before visualizing, check a few things:
#1. Check whether the variable is numerical or categorical
#2. If a variable is numerical, check how many unique values it has
#3. If it has only a few unique values, it might be a categorical variable
#4. Convert all categorical variables into factors using as.factor()
#5. For numerical variables, use histograms and boxplots
#6. For a histogram, use the hist() function
#7. For a boxplot, use the boxplot() function
#8. For other plots, use the plot() function
#9. Last but not least, R offers many ways to do the same thing
#10. This is just basic data exploration
#11. For advanced visualization, check my post on ggplot2.
```
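
Steps 2 to 4 of the checklist can be automated. The helper `factorize_low_cardinality()` and its `max_levels` cut-off are hypothetical names chosen for this sketch, which assumes `vcd` is installed:

```r
library(vcd)
trainDF <- Arthritis

# Hypothetical helper: convert numeric columns with at most `max_levels`
# unique values into factors (steps 2-4 of the checklist above)
factorize_low_cardinality <- function(df, max_levels = 5) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && length(unique(x)) <= max_levels) {
      df[[col]] <- as.factor(x)
    }
  }
  df
}

# In Arthritis, ID (84 values) and Age (36 values) stay numeric
converted <- factorize_low_cardinality(trainDF)
str(converted)

# A toy column with two numeric codes is converted; score is not
toy <- data.frame(group = c(1, 2, 1, 2), score = c(3.1, 4.5, 2.2, 5.0))
toy_out <- factorize_low_cardinality(toy, max_levels = 2)
str(toy_out)
```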

```
#### Let's check the Age variable
hist(trainDF$Age,breaks = 10,col = colors()[100:109],
main = "Histogram of Age variable",
xlab="Age")
```

```
#### Check the boxplot of the Age variable
boxplot(trainDF$Age,col = colors()[100:109],
main = "Boxplot of Age variable",
xlab="Age",
ylab="Distribution of Age variable")
```
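
To see which numbers the boxplot draws, `boxplot.stats()` returns them directly. A minimal sketch, assuming `vcd` is installed:

```r
library(vcd)
trainDF <- Arthritis

# boxplot.stats() returns the values that boxplot() draws
bs <- boxplot.stats(trainDF$Age)
print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # points beyond 1.5 * IQR from the hinges (drawn as outliers)
```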

```
#### Let's plot a bar chart of the Treatment variable
#### plot() on a factor draws a bar chart of the counts of each level
plot(trainDF$Treatment,main="Treatment Categorical Variable",
col=colors()[100:102],
xlab="Treatment")
```
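
The same chart can be drawn explicitly with `barplot()` on a frequency table, which also makes the counts visible. A sketch, assuming `vcd` is installed:

```r
library(vcd)
trainDF <- Arthritis

# Frequency table of the Treatment factor
counts <- table(trainDF$Treatment)
print(counts)

# Explicit bar chart of the same counts
barplot(counts, col = colors()[100:101],
        main = "Treatment Categorical Variable", xlab = "Treatment")
```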

```
#### colors() returns the names of R's built-in colours; we pick a few by index
#### Visualize the difference in age between the two treatment groups
#### (a numeric ~ factor formula in plot() draws side-by-side boxplots)
plot(Age~Treatment,data = trainDF,col=colors()[100:102])
```
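
A numeric summary helps to read the boxplot. This sketch, assuming `vcd` is installed, computes the median age per treatment group with `aggregate()` and draws the same boxplot through `boxplot()`'s formula interface:

```r
library(vcd)
trainDF <- Arthritis

# Median age in each treatment group (the thick line in each box)
age_by_treatment <- aggregate(Age ~ Treatment, data = trainDF, FUN = median)
print(age_by_treatment)

# The same boxplot drawn explicitly with boxplot()
boxplot(Age ~ Treatment, data = trainDF, col = colors()[100:101])
```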

```
#### Visualize the difference in age between the male and female groups
plot(Age~Sex,data = trainDF,col=colors()[100:102])
```

```
#### Visualize the difference in age between the Improved groups (this is a boxplot)
plot(Age~Improved,data = trainDF,col=colors()[100:102],
ylab="Age",
main="Age vs Improved")
```

```
#### Visualize the relation between Sex and the Improved variable
#### (both are factors, so plot() draws a spineplot)
plot(Sex~Improved,data = trainDF,col=colors()[100:102],
main="Sex vs Improved")
```
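
With a factor on both sides of the formula, `plot()` dispatches to `spineplot()`: bar widths follow the counts of `Improved`, and each bar is split by the share of each `Sex` within that level. A sketch of the proportions behind the plot, assuming `vcd` is installed:

```r
library(vcd)
trainDF <- Arthritis

# Contingency table with Improved as rows and Sex as columns
tab <- table(Improved = trainDF$Improved, Sex = trainDF$Sex)

# Row proportions: share of each sex within each Improved level
props <- prop.table(tab, margin = 1)
print(round(props, 2))

# The same plot called directly
spineplot(Sex ~ Improved, data = trainDF, main = "Sex vs Improved")
```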

```
#### Visualize how age groups are distributed across the Improved groups
#### cut() converts a numerical variable into a factor; cut(Age,3) makes 3 equal-width age intervals
plot(cut(Age,3)~Improved,data = trainDF,col=colors()[100:102],
ylab="Age groups",
main="Age groups vs Improved")
```
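
`cut(Age, 3)` splits the range of ages into 3 equal-width intervals, so the groups need not contain equal numbers of patients. The sketch below, assuming `vcd` is installed, shows the intervals and one alternative (tertile breaks from `quantile()`), which is a different design choice and not what the plot above uses:

```r
library(vcd)
trainDF <- Arthritis

# Equal-width intervals, as used in the plot above
age_groups <- cut(trainDF$Age, 3)
print(levels(age_groups))
print(table(age_groups))

# Alternative: tertile breaks give roughly equal-sized groups
age_tertiles <- cut(trainDF$Age,
                    breaks = quantile(trainDF$Age, probs = c(0, 1/3, 2/3, 1)),
                    include.lowest = TRUE)
print(table(age_tertiles))
```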

```
#### Visualize age groups by sex
plot(cut(Age,3)~Sex,data = trainDF,col=colors()[100:102],
ylab="Age groups",
main="Age groups vs Sex")
```
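
The counts behind this plot can be printed as a contingency table. As an alternative view, base R's `mosaicplot()` draws the same table as tiles whose areas are proportional to the cell counts. A sketch, assuming `vcd` is installed; the dimension names `AgeGroup` and `Sex` are labels chosen here:

```r
library(vcd)
trainDF <- Arthritis

# Contingency table of age group (3 equal-width bins) by sex
age_sex <- table(AgeGroup = cut(trainDF$Age, 3), Sex = trainDF$Sex)
print(age_sex)

# Tile areas are proportional to the cell counts
mosaicplot(age_sex, color = TRUE, main = "Age groups vs Sex")
```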