January 14, 2017

``````#### Store the Arthritis dataset in trainDF
library(vcd)
``````
``````## Warning: package 'vcd' was built under R version 3.2.3
``````
``````trainDF<-Arthritis

#### To read about the Arthritis dataset, uncomment the following line and run it
#### ?Arthritis

#### To open the help page for any function, put ? before its name

#### Let's have a look at the dataset
head(trainDF)
``````
``````##   ID Treatment  Sex Age Improved
## 1 57   Treated Male  27     Some
## 2 46   Treated Male  29     None
## 3 77   Treated Male  30     None
## 4 17   Treated Male  32   Marked
## 5 36   Treated Male  46   Marked
## 6 23   Treated Male  58   Marked
``````


``````#### Let's check the structure of the dataset
str(trainDF)
``````
``````## 'data.frame':    84 obs. of  5 variables:
##  $ ID       : int  57 46 77 17 36 23 75 39 33 55 ...
##  $ Treatment: Factor w/ 2 levels "Placebo","Treated": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Sex      : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Age      : int  27 29 30 32 46 58 59 59 63 63 ...
##  $ Improved : Ord.factor w/ 3 levels "None"<"Some"<..: 2 1 1 3 3 3 1 3 1 1 ...
``````
``````#### Let's check the number of unique values in each column
sapply(trainDF,function(x) length(unique(x)))
``````
``````##        ID Treatment       Sex       Age  Improved
##        84         2         2        36         3
``````
``````#### Let's check the histogram of the ID variable
hist(trainDF$ID,xlab = "ID",breaks = 10,col = colors()[100:109],
main = "Histogram of ID variable")
``````
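
The `breaks = 10` argument in the histogram call above is only a suggestion to `hist()`, which rounds the bin edges to "pretty" values. As a minimal sketch, the block below inspects the bins R actually chooses. It uses only the first ten ID values printed in the `str()` output, so the numbers will differ from those for the full `trainDF$ID` column.

```r
# First ten ID values, copied from the str() output above (not the full column)
id <- c(57, 46, 77, 17, 36, 23, 75, 39, 33, 55)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(id, breaks = 10, plot = FALSE)

h$breaks  # bin edges that hist() actually chose
h$counts  # number of values falling into each bin
```

Running the same call on `trainDF$ID` shows whether the counts per bin are roughly equal, which is what a uniform variable should look like.
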


``````#### The ID variable looks roughly uniform, which is expected
#### because these are just patient IDs

#### For data visualization, check a few things:
#1. Check whether each variable is numerical or categorical
#2. If a variable is numerical, check the number of unique values in it
#3. If the number of unique values is small, it might be a categorical variable
#4. Convert all categorical variables into factors using as.factor
#5. For numerical variables use a histogram and a boxplot
#6. For a histogram use the hist() function
#7. For a boxplot use the boxplot() function
#8. For other plots use the plot() function
#9. Last but not least, in R there are many ways to do the same thing
#10. This is just basic data exploration
#11. For advanced visualization check my post on ggplot2.``````
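
Steps 1 to 8 of the checklist above can be sketched as a small helper. This is only an illustration: `suggest_plot()`, `max_levels` and `demoDF` are hypothetical names, the threshold of 5 unique values is an arbitrary assumption, and `demoDF` holds just the first ten rows rebuilt from the `head()` and `str()` output shown earlier.

```r
# First ten rows of the dataset, rebuilt from the head() and str() output above
demoDF <- data.frame(
  ID        = c(57, 46, 77, 17, 36, 23, 75, 39, 33, 55),
  Treatment = factor(rep("Treated", 10), levels = c("Placebo", "Treated")),
  Sex       = factor(rep("Male", 10), levels = c("Female", "Male")),
  Age       = c(27, 29, 30, 32, 46, 58, 59, 59, 63, 63),
  Improved  = factor(c("Some", "None", "None", "Marked", "Marked", "Marked",
                       "None", "Marked", "None", "None"),
                     levels = c("None", "Some", "Marked"), ordered = TRUE)
)

# Hypothetical helper: suggest a plot type for one column
suggest_plot <- function(x, max_levels = 5) {
  if (is.factor(x) || is.character(x)) {
    "categorical: plot()"                      # step 1, bar plot via plot()
  } else if (length(unique(x)) <= max_levels) {
    "few unique values: consider as.factor()"  # steps 2 to 4
  } else {
    "numerical: hist() and boxplot()"          # steps 5 to 7
  }
}

# Apply the helper to every column; the result is a named character vector
suggestions <- sapply(demoDF, suggest_plot)
suggestions
```

With the full dataset loaded, `sapply(trainDF, suggest_plot)` gives the same kind of overview for all 84 rows.
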

``````#### Let's check the Age variable
hist(trainDF$Age,breaks = 10,col = colors()[100:109],
main = "Histogram of Age variable",
xlab="Age")

#### Check the boxplot of the Age variable
boxplot(trainDF$Age,col = colors()[100:109],
main = "Boxplot of Age variable",
xlab="Age",
ylab="Distribution of Age variable")
``````
``````#### Let's draw a bar plot of the Treatment variable
plot(trainDF\$Treatment,main="Treatment Categorical Variable",
col=colors()[100:102],
xlab="Treatment")
``````
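
When `plot()` is given a factor, as in the Treatment example above, the bar heights are simply the number of rows at each level. The sketch below computes those counts directly with `table()`. It uses the `Improved` values of the first ten rows from the `str()` output, since `Treatment` is constant in those rows; the variable name `improved` is chosen here for illustration.

```r
# Improved values of the first ten rows, decoded from the str() output above
improved <- factor(c("Some", "None", "None", "Marked", "Marked", "Marked",
                     "None", "Marked", "None", "None"),
                   levels = c("None", "Some", "Marked"), ordered = TRUE)

# table() counts the rows at each factor level, in level order
counts <- table(improved)
counts
```

Passing `counts` to `barplot()` draws the same bars that `plot(improved)` would.
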


``````#### colors() returns the names of R's built-in colours;
#### colors()[100:102] picks three of them

#### Draw boxplots showing the difference in age between the two
#### treatment groups
plot(Age~Treatment,data = trainDF,col=colors()[100:102])
``````
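
The plots in this post pick their colours by indexing into `colors()`. As a small sketch, the block below shows what that index returns; the variable names `all_colours` and `picked` are chosen here for illustration.

```r
# colors() returns a character vector with the names of R's built-in colours
all_colours <- colors()
length(all_colours)

# The same three entries that the plots above pass to col =
picked <- colors()[100:102]
picked

# RGB components of each picked colour, one column per colour
col2rgb(picked)
```

Any of these names can also be passed directly, for example `col = "steelblue"`.
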
``````#### Visualize difference in age between male and female groups
plot(Age~Sex,data = trainDF,col=colors()[100:102])
``````
``````#### Visualize difference in age between improved groups (this is a boxplot)
plot(Age~Improved,data = trainDF,col=colors()[100:102],
ylab="Age",
main="Age vs Improved")
``````
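
The grouped boxplots above compare age across groups visually. The numbers behind the boxes can be computed with `tapply()`, as in this minimal sketch. It uses only the first ten rows taken from the `str()` output, so the values are not those of the full dataset.

```r
# Age and Improved for the first ten rows, taken from the str() output above
age <- c(27, 29, 30, 32, 46, 58, 59, 59, 63, 63)
improved <- factor(c("Some", "None", "None", "Marked", "Marked", "Marked",
                     "None", "Marked", "None", "None"),
                   levels = c("None", "Some", "Marked"), ordered = TRUE)

# Median age within each Improved group (the thick line in each box)
medians <- tapply(age, improved, median)
medians

# Tukey five-number summary per group: min, lower hinge, median, upper hinge, max
tapply(age, improved, fivenum)
```

With the full dataset, `tapply(trainDF$Age, trainDF$Improved, median)` gives the medians drawn in the plot.
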
``````#### Visualize the relation between Sex and Improved (this is a spine plot)
plot(Sex~Improved,data = trainDF,col=colors()[100:102],
main="Sex vs Improved")
``````
``````#### Visualize the share of each age group within the improved groups
#### cut() converts a numerical variable into a factor;
#### cut(Age,3) splits Age into 3 equal-width intervals
plot(cut(Age,3)~Improved,data = trainDF,col=colors()[100:102],
ylab="Age groups",
main="Age groups vs Improved")
``````
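
To see what `cut()` produces before it is plotted, the sketch below applies it to the first ten Age values from the `str()` output. The interval labels depend on the range of the data, so the full `trainDF$Age` column will give different intervals.

```r
# First ten Age values, copied from the str() output above
age <- c(27, 29, 30, 32, 46, 58, 59, 59, 63, 63)

# Split the range of age into 3 equal-width intervals
age_group <- cut(age, 3)

levels(age_group)  # interval labels generated by cut()
table(age_group)   # how many values fall into each interval
```

For readable group names, `cut()` also accepts a `labels` argument, for example `cut(age, 3, labels = c("young", "middle", "old"))`; these three labels are illustrative only.
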

``````#### Visualize age and sex