Boxplot in R

Introduction

Let’s learn how to draw a boxplot in R using ggplot2.

Importing libraries

Let’s load the ggplot2 library, which provides the plotting functions used below.

library(ggplot2)

Reading data set

Let’s copy the data set named “diamonds”, which ships with ggplot2, into a data frame named DataFrame.

DataFrame <- diamonds

Looking at data 

Let’s check the structure of the data set with str().

str(DataFrame)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
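The str() output shows that cut is an ordered factor and price is an integer column. As a quick follow-up, the two variables used in the rest of this post can be inspected on their own. This is only a sketch; the names cut_levels, cut_counts and price_summary are chosen here for illustration.

```r
library(ggplot2)  # provides the diamonds data set

DataFrame <- diamonds

# cut is an ordered factor; list its levels from lowest to highest
cut_levels <- levels(DataFrame$cut)
print(cut_levels)

# Number of diamonds in each cut category
cut_counts <- table(DataFrame$cut)
print(cut_counts)

# Minimum, quartiles, mean and maximum of price
price_summary <- summary(DataFrame$price)
print(price_summary)
```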

Boxplot using ggplot 

Let’s choose the cut and price variables from the data set.

  • cut is a categorical variable (an ordered factor)
  • price is a continuous variable
  • Let’s visualize the distribution of price within each cut category as a boxplot
ggplot(data = DataFrame) +
  geom_boxplot(aes(x = cut, y = price),
               color = "orange",
               fill = "blue",
               alpha = 0.5) +
  scale_x_discrete() +
  scale_y_continuous(name = "Price range in each cut category") +
  theme_bw()
Figure: Boxplot of price by cut
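To see roughly which numbers each box represents, the whisker ends, hinges and median can be computed per cut category with base R. This is a sketch rather than what ggplot2 runs internally: boxplot.stats() uses hinges that can differ slightly from the quartiles ggplot2 computes, and the name box_stats and the row labels are chosen here for illustration.

```r
library(ggplot2)  # provides the diamonds data set

DataFrame <- diamonds

# Split price by cut, then compute the five boxplot statistics for each group.
# boxplot.stats()$stats returns, in order: lower whisker end, lower hinge,
# median, upper hinge, upper whisker end.
box_stats <- sapply(split(DataFrame$price, DataFrame$cut),
                    function(p) boxplot.stats(p)$stats)

# Label the rows so the matrix is easier to read
rownames(box_stats) <- c("lower_whisker", "lower_hinge", "median",
                         "upper_hinge", "upper_whisker")

print(box_stats)
```

Each column corresponds to one box in the plot, and points beyond the whisker ends are the ones drawn as outliers.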

Meaning of arguments in ggplot function:

The code above is a template you can reuse for any boxplot.
The functions and parameters it uses are explained below.

Functions used are:

  • ggplot() is the basic function used in every visualization. Its data
    argument takes the name of the data frame
  • geom_boxplot() draws the boxplot. It takes an aes() mapping
    and other arguments
  • scale_x_discrete() customizes the x axis (categorical or discrete variable);
    called with no arguments, as here, it keeps the defaults
  • scale_y_continuous() customizes the y axis (continuous variable); here its
    name argument sets the axis title
  • theme_bw() applies a black-and-white theme (white background, grey grid lines)
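Because ggplot2 builds a plot by adding components with +, the same plot can also be assembled one function at a time. The sketch below assumes the object name p and the axis titles "Cut" and "Price", which are chosen here for illustration.

```r
library(ggplot2)

DataFrame <- diamonds

# Start with the base plot object: data only, no layers yet
p <- ggplot(data = DataFrame)

# Add the boxplot layer; aes() maps columns to the x and y axes
p <- p + geom_boxplot(aes(x = cut, y = price))

# Customize the axes and apply the theme
p <- p +
  scale_x_discrete(name = "Cut") +
  scale_y_continuous(name = "Price") +
  theme_bw()

print(p)  # draws the plot on the active graphics device
```

Storing the plot in a variable makes it easy to try a different scale or theme without retyping the whole call.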


Parameters used are:

  • fill = colour used to fill the rectangular boxes of the boxplot
  • color = colour of the box outlines, whiskers, median line and outlier points
  • alpha = transparency, from 0 (fully transparent) to 1 (opaque). Very useful when you want to plot one layer over another
  • x = name of the x variable, which is categorical
  • y = name of the y variable, which is continuous
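The parameters behave differently depending on whether they are set outside or inside aes(). The sketch below contrasts the two; the second plot is a variation not used in the original example, and the names p_constant and p_mapped are chosen here for illustration.

```r
library(ggplot2)

DataFrame <- diamonds

# Constant appearance: fill, color and alpha are set outside aes(),
# so every box looks the same (as in the plot above)
p_constant <- ggplot(data = DataFrame) +
  geom_boxplot(aes(x = cut, y = price),
               color = "orange", fill = "blue", alpha = 0.5)

# Mapped appearance: fill is set inside aes(), so each cut category
# gets its own fill colour and a legend is added
p_mapped <- ggplot(data = DataFrame) +
  geom_boxplot(aes(x = cut, y = price, fill = cut), alpha = 0.5)

print(p_constant)
print(p_mapped)
```

Setting a value outside aes() is the simpler choice when the colour carries no meaning, while mapping inside aes() is useful when the colour should encode a variable.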