Histogram for data analytics in R
Histogram

Introduction

Let’s learn how to plot a histogram in R using ggplot2.

Importing libraries

Let’s load the ggplot2 package, which provides the plotting functions used below

library(ggplot2)

Reading data set

Let’s copy the “diamonds” data set, which ships with ggplot2, into a data frame named “DataFrame”

DataFrame<-diamonds
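As a quick sanity check, the following sketch confirms that the copy worked. It assumes only that ggplot2 is installed; diamonds is bundled with the package, so no file has to be read.

```r
library(ggplot2)

# diamonds is bundled with ggplot2, so it is available once the package is loaded
DataFrame <- diamonds

dim(DataFrame)      # number of rows and columns
head(DataFrame, 3)  # first three rows
```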

Looking at data 

Let’s check the structure of the data set with str()

str(DataFrame)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
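Before plotting, it can help to look at the spread of the variable you plan to bin, since that informs the choice of bin width. A minimal sketch, using the base R functions summary() and range() on the carat column (the variable name carat_range is only illustrative):

```r
library(ggplot2)

carat <- diamonds$carat

summary(carat)               # minimum, quartiles, mean and maximum
carat_range <- range(carat)  # smallest and largest carat value
carat_range
```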

Histogram using ggplot 

Let’s choose the carat variable in the data set. It is a continuous
variable, so let’s visualize it with a histogram

ggplot(data = DataFrame) +
  geom_histogram(aes(x = carat),
                 color = "orange",
                 fill = "blue",
                 alpha = 0.5,
                 binwidth = 0.5) +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(name = "carat frequency distribution") +
  theme_bw()
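The shape of a histogram depends heavily on the bin width. The sketch below builds the same plot with two widths and uses ggplot2's layer_data() to count the resulting bins. The width 0.1 and the names p_wide, p_narrow, n_wide and n_narrow are illustrative choices, not part of the original example.

```r
library(ggplot2)

# Same histogram as above, with the 0.5 bin width
p_wide <- ggplot(data = diamonds) +
  geom_histogram(aes(x = carat), binwidth = 0.5,
                 color = "orange", fill = "blue", alpha = 0.5) +
  theme_bw()

# A narrower, hypothetical bin width of 0.1 shows more detail
p_narrow <- ggplot(data = diamonds) +
  geom_histogram(aes(x = carat), binwidth = 0.1,
                 color = "orange", fill = "blue", alpha = 0.5) +
  theme_bw()

# layer_data() returns one row per computed bin
n_wide <- nrow(layer_data(p_wide))
n_narrow <- nrow(layer_data(p_narrow))
c(wide = n_wide, narrow = n_narrow)
```

Printing p_wide or p_narrow draws the corresponding plot. Narrower bins show more detail but also more noise, so it is worth trying a few values.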

Meaning of arguments in ggplot function:

The code above is a template that you can adapt for any histogram visualization.
The functions and parameters it uses are explained below:

Functions used are :

  • ggplot() is the base function that starts every ggplot2 visualization.
    Its data argument takes the data frame to plot
  • geom_histogram() adds the histogram layer. It takes an aes() mapping
    and other arguments
  • scale_x_continuous() customizes the x axis; here it sets the tick
    positions with breaks
  • scale_y_continuous() customizes the y axis; here it sets the axis
    title with name
  • theme_bw() applies a black-and-white theme with a white plot background
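Because each function adds one piece to the plot, the same histogram can be built step by step with the + operator. This is a minimal sketch of that idea. The object name p is an assumption for illustration, and layer_data() is used only to confirm that the histogram layer computed some bins.

```r
library(ggplot2)

# Start with the base plot: data only, no layers yet
p <- ggplot(data = diamonds)

# Add the histogram layer; aes() maps carat to the x axis
p <- p + geom_histogram(aes(x = carat), binwidth = 0.5)

# Customize both axes and apply the black-and-white theme
p <- p +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(name = "carat frequency distribution") +
  theme_bw()

# The computed bins for the histogram layer
head(layer_data(p)[, c("xmin", "xmax", "count")])
```

Printing p draws the finished plot. Storing intermediate plots this way makes it easy to reuse a base plot with different layers or themes.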


Parameters used are :

  • fill = fill color of the histogram bars
  • color = color of the bar outlines
  • alpha = transparency of the bars, from 0 (fully transparent) to
    1 (opaque). Very useful when you want to plot one histogram over another
  • binwidth = width of each histogram bin, in the units of the x variable
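To illustrate why alpha matters when plotting one histogram over another, here is a hedged sketch that overlays the carat distributions of two cut levels. The choice of "Fair" and "Ideal", the bin width of 0.25, and the names two_cuts and p_overlay are assumptions made for this example only.

```r
library(ggplot2)

# Keep two cut levels so the overlay stays readable (an arbitrary choice)
two_cuts <- subset(diamonds, cut %in% c("Fair", "Ideal"))

p_overlay <- ggplot(data = two_cuts) +
  geom_histogram(aes(x = carat, fill = cut),  # fill is mapped to cut, one color per level
                 position = "identity",       # draw bars on top of each other instead of stacking
                 alpha = 0.5,                 # semi-transparent so both histograms stay visible
                 binwidth = 0.25) +
  theme_bw()
```

Printing p_overlay draws the plot. Without position = "identity", geom_histogram() stacks the groups, and with alpha = 1 the histogram drawn last would hide the one beneath it.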