 Histogram

Introduction

Importing libraries

Let’s load the ggplot2 library, which is needed for ggplot visualizations

library(ggplot2)

The data set named “diamonds” comes with ggplot2. Let’s copy it into a data frame named “DataFrame”

DataFrame <- diamonds

Looking at the data

Let’s check the structure of the data set with str()

str(DataFrame)
## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
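Before choosing binwidth and the x-axis breaks, it can help to look at the spread of carat. A minimal sketch using base R’s summary() and range(); the name carat_range is only illustrative, and the bar count it prints is a rough estimate, since ggplot2 may add one extra bin depending on where the bin edges fall:

```r
library(ggplot2)

DataFrame <- diamonds

# Minimum, quartiles, mean and maximum of carat
summary(DataFrame$carat)

# Smallest and largest carat; the histogram's x axis has to cover this span
carat_range <- range(DataFrame$carat)
print(carat_range)

# Rough number of bars that a binwidth of 0.5 produces over that span
ceiling(diff(carat_range) / 0.5)
```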

Histogram using ggplot

Let’s choose the carat variable from the data set. It is a continuous variable,
so let’s visualize its distribution with a histogram

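# Histogram of carat: orange bar outlines, semi-transparent blue fill
ggplot(data = DataFrame) +
  geom_histogram(aes(x = carat),
                 color = "orange",
                 fill = "blue",
                 alpha = 0.5,
                 binwidth = 0.5) +
  scale_x_continuous(breaks = 1:10) +                           # tick marks at whole carats
  scale_y_continuous(name = "carat frequency distribution") +   # y-axis title
  theme_bw()                                                    # white background theme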
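To see the bins that geom_histogram() actually computed, one option is to store the plot in an object and pull out its layer data with ggplot2’s layer_data(). A minimal sketch, assuming the same DataFrame and binwidth as above; the object names p and bins are illustrative only:

```r
library(ggplot2)

DataFrame <- diamonds

# Store the plot in an object instead of printing it straight away
p <- ggplot(data = DataFrame) +
  geom_histogram(aes(x = carat), binwidth = 0.5)

# layer_data() returns the data ggplot2 draws for a layer;
# for a histogram each row is one bin, with its edges and its count
bins <- layer_data(p, 1)
head(bins[, c("xmin", "xmax", "count")])

# Every diamond falls into exactly one bin, so the counts add up to the row count
sum(bins$count) == nrow(DataFrame)
```

Each bin spans xmax - xmin = 0.5 carat, which is exactly the binwidth passed to geom_histogram().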

Meaning of the functions and arguments in the ggplot code

The carat histogram code above is a template you can reuse for any histogram.
The functions and parameters it uses are explained below.

Functions used:

• ggplot() is the base function that starts every ggplot2 visualization.
It takes a data argument: the data frame to plot
• geom_histogram() draws the histogram. It takes an aes() mapping (here, carat
on the x axis) plus styling arguments
• scale_x_continuous() customizes the x axis; here it sets the tick mark positions
• scale_y_continuous() customizes the y axis; here it sets the axis title
• theme_bw() applies a black-and-white theme with a white plot background
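Each of these functions is combined with +, so the same plot can also be built one piece at a time. A minimal sketch, assuming the DataFrame from above; the object name p is illustrative:

```r
library(ggplot2)

DataFrame <- diamonds

# ggplot() alone sets up the data but draws no layers yet
p <- ggplot(data = DataFrame)

# geom_histogram() adds the layer that bins carat and draws the bars
p <- p + geom_histogram(aes(x = carat), binwidth = 0.5)

# Scale and theme functions are added the same way, with +
p <- p +
  scale_x_continuous(breaks = 1:10) +
  scale_y_continuous(name = "carat frequency distribution") +
  theme_bw()
```

Printing p (or typing p at the console) draws the finished plot.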

Parameters used:

• fill: the color used to fill the histogram bars
• color: the color of the bar outlines
• alpha: transparency, from 0 (fully transparent) to 1 (opaque). Very useful when
you want to plot one histogram over another
• binwidth: the width of each bin, in units of the x variable (0.5 carat here)
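As an illustration of why alpha matters when one histogram is plotted over another, here is a minimal sketch that overlays carat for two cut grades. The choice of the “Fair” and “Ideal” grades, the binwidth of 0.1 and the names two_cuts and overlay are assumptions made for the example:

```r
library(ggplot2)

DataFrame <- diamonds

# Keep only two cut grades so the overlap is easy to read
two_cuts <- subset(DataFrame, cut %in% c("Fair", "Ideal"))

# fill is mapped to cut, so each grade gets its own color;
# position = "identity" draws the bars on top of each other instead of stacking them,
# and alpha = 0.5 lets the lower histogram show through the upper one
overlay <- ggplot(data = two_cuts) +
  geom_histogram(aes(x = carat, fill = cut),
                 position = "identity",
                 alpha = 0.5,
                 binwidth = 0.1) +
  theme_bw()
```

Print overlay to draw it. Without alpha, the histogram drawn last would hide the other wherever they overlap.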