The Easiest Way to Plot a Histogram in Python – Step-by-Step

Here you will learn the easiest way to plot a histogram in Python. We use the seaborn library to create the plot.

1. What is a Histogram?

A histogram is a graphical representation of data that shows the distribution of a set of continuous or discrete values. It is a type of bar graph that represents the frequency or count of the values that fall within a set of intervals or “bins”. Each bin is drawn as a bar along the x-axis, and the height of the bar on the y-axis shows how many values fall within that bin.
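
To make the idea of bins concrete, here is a minimal sketch that counts values per bin with NumPy's `np.histogram`. The sample values and the choice of three bins over the range 0 to 9 are made up for illustration only.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# Split the range 0 to 9 into 3 equal-width bins and count the values in each.
# Every bin includes its left edge; only the last bin also includes its right edge.
counts, bin_edges = np.histogram(values, bins=3, range=(0, 9))

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count}")
```

The counts are what a histogram draws as bar heights, and the bin edges mark where each bar starts and ends on the x-axis.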

Histograms are commonly used in many fields, including statistics, data analysis, and machine learning, to visualize the distribution of a set of values and identify patterns or trends in the data. Some of the advantages of using histograms include:

  1. Easy to understand: Histograms are simple to understand and interpret, even for people without a strong background in statistics or data analysis. They can be used to quickly visualize the distribution of data and identify outliers or unusual values.
  2. Shows distribution: Histograms show the distribution of the data and help to identify the shape of the distribution (e.g. normal, uniform, skewed).
  3. Identifies patterns: Histograms can help to identify patterns in the data, such as skewness or bimodality.
  4. Compares data: Histograms can be used to compare the distribution of two or more sets of data, which can help to identify differences or similarities in the data.
  5. Effective with large datasets: Histograms can be used to effectively visualize and analyze large datasets. By aggregating the data into bins, histograms can provide a summary of the distribution of the data and help to identify patterns that may not be obvious in the raw data.
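
When comparing two datasets, it usually helps to bin both with the same edges so the bars line up. The sketch below is one possible way to do that with NumPy. The two samples, the seed, and the choice of 20 bins are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Two hypothetical samples with different centers
sample_a = rng.normal(0.0, 1.0, 1000)
sample_b = rng.normal(1.0, 1.0, 1000)

# Compute one set of bin edges from the pooled data so both histograms share bins
pooled = np.concatenate([sample_a, sample_b])
shared_edges = np.histogram_bin_edges(pooled, bins=20)

counts_a, _ = np.histogram(sample_a, bins=shared_edges)
counts_b, _ = np.histogram(sample_b, bins=shared_edges)

# The edges span the pooled range, so no value from either sample is dropped
print(counts_a.sum(), counts_b.sum())
```

Because both count arrays refer to the same intervals, they can be compared bin by bin or drawn on the same axes.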

Overall, histograms are a useful tool for visualizing and understanding the distribution of data. They can be used to quickly identify patterns and trends in the data and provide a visual representation of the distribution that is easy to understand and interpret.

2. Generating the Distribution Plot in Python

Below you can find the code for the histogram in Python. It requires Seaborn, Matplotlib, and NumPy to be installed.

The function is straightforward! All it asks you for is six parameters:

  • A list of the data
  • A string with the x-label
  • A string with the y-label
  • A string with the title
  • An integer with the number of bins
  • A string with the plot output path

Please see the code below:

import matplotlib.pyplot as plt
import numpy as np
import seaborn


def plot_dist(data, x_label, y_label, title, number_bins, plot_output):
    """Plot the distribution of the data as a histogram and save it to a file.

    Args:
        data (list): List with data to be plotted.
        x_label (str): Plot x-label.
        y_label (str): Plot y-label.
        title (str): Plot title.
        number_bins (int): Number of bins for the histogram.
        plot_output (str): Path to plot output.

    """
    seaborn.set_theme(color_codes=True)
    fig = plt.figure(figsize=(9, 6))

    # histplot replaces distplot, which has been deprecated since seaborn 0.11
    ax = seaborn.histplot(data, kde=False, bins=number_bins)
    ax.set(xlabel=x_label, ylabel=y_label, title=title)

    # Save the figure to disk, then close it to free memory
    fig.savefig(plot_output, bbox_inches='tight', dpi=400)

    plt.close(fig)


# set a seed so that the plot in this example can be regenerated
np.random.seed(11)

# random sample: 5000 draws from a normal distribution (mean 0.1, standard deviation 0.5)
data = np.random.normal(0.1, 0.5, 5000)

plot_dist(data, "Random Variable", "# Frequency", "Random Variable sampled 5000 times", 200, "plot_histogram_python_example.png")

And here is what our plot looks like:

An example of a histogram plot in Python
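
The example above fixes the number of bins at 200. If you would rather let the data suggest a value, one option is NumPy's `np.histogram_bin_edges` with `bins="auto"`. The sketch below is an alternative to the hard-coded value, not part of the original function, and `suggested_bins` is a name introduced here for illustration.

```python
import numpy as np

# Same seed and sample as in the example above
np.random.seed(11)
data = np.random.normal(0.1, 0.5, 5000)

# "auto" lets NumPy choose between the Sturges and Freedman-Diaconis estimators
edges = np.histogram_bin_edges(data, bins="auto")

# N + 1 edges delimit N bins
suggested_bins = len(edges) - 1
print(suggested_bins)
```

The resulting integer can be passed as `number_bins` to `plot_dist`. Fewer bins give a smoother outline, while more bins show finer detail at the cost of noisier bars.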

If you enjoyed this tutorial and would love to learn about box plots and plot them in Python, please check out the following tutorial.

3. More Resources