In this tutorial you will learn an easy way to plot a histogram in Python, using the seaborn library to draw the distribution.
1. What is a Histogram?
A histogram is a graphical representation of data that is used to show the distribution of a set of continuous or discrete values. It is a type of bar graph that is used to represent the frequency or count of the values that fall within a set of intervals or “bins”. The bins are represented by bars on the x-axis and the frequency of the values within each bin is represented by the height of the bar on the y-axis.
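To make the idea of bins and counts concrete, here is a minimal sketch that computes the numbers behind a histogram without drawing anything. The sample values and the choice of four bins are made up for illustration; `np.histogram` is NumPy's function for counting values per bin.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6]

# Split the range [1, 5] into 4 equal-width bins and count the values in each.
# Every bin includes its left edge; only the last bin also includes its right edge.
counts, bin_edges = np.histogram(values, bins=4, range=(1, 5))

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"bin from {left:.0f} to {right:.0f}: {count} value(s)")
```

Each printed count is the height that the corresponding bar would have on the y-axis.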
Histograms are commonly used in many fields, including statistics, data analysis, and machine learning, to visualize the distribution of a set of values and identify patterns or trends in the data. Some of the advantages of using histograms include:
- Easy to understand: Histograms are simple to understand and interpret, even for people without a strong background in statistics or data analysis. They can be used to quickly visualize the distribution of data and identify outliers or unusual values.
- Shows distribution: Histograms show the distribution of the data and help to identify the shape of the distribution (e.g. normal, uniform, skewed).
- Identifies patterns: Histograms can help to identify patterns in the data, such as skewness or bimodality.
- Compares data: Histograms can be used to compare the distribution of two or more sets of data, which can help to identify differences or similarities in the data.
- Effective with large datasets: Histograms can be used to effectively visualize and analyze large datasets. By aggregating the data into bins, histograms can provide a summary of the distribution of the data and help to identify patterns that may not be obvious in the raw data.
Overall, histograms are a useful tool for visualizing and understanding the distribution of data. They can be used to quickly identify patterns and trends in the data and provide a visual representation of the distribution that is easy to understand and interpret.
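When comparing two datasets, as mentioned in the list above, the histograms are only directly comparable if they share the same bin edges. The following is a small sketch of one way to do that with NumPy, assuming two synthetic samples (`group_a` and `group_b` are hypothetical names, and the seed and distribution parameters are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Two hypothetical samples whose distributions we want to compare
group_a = rng.normal(loc=0.0, scale=1.0, size=1000)
group_b = rng.normal(loc=1.0, scale=1.0, size=1000)

# Derive the bin edges from both samples combined, so each bar of one
# histogram covers exactly the same interval as the matching bar of the other
combined = np.concatenate([group_a, group_b])
shared_edges = np.histogram_bin_edges(combined, bins=20)

counts_a, _ = np.histogram(group_a, bins=shared_edges)
counts_b, _ = np.histogram(group_b, bins=shared_edges)

# Report the centre of the fullest bin of each sample
centers = (shared_edges[:-1] + shared_edges[1:]) / 2
print("peak of group_a near", round(float(centers[counts_a.argmax()]), 2))
print("peak of group_b near", round(float(centers[counts_b.argmax()]), 2))
```

The same array of edges can also be passed as the `bins` argument of a plotting call such as matplotlib's `hist`, so that both histograms are drawn on identical intervals.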
2. Generating the Distribution Plot in Python
The function is straightforward. It takes six parameters:
- A list of the data
- A string with the x-label
- A string with the y-label
- A string with the title
- An integer with the number of bins
- A string with the plot output path
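The number of bins is the one parameter in this list that requires a judgement call. This tutorial simply picks the value by hand, but if you are unsure where to start, NumPy can suggest one. The sketch below is not part of the tutorial's function; it only shows how `np.histogram_bin_edges` with `bins="auto"` could be used to get a starting point, and `suggested_bins` is a hypothetical variable name.

```python
import numpy as np

# Same kind of sample as used in this tutorial's example
np.random.seed(11)
data = np.random.normal(0.1, 0.5, 5000)

# bins="auto" asks NumPy to choose the bin edges with its built-in estimator
edges = np.histogram_bin_edges(data, bins="auto")

# n + 1 edges delimit n bins
suggested_bins = len(edges) - 1
print("suggested number of bins:", suggested_bins)
```

Treat the result as a starting value: more bins show finer detail but produce noisier bars, while fewer bins give a smoother but coarser picture.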
Here is the function, followed by an example call:
```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_dist(data, x_label, y_label, title, number_bins, plot_output):
    """Plot the distribution of the data as a histogram and save it to a file.

    Args:
        data (list): List with data to be plotted.
        x_label (str): Plot x-label.
        y_label (str): Plot y-label.
        title (str): Plot title.
        number_bins (int): Number of bins for the distribution.
        plot_output (str): Path to the plot output file.
    """
    sns.set_theme(color_codes=True)
    fig, ax = plt.subplots(figsize=(9, 6))
    # histplot is used here because distplot has been deprecated since seaborn 0.11
    sns.histplot(data, bins=number_bins, kde=False, ax=ax)
    ax.set(xlabel=x_label, ylabel=y_label, title=title)
    fig.savefig(plot_output, bbox_inches="tight", dpi=400)
    # Close the figure to free memory once it has been saved
    plt.close(fig)


# Fix the seed so the plot shown in this example can be regenerated
np.random.seed(11)
# Draw 5000 samples from a normal distribution (mean 0.1, standard deviation 0.5)
data = np.random.normal(0.1, 0.5, 5000)
plot_dist(data,
          "Random Variable",
          "# Frequency",
          "Random Variable sampled 5000 times",
          200,
          "plot_histogram_python_example.png")
```
And here is what our plot looks like:
If you enjoyed this tutorial and would love to learn about box plots and plot them in Python, please check out the following tutorial.