Simple Box Plot and Swarm Plot in Python

By: onestop_data


This short tutorial shows you how to create a box-and-whisker plot and overlay it with a swarm plot in Python. It also explains how to read the result.

What is a Box Plot and How Do You Read It?

A box plot is a simple way of representing the distribution of your data using the five-number summary: minimum, 1st quartile (Q1), median, 3rd quartile (Q3), and maximum. It also makes outliers easy to spot and lets you compare multiple populations visually.

Here we also overlay a swarm plot on the box plot to show exactly where each data point sits relative to the five-number summary.
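The five-number summary described above can be computed directly with numpy. The snippet below is a minimal sketch: the helper name five_number_summary and the sample values are made up for illustration, and the 1.5 * IQR fences are the common convention for flagging outliers (and, as far as I know, the default that seaborn and matplotlib use for whiskers), not something specific to this post.

```python
import numpy as np


def five_number_summary(values):
    """Return the five-number summary plus outliers (hypothetical helper)."""
    values = np.asarray(values, dtype=float)

    # quartiles with numpy's default linear interpolation
    q1, median, q3 = np.percentile(values, [25, 50, 75])

    # assumption: points beyond 1.5 * IQR from the box count as outliers
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]

    return {
        "min": float(values.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(values.max()),
        "outliers": outliers.tolist(),
    }


# made-up sample with one obviously large value
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this sample the value 30 lies above the upper fence, so it is reported as an outlier, which is how such a point would typically show up beyond the whisker in a box plot.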


Dependencies for the script

First, the function below depends on seaborn and matplotlib, and the example data is generated with numpy, so please make sure they are installed.

All of them can be installed with pip. Installing seaborn also installs matplotlib and numpy as dependencies.
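If you are not sure whether the packages are already available, a quick check like the one below can help. This is only a sketch: the function name missing_packages is made up, and it merely tests whether Python can locate each package without importing it.

```python
import importlib.util


def missing_packages(names=("numpy", "matplotlib", "seaborn")):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    # suggest a pip command for whatever is missing
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```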

Generating the Box and Swarm Plot

Next, this post plots three random populations as a combined box and swarm plot. The code can easily be adapted to fewer or more populations.

Please see the code below:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_box_swarm(data, y_axis_label, x_labels, plot_title, figure_name):
    """Plot box-plot and swarm plot for data list.

    Args:
        data (list of list): List of lists with data to be plotted.
        y_axis_label (str): Y-axis label.
        x_labels (list of str): List with labels of x-axis.
        plot_title (str): Plot title.
        figure_name (str): Path to output figure.

    """
    sns.set(color_codes=True)
    fig, ax = plt.subplots(figsize=(9, 6))

    # add title to plot
    ax.set_title(plot_title)

    # plot data on swarmplot and boxplot
    sns.swarmplot(data=data, color=".25", ax=ax)
    sns.boxplot(data=data, ax=ax)

    # y-axis label
    ax.set(ylabel=y_axis_label)

    # write labels with number of elements in each population
    ax.set_xticklabels(["{} (n={})".format(label, len(group)) for label, group in zip(x_labels, data)], rotation=10)

    # write figure file with quality 400 dpi
    fig.savefig(figure_name, bbox_inches='tight', dpi=400)
    plt.close(fig)


# set the seed so the example plot shown here can be regenerated exactly
np.random.seed(11)

# create random distributions for 3 populations
population_a = np.random.normal(0.1, 0.5, 50)
population_b = np.random.normal(0.2, 0.7, 45)
population_c = np.random.normal(0.7, 0.3, 51)
data = [population_a, population_b, population_c]

x_labels = ["Population A", "Population B", "Population C"]
y_axis_label = "Target Metric"

plot_box_swarm(data, y_axis_label, x_labels, "Box/Swarm plot - Population A vs B vs C", "pop_A_B_C.png")

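And here is what our plot looks like (the script saves it as pop_A_B_C.png):

If you only want the swarm plot, comment out the boxplot call (line 25 of the code above). If you only want the box plot, comment out the swarmplot call (line 24). The lines that follow use ax, so make sure it is still defined after the change.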
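As mentioned earlier, the number of populations is not fixed at three. One way to vary it is to describe each population in a small list and build the data and labels from that. The sketch below makes some assumptions: the helper make_populations, the tuple layout (label, mean, standard deviation, size), and the extra "Population D" are all made up for illustration. It also uses numpy's default_rng, so the random values differ from those produced with np.random.seed in the code above.

```python
import numpy as np


def make_populations(specs, seed=11):
    """Build data and labels from (label, mean, std, size) tuples.

    Hypothetical helper: returns a list of arrays and a list of labels,
    in the shape that the plotting function above expects.
    """
    rng = np.random.default_rng(seed)
    data = [rng.normal(mean, std, size) for _, mean, std, size in specs]
    x_labels = [label for label, _, _, _ in specs]
    return data, x_labels


# add or remove tuples here to change the number of populations
specs = [
    ("Population A", 0.1, 0.5, 50),
    ("Population B", 0.2, 0.7, 45),
    ("Population C", 0.7, 0.3, 51),
    ("Population D", 0.4, 0.4, 30),  # made-up fourth population
]
data, x_labels = make_populations(specs)
print([len(group) for group in data], x_labels)
```

The resulting data and x_labels can then be passed to plot_box_swarm in the same way as in the three-population example.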


Conclusion

In summary, this tutorial showed you how to use seaborn to draw a box plot overlaid with a swarm plot, which lets you see both the summary statistics and the individual data points of each population.
