Generating a scatter plot is one of the most common plotting tasks in Python, and it can be accomplished easily. This tutorial shows how to do it using Seaborn.
What is a Scatter Plot?
A scatter plot is a graphical representation of the relationship between two continuous variables. It is used to display how one variable changes with the other and to determine whether there is any correlation between them.
In a scatter plot, each data point is represented as a single dot on the plot. The x-axis represents one variable and the y-axis represents another variable. The position of the dot on the plot represents the values of the two variables for that data point.
Scatter plots are useful for several purposes, including:
- Identifying correlations: Scatter plots are a useful tool for identifying correlations between two variables. If there is a positive correlation between two variables, the dots in the scatter plot will tend to rise from left to right. If there is a negative correlation, the dots will tend to fall from left to right.
- Identifying outliers: Scatter plots can also be used to identify outliers, or data points that are significantly different from the other data points in the plot. Outliers can be important to identify because they may indicate errors in the data or important features that need to be further investigated.
- Visualizing distributions: Scatter plots can be used to visualize the distribution of data. By looking at the pattern of dots in the plot, you can determine if the data is spread out evenly or if it is concentrated in certain areas.
- Comparing variables: Scatter plots can be used to compare the relationships between different variables. By plotting multiple scatter plots on the same graph, you can quickly compare the relationships between the variables.
- Identifying trends: Scatter plots can also be used to identify trends in the data, such as increasing or decreasing patterns.
In summary, scatter plots are a useful tool for visualizing the relationship between two continuous variables. They can be used to identify correlations, outliers, distributions, and trends, and to compare the relationships between different variables.
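To make the idea of positive and negative correlation more concrete, here is a minimal sketch that computes the Pearson correlation coefficient using only the standard library. The function name pearson_correlation and the sample data are made up for illustration; they are not part of seaborn.

```python
import math


def pearson_correlation(x_values, y_values):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("Both sequences need the same length (at least 2).")

    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n

    # Sum of products of deviations, and sums of squared deviations
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    spread_x = sum((x - mean_x) ** 2 for x in x_values)
    spread_y = sum((y - mean_y) ** 2 for y in y_values)

    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined when a variable is constant.")

    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical sample data: one rising and one falling straight line
time_hours = list(range(10))
rising = [2 * t + 1 for t in time_hours]
falling = [20 - 3 * t for t in time_hours]

print("rising: ", pearson_correlation(time_hours, rising))   # close to +1
print("falling:", pearson_correlation(time_hours, falling))  # close to -1
```

A value near +1 corresponds to dots rising from left to right, a value near -1 to dots falling from left to right, and a value near 0 to no clear linear pattern.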
Dependencies
First and foremost, the function below depends on seaborn and matplotlib, so please make sure both are installed.
They can be installed easily with pip (pip install seaborn). When installing seaborn, matplotlib is installed automatically as one of its dependencies.
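If you are not sure whether the packages are already available, a quick check like the sketch below may help. It only uses the standard library, and the helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(package_names):
    """Return the names from package_names that cannot be found for import."""
    return [name for name in package_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["seaborn", "matplotlib"])
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```

The check only confirms that the packages can be located; it does not verify their versions.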
Generating the Scatter Plot
Secondly, this post plots a toy example of bacterial growth, where the x-axis is the “Time (Hours)” and the y-axis is the number of bacterial cells.
Please don’t assume that the growth function is real: the example values are simply the square of the time in hours rather than a true exponential curve, and they are only there to have something to plot.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import seaborn
import matplotlib.pyplot as plt


def simple_scatter_plot(x_data, y_data, output_filename, title_name, x_axis_label, y_axis_label):
    """Simple scatter plot.

    Args:
        x_data (list): List with x-axis data.
        y_data (list): List with y-axis data.
        output_filename (str): Path to output image in PNG format.
        title_name (str): Plot title.
        x_axis_label (str): X-axis Label.
        y_axis_label (str): Y-axis Label.
    """
    # Apply the default seaborn theme
    seaborn.set(color_codes=True)
    # Create the figure and set its title
    plt.figure(1, figsize=(9, 6))
    plt.title(title_name)
    # Draw the scatter plot and label both axes
    ax = seaborn.scatterplot(x=x_data, y=y_data)
    ax.set(xlabel=x_axis_label, ylabel=y_axis_label)
    # Save the image and release the figure
    plt.savefig(output_filename, bbox_inches='tight', dpi=300)
    plt.close()
The simple_scatter_plot function takes the parameters described below:
- x_data: List with x-axis data
- y_data: List with y-axis data
- output_filename: Path to output image in PNG format
- title_name: Plot title
- x_axis_label: X-axis Label
- y_axis_label: Y-axis Label
Last but not least, execute the function by adapting it to the inputs you need! The import below assumes the function was saved in a file named simple_scatter_plot.py.
>>> from simple_scatter_plot import simple_scatter_plot
>>>
>>> time_hours = list(range(100))
>>> number_bacteria_hours = [xx ** 2 for xx in time_hours]
>>> simple_scatter_plot(time_hours, number_bacteria_hours, "my_scatter_plot.png", "Bacterial Exponential Function Example",
...                     "Time (Hours)", "# Bacteria")
Here is what the scatter plot in Python should look like:
In short, this tutorial is just a simple recipe that can be adapted into a more complex scatter plot. The seaborn documentation lists all the parameters that a scatter plot can take. Furthermore, please feel free to ask questions below. I’m more than happy to add your requests to this tutorial.
Log-Log Plot in Python
Sometimes you need to transform your data to the log scale so that it can be visualized better. If this is your case, you can use the seaborn/matplotlib call below to transform the x-axis and/or the y-axis.
# please add to the function simple_scatter_plot
# or to any code you have written with seaborn/matplotlib
# y-axis in log scale
ax.set(yscale="log")
# x-axis in log scale
ax.set(xscale="log")
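For reference, here is a self-contained sketch of how the log-scale calls above could fit into a complete function. The function name log_log_scatter_plot, the output file name, and the "Agg" backend choice are assumptions made for this example; the data starts at 1 because zero cannot be shown on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn


def log_log_scatter_plot(x_data, y_data, output_filename, title_name,
                         x_axis_label, y_axis_label):
    """Scatter plot with both axes in log scale (illustrative sketch)."""
    seaborn.set(color_codes=True)
    fig, ax = plt.subplots(figsize=(9, 6))
    ax.set_title(title_name)
    seaborn.scatterplot(x=x_data, y=y_data, ax=ax)
    # Switch both axes to log scale and label them
    ax.set(xscale="log", yscale="log",
           xlabel=x_axis_label, ylabel=y_axis_label)
    fig.savefig(output_filename, bbox_inches="tight", dpi=300)
    plt.close(fig)
    return ax


if __name__ == "__main__":
    # Start at 1 hour: log axes cannot display zero values
    time_hours = list(range(1, 100))
    number_bacteria_hours = [xx ** 2 for xx in time_hours]
    log_log_scatter_plot(time_hours, number_bacteria_hours,
                         "my_log_log_scatter_plot.png",
                         "Log-Log Example", "Time (Hours)", "# Bacteria")
```

With this toy data, where the y values are the square of the x values, the points should fall on a straight line in the log-log plot.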
More Resources
Here are two of my favorite Python data visualization books in case you want to learn more about it.
- Mastering Python Data Visualization by Kirthi Raman
- Python Data Visualization: An Easy Introduction to Data Visualization in Python with Matplotlib, Pandas, and Seaborn by Samuel Burns
Conclusion
In summary, this tutorial showed you how to use seaborn to draw a simple scatter plot, which can easily be expanded into a more complex version.
Moreover, let me know if you need help expanding it to multiple layers. I’m more than happy to help with it. Please leave a comment below if needed.