A genome depth plot is a great way to visualize how well a target genome is covered across the genome. This can be used to eyeball if a certain level of coverage was met across the genome or even to identify contamination when looking into metagenomic datasets.
Here I share a python script to generate the depth plot. Enjoy it!
Dependencies
First and foremost, the script presented here relies on a few dependencies:
- Seaborn: Image plotting library
- BWA: Aligner to generate BAM alignments
- Samtools: Tool to generate BAM and depth file
Numpy and Matplotlib are also dependencies, but they are installed when installing Seaborn.
Generating Genome Depth file
Secondly, we are targeting a 1 single contig genome as a reference and map some reads to use using BWA. Therefore, you can use the command lines below to format the reference genome and align the reads to the genome.
$ bwa index {REFERENCE_GENOME.FASTA}
$ bwa mem -t 16 {REFERENCE_GENOME.FASTA} {READS_INPUT} | samtools sort -o {ALIGNMENT}.bam
Now that we have the alignments from our reads to a target genomes, we can use samtools to generate the depth report
$ samtools depth {ALIGNMENT}.bam > {GENOME}.depth
Ps: Please keep in mind that samtools depth contig start count from position 1 rather than 0.
Python Script
Last but not least, now that you have the samtool’s generated depth file, we can generate the visual of the depth.
# !/usr/bin/env python3
# -*- coding: utf-8 -*-
import seaborn as sns
import numpy as np
from matplotlib import (pyplot as plt,
lines)
def parse_depth(depth_input, genome_size):
"""Parse depth file.
Args:
depth_input (str): Path to depth file.
genome_size (int): Genome size.
Returns:
list: List with depth.
"""
depth = [0] * genome_size
references = set()
with open(depth_input) as depth_object:
for row in depth_object:
genome_id, position, depth_count = row.split()
references.add(genome_id)
if len(references) > 1:
raise Exception(' This script only handles one genome - contig.')
depth[int(position)] = int(depth_count)
return depth
def plot_depth(depth_report, output_name, plot_title, genome_size, normalize=False, depth_cut_off=20):
"""Plot genome Depth across genome.
Args:
depth_report (str): Path to samtool's depth file.
output_name (str): Path to output PNG image.
plot_title (str): Plot title.
genome_size (int): Genome size.
normalize (bool): If `True`, normalizes the depth by the largest depth (default = `False`).
depth_cut_off (int): Plot a line to represent a targeted depth (default = 20).
"""
data = parse_depth(depth_report, genome_size)
y_label = "Normalized Depth" if normalize else "Depth"
data = [xx / max(data) for xx in data] if normalize else data
sns.set(color_codes=True)
plt.title(plot_title)
ax = plt.subplot(111)
sns_plot = sns.lineplot(x=range(len(data)), y=data)
sns_plot.set(xlabel='Genome Position (bp)', ylabel=y_label)
if not normalize:
ax.add_line(lines.Line2D([0, genome_size + 1], [depth_cut_off], color="r"))
plt.savefig(output_name, bbox_inches='tight', dpi=400)
plt.close()
print("Done :)")
plot_depth is the function that should be called, and it takes as input the following parameters:
- depth_report (str): Path to samtool’s depth file.
- output_name (str): Path to an output PNG image.
- plot_title (str): Plot title.
- genome_size (int): Genome size.
- normalize (bool): If True, normalizes the depth by the largest depth (default = False).
- depth_cut_off (int): Plot a line to represent a targeted depth (default = 20).
Now that we know what we need to input into the script and luckily we have what we need, we can call the function. Below it calls the function to plot the normalized and raw depth with depth marker at 40x.
>>> import plot_depth
# raw depth
>>> plot_depth("chlamydia_phage.depth", "depth_normalized.png", "Guinea pig Chlamydia phage", 4529, False, 40)
# normalize depth
>>> plot_depth("chlamydia_phage.depth", "depth_raw.png", "Guinea pig Chlamydia phage", 4529, True, 40)
Here is how the normalized plot looks like

And here is the raw depth plot

More Resources
Here are two of my favorite Data Visualization Python Books in case you want to learn more about it.
- Mastering Python Data Visualization by Kirthi Raman
- Python Data Visualization: An Easy Introduction to Data Visualization in Python with Matplotlip, Pandas, and Seaborn by Samuel Burns