Pain-free Genome Depth Plotting

onestop_databy:

BioinformaticsData visualization

A genome depth plot is a great way to visualize how well a target genome is covered across the genome. This can be used to eyeball if a certain level of coverage was met across the genome or even to identify contamination when looking into metagenomic datasets.

Here I share a python script to generate the depth plot. Enjoy it!

Dependencies

First and foremost, the script presented here relies on a few dependencies:

Numpy and Matplotlib are also dependencies, but they are installed when installing Seaborn.

Generating Genome Depth file

Secondly, we are targeting a 1 single contig genome as a reference and map some reads to use using BWA. Therefore, you can use the command lines below to format the reference genome and align the reads to the genome.

$ bwa index {REFERENCE_GENOME.FASTA}
$ bwa mem  -t 16 {REFERENCE_GENOME.FASTA} {READS_INPUT} | samtools sort -o {ALIGNMENT}.bam

Now that we have the alignments from our reads to a target genomes, we can use samtools to generate the depth report

$ samtools depth {ALIGNMENT}.bam > {GENOME}.depth

Ps: Please keep in mind that samtools depth contig start count from position 1 rather than 0.

Python Script

Last but not least, now that you have the samtool’s generated depth file, we can generate the visual of the depth.

# !/usr/bin/env python3
# -*- coding: utf-8 -*-

import seaborn as sns
import numpy as np

from matplotlib import (pyplot as plt,
                        lines)


def parse_depth(depth_input, genome_size):
    """Parse depth file.

    Args:
        depth_input (str): Path to depth file.
        genome_size (int): Genome size.

    Returns:
        list: List with depth.

    """
    depth = [0] * genome_size
    references = set()

    with open(depth_input) as depth_object:
        for row in depth_object:
            genome_id, position, depth_count = row.split()

            references.add(genome_id)

            if len(references) > 1:
                raise Exception(' This script only handles one genome - contig.')

            depth[int(position)] = int(depth_count)

    return depth


def plot_depth(depth_report, output_name, plot_title, genome_size, normalize=False, depth_cut_off=20):
    """Plot genome Depth across genome.

    Args:
        depth_report (str): Path to samtool's depth file.
        output_name (str): Path to output PNG image.
        plot_title (str): Plot title.
        genome_size (int): Genome size.
        normalize (bool): If `True`, normalizes the depth by the largest depth (default = `False`).
        depth_cut_off (int): Plot a line to represent a targeted depth (default = 20).

    """
    data = parse_depth(depth_report, genome_size)

    y_label = "Normalized Depth" if normalize else "Depth"
    data = [xx / max(data) for xx in data] if normalize else data

    sns.set(color_codes=True)
    plt.title(plot_title)
    ax = plt.subplot(111)

    sns_plot = sns.lineplot(x=range(len(data)), y=data)
    sns_plot.set(xlabel='Genome Position (bp)', ylabel=y_label)

    if not normalize:
        ax.add_line(lines.Line2D([0, genome_size + 1], [depth_cut_off], color="r"))

    plt.savefig(output_name, bbox_inches='tight', dpi=400)
    plt.close()

    print("Done :)")

plot_depth is the function that should be called, and it takes as input the following parameters:

  • depth_report (str): Path to samtool’s depth file.
  • output_name (str): Path to an output PNG image.
  • plot_title (str): Plot title.
  • genome_size (int): Genome size.
  • normalize (bool): If True, normalizes the depth by the largest depth (default = False).
  • depth_cut_off (int): Plot a line to represent a targeted depth (default = 20).

Now that we know what we need to input into the script and luckily we have what we need, we can call the function. Below it calls the function to plot the normalized and raw depth with depth marker at 40x.

>>> import plot_depth

# raw depth
>>> plot_depth("chlamydia_phage.depth", "depth_normalized.png", "Guinea pig Chlamydia phage", 4529, False, 40)

# normalize depth
>>> plot_depth("chlamydia_phage.depth", "depth_raw.png", "Guinea pig Chlamydia phage", 4529, True, 40)

Here is how the normalized plot looks like

And here is the raw depth plot

More Resources

Here are two of my favorite Data Visualization Python Books in case you want to learn more about it.

Related Posts