Effortless Recruitment Plot

onestop_databy:

BioinformaticsData visualization

Creating a recruitment plot can be a pain when you don’t have the right tool. A fragment recruitment plot is a great way to visualize how well a target genome is covered in a metagenome(s) and how close to the species level the hits were.

I share on this blog plot a python function that plots the fragment recruitments of your aligned reads to the target reference genome.

Dependencies

The function below has some dependencies around seaborn and matplotlib, so please make sure you install them. This can be easily installed using pip. When installing seaborn, the matplotlib is automatically handled. Please see here how to install it.

Target Genome

Here, we are using the “Guinea pig Chlamydia phage” as a reference to generate the recruitment plot against a metagenomics dataset that contains the reads for the target.

Generating Alignments

Next, please skip this section if you know how to run BLAST or DIAMOND to generate your alignment files in tabular format.

Format BLASTn Database

Here, we align the reads using BLASTn, so we use makeblastdb to format the reference genome.

$ makeblastdb -in {GENOME.FASTA} -dbtype nucl

Running Alignment

Below there is how you can run blastn to align the target genome to the reference database. Below there is an explanation for the choice of the parameters:

  • -db {GENOME.FASTA}: Database name. As we did not specify a name for it when running makeblastdb, it will have the same name as the query.
  • -query {QUERY.FASTA}: Query FASTA file
  • -out {OUTPUT_FILE}: Path to output file
  • -outfmt 6: Tabular output file
  • -evalue 0.00001: Evalue score
$ blastn -db {GENOME.FASTA} -query {QUERY.FASTA} -out {OUTPUT_FILE} -outfmt 6 -evalue 0.00001 -task blastn -num_threads 10

Generating Recruitment Plot

Now that we have the alignment of the metagenomic reads to the target genome, we can plot the recruitment of the fragments.

The function plot_recruitment_plot should is called to generate the plot, and it takes the following parameters.

  • blast_tabular_output (str): Path to input BLAST like alignment file.
  • output_plot (str): Path to where the image output in PNG will be saved.
  • plot_tittle (str): Plot title (e.g. Genome Name).
  • genome_size (int): Genome size.
  • minimum_identity_displayed (float): Y-axis lower bound (default % identity 70).
  • min_alignment_length (int): Minimum alignment length allowed (default = 100 bp).
  • identity_species_cutoff (float): Species line draw on plot (default = 95).

See code below the function

# !/usr/bin/env python3
# -*- coding: utf-8 -*-

import seaborn

from matplotlib import (pyplot as plt,
                        lines)


def plot_recruitment_plot(blast_tabular_output, output_plot, plot_title, genome_size, minimum_identity_displayed=70,
                          min_alignment_length=100, identity_species_cutoff=95):
    """Parse BLAST like tabular output and create a recruitment plot for the alignments.

    Args:
        blast_tabular_output (str): Path to input BLAST like alignment file.
        output_plot (str): Path to where the image output in PNG will be saved.
        plot_title (str): Plot tittle (e.g. Genome Name).
        genome_size (int): Genome size.
        minimum_identity_displayed (float): Y-axis lower bound (default % identity 70).
        min_alignment_length (int): Minimum alignment length allowed (default = 100 bp).
        identity_species_cutoff (float): Species line draw on plot (default = 95).

    """
    seaborn.set(color_codes=True)
    fig = plt.figure()
    ax = plt.subplot(111)

    ax.set_xlim(0, genome_size + 1)
    ax.set_ylim(minimum_identity_displayed, 100)

    with open(blast_tabular_output) as alignments:
        for alignment in alignments:
            alignment = alignment.split("\t")

            alignment_identity = float(alignment[2])
            alignment_length = int(alignment[3])
            alignment_start = int(alignment[8])
            alignment_stop = int(alignment[9])

            if alignment_length >= min_alignment_length:
                alignment_start_stop = [alignment_start, alignment_stop]

                # need to sort because of alignment direction
                alignment_start_stop.sort()

                real_alignment_start, real_alignment_stop = alignment_start_stop

                # draw line on plot
                ax.add_line(lines.Line2D([real_alignment_start, real_alignment_stop], [alignment_identity],
                                         color="#89896E"))

    ax.add_line(lines.Line2D([0, genome_size + 1], [identity_species_cutoff], color="k"))

    plt.ylabel('% Identity')
    plt.xlabel('Genome Position (bp)')
    plt.title(plot_title)
    fig.savefig(output_plot, format="png", dpi=400)

    plt.close()

Finally, we can generate the plot by calling the function.

>>> import plot_recruitment_plot
>>> plot_recruitment_plot("blast_alignments_phage", "phage_rc.png", "Guinea pig Chlamydia phage, complete genome", 4529)

And here it is our recruitment plot, and I hope that Creating Recruitment Plot is not a problem from now on.

More Resources

Here are two of my favorite Data Visualization Python Books in case you want to learn more about it.

Conclusion

In summary, this tutorial describes how a recruitment plot can be scripted in Python. Moreover, this can be simply expanded to plot different samples against a genome with different colors and other flavors that may be needed.

Related Posts