Effortless Recruitment Plot

by:

BioinformaticsData visualization

Creating a recruitment plot can be a pain when you don’t have the right tool. A fragment recruitment plot is a great way to visualize how well a target genome is covered in a metagenome(s) and how close to the species level the hits were.

I share on this blog plot a python function that plots the fragment recruitments of your aligned reads to the target reference genome.

What is a Genomic Recruitment Plot?

A genomics recruitment plot is a graphical representation used to compare the similarity of two or more genomic sequences. It is a tool used in bioinformatics to visualize the relationship between different genomic sequences, and to identify regions of the genome that are conserved, or have a high degree of similarity, across the sequences being compared.

The recruitment plot typically shows the mapping of one genome (referred to as the “reference genome”) to another genome (referred to as the “query genome”). The reference genome is shown along the x-axis and the query genome is shown along the y-axis. Each point on the plot represents the mapping of a portion of the query genome to the reference genome.

The benefits of using a genomics recruitment plot are:

  1. Visualization of genomic similarities: The recruitment plot provides a visual representation of the similarities and differences between two or more genomic sequences, allowing researchers to quickly identify conserved regions and potential functional elements.
  2. Identification of genomic rearrangements: The recruitment plot can be used to identify genomic rearrangements, such as inversions, deletions, and duplications, that occur between the reference genome and the query genome.
  3. Comparison of multiple genomes: The recruitment plot can be used to compare multiple genomes in a single visualization, allowing researchers to identify common and unique features of different genomes.
  4. Support for multiple data types: The recruitment plot can be used to visualize multiple data types, such as sequencing reads, assemblies, or annotations, making it a versatile tool for genomic analysis.

Overall, the genomics recruitment plot is a useful tool for comparing the similarities and differences between genomic sequences and for visualizing the relationships between different genomes. It provides a quick and intuitive way to identify conserved regions and potential functional elements, and to compare multiple genomes in a single visualization.

Dependencies

The function below has some dependencies around seaborn and matplotlib, so please make sure you install them. This can be easily installed using pip. When installing seaborn, the matplotlib is automatically handled. Please see here how to install it.

Target Genome

Here, we are using the “Guinea pig Chlamydia phage” as a reference to generate the recruitment plot against a metagenomics dataset that contains the reads for the target.

Generating Alignments

Next, please skip this section if you know how to run BLAST or DIAMOND to generate your alignment files in tabular format.

Format BLASTn Database

Here, we align the reads using BLASTn, so we use makeblastdb to format the reference genome.

$ makeblastdb -in {GENOME.FASTA} -dbtype nucl

Running Alignment

Below there is how you can run blastn to align the target genome to the reference database. Below there is an explanation for the choice of the parameters:

  • -db {GENOME.FASTA}: Database name. As we did not specify a name for it when running makeblastdb, it will have the same name as the query.
  • -query {QUERY.FASTA}: Query FASTA file
  • -out {OUTPUT_FILE}: Path to output file
  • -outfmt 6: Tabular output file
  • -evalue 0.00001: Evalue score
$ blastn -db {GENOME.FASTA} -query {QUERY.FASTA} -out {OUTPUT_FILE} -outfmt 6 -evalue 0.00001 -task blastn -num_threads 10

Generating Recruitment Plot

Now that we have the alignment of the metagenomic reads to the target genome, we can plot the recruitment of the fragments.

The function plot_recruitment_plot should is called to generate the plot, and it takes the following parameters.

  • blast_tabular_output (str): Path to input BLAST like alignment file.
  • output_plot (str): Path to where the image output in PNG will be saved.
  • plot_tittle (str): Plot title (e.g. Genome Name).
  • genome_size (int): Genome size.
  • minimum_identity_displayed (float): Y-axis lower bound (default % identity 70).
  • min_alignment_length (int): Minimum alignment length allowed (default = 100 bp).
  • identity_species_cutoff (float): Species line draw on plot (default = 95).

See code below the function

# !/usr/bin/env python3
# -*- coding: utf-8 -*-

import seaborn

from matplotlib import (pyplot as plt,
                        lines)


def plot_recruitment_plot(blast_tabular_output, output_plot, plot_title, genome_size, minimum_identity_displayed=70,
                          min_alignment_length=100, identity_species_cutoff=95):
    """Parse BLAST like tabular output and create a recruitment plot for the alignments.

    Args:
        blast_tabular_output (str): Path to input BLAST like alignment file.
        output_plot (str): Path to where the image output in PNG will be saved.
        plot_title (str): Plot tittle (e.g. Genome Name).
        genome_size (int): Genome size.
        minimum_identity_displayed (float): Y-axis lower bound (default % identity 70).
        min_alignment_length (int): Minimum alignment length allowed (default = 100 bp).
        identity_species_cutoff (float): Species line draw on plot (default = 95).

    """
    seaborn.set(color_codes=True)
    fig = plt.figure()
    ax = plt.subplot(111)

    ax.set_xlim(0, genome_size + 1)
    ax.set_ylim(minimum_identity_displayed, 100)

    with open(blast_tabular_output) as alignments:
        for alignment in alignments:
            alignment = alignment.split("\t")

            alignment_identity = float(alignment[2])
            alignment_length = int(alignment[3])
            alignment_start = int(alignment[8])
            alignment_stop = int(alignment[9])

            if alignment_length >= min_alignment_length:
                alignment_start_stop = [alignment_start, alignment_stop]

                # need to sort because of alignment direction
                alignment_start_stop.sort()

                real_alignment_start, real_alignment_stop = alignment_start_stop

                # draw line on plot
                ax.add_line(lines.Line2D([real_alignment_start, real_alignment_stop], [alignment_identity],
                                         color="#89896E"))

    ax.add_line(lines.Line2D([0, genome_size + 1], [identity_species_cutoff], color="k"))

    plt.ylabel('% Identity')
    plt.xlabel('Genome Position (bp)')
    plt.title(plot_title)
    fig.savefig(output_plot, format="png", dpi=400)

    plt.close()

Finally, we can generate the plot by calling the function.

>>> import plot_recruitment_plot
>>> plot_recruitment_plot("blast_alignments_phage", "phage_rc.png", "Guinea pig Chlamydia phage, complete genome", 4529)

And here it is our recruitment plot, and I hope that Creating Recruitment Plot is not a problem from now on.

More Resources

Here are two of my favorite Data Visualization Python Books in case you want to learn more about it.

Conclusion

In summary, this tutorial describes how a recruitment plot can be scripted in Python. Moreover, this can be simply expanded to plot different samples against a genome with different colors and other flavors that may be needed.

Related Posts