Would you believe that you could estimate the functional profile of your metagenomic dataset up to 1,000 times faster than with other tools out there? In this tutorial, I explain why SUPER-FOCUS is much faster than other tools with little loss of sensitivity, and I show you how to use it.
1. What is SUPER-FOCUS?
First and foremost, SUPER-FOCUS is a tool for the functional analysis of metagenomes. It uses the SEED database, a subsystem database organized into four levels: subsystem levels 1, 2, and 3, plus function. The tool outputs a profile for each of the 4 levels, as well as a file with the binning of each read in case you want to pull out the reads and use them in another analysis.
Robert Edwards gives a great lecture about subsystems and how they can be used for classification. Please watch this. To learn more about subsystems, you can also read the SUPER-FOCUS paper.
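To make the four levels more concrete, here is a minimal Python sketch of how a per-read binning could be rolled up into a profile for each level. The tuple layout, the read IDs, and the subsystem labels are all assumptions made up for illustration; they are not the actual SUPER-FOCUS output format.

```python
from collections import Counter

# Hypothetical per-read binning: (read_id, level1, level2, level3, function).
# The labels are invented for illustration, not real SUPER-FOCUS output.
binning = [
    ("read_1", "Carbohydrates", "Central metabolism", "Glycolysis", "Enolase"),
    ("read_2", "Carbohydrates", "Central metabolism", "Glycolysis", "Pyruvate kinase"),
    ("read_3", "Carbohydrates", "Sugars", "Lactose use", "Beta-galactosidase"),
    ("read_4", "Stress", "Oxidative", "Glutathione", "Glutathione reductase"),
]


def profile(binning, level):
    """Relative abundance (%) at a given level: 1-3 = subsystem levels, 4 = function."""
    # Key each read by its path down to the requested level, then count reads per path.
    counts = Counter(tuple(row[1:level + 1]) for row in binning)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


level1 = profile(binning, 1)
level3 = profile(binning, 3)
functions = profile(binning, 4)

for key, percent in level1.items():
    print(" > ".join(key), f"{percent:.1f}%")
```

Keeping the full path as the key means two level 3 subsystems with the same name under different parents would stay separate, which is one reasonable design choice for a hierarchy like this.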
2. Why is SUPER-FOCUS so fast?
SUPER-FOCUS is fast for three reasons:
- The SEED database is clustered by subsystem using CD-HIT at different identity levels (90%, 95%, 98%, or 100% identity; 90% is the default).
- SUPER-FOCUS runs FOCUS to predict the taxa in the metagenome and only aligns the query to the subsystems found in the profiled taxa. We show in the paper that this also helps reduce possible noise from non-microbial contamination, such as sequenced plant DNA.
- DIAMOND, which is much faster than BLASTx/BLASTp, is one of the aligners offered to the user.
Therefore, a really fast aligner combined with small databases gives a much faster run time when aligning query sequences to the database, compared to other tools that do not take these steps.
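The second point, restricting the reference to the subsystems found in the profiled taxa, can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the `reference` dictionary, the cluster names, and the `reduce_reference` helper are hypothetical and do not reflect how SUPER-FOCUS stores or filters its database internally.

```python
# Hypothetical reference: cluster representative -> (subsystem, taxa seen in that cluster).
reference = {
    "cluster_A": ("Glycolysis", {"Escherichia", "Salmonella"}),
    "cluster_B": ("Photosynthesis", {"Synechococcus"}),
    "cluster_C": ("Glycolysis", {"Bacillus"}),
    "cluster_D": ("Nitrogen fixation", {"Rhizobium"}),
}

# Hypothetical taxonomic profile of the sample (the role FOCUS plays in the pipeline).
profiled_taxa = {"Escherichia", "Bacillus"}


def reduce_reference(reference, profiled_taxa):
    """Keep only clusters whose subsystem occurs in at least one profiled taxon."""
    subsystems = {
        subsystem
        for subsystem, taxa in reference.values()
        if taxa & profiled_taxa  # non-empty intersection: subsystem is present
    }
    return {
        name: entry
        for name, entry in reference.items()
        if entry[0] in subsystems
    }


reduced = reduce_reference(reference, profiled_taxa)
print(f"Aligning against {len(reduced)} of {len(reference)} clusters")
```

In this toy example only the two Glycolysis clusters survive, so the aligner would search half as many sequences. Subsystems that occur only in unprofiled organisms are never searched at all.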
3. Installing and Running SUPER-FOCUS
Installing SUPER-FOCUS is very easy because it is available through pip. All you need to do is run pip3 install super-focus.
Running SUPER-FOCUS is also painless. The tool's Git repository contains a great README file, so I will redirect you to it.
4. SUPER-FOCUS performance
Finally, we show in the paper that SUPER-FOCUS performs sequence functional annotation 37, 60, and 1,000 times faster than RTMg, MEGAN, and MG-RAST, respectively. We also show that this speed comes with little loss of sensitivity (please check Figure 9 in the paper).
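To get a feel for what these factors mean in practice, the short Python sketch below converts them into run times. The speed-up factors are the ones quoted above; the 10-minute SUPER-FOCUS run time is a hypothetical value chosen only to make the arithmetic concrete, not a measured result.

```python
# Speed-up factors quoted above (SUPER-FOCUS relative to each tool).
speedup = {"RTMg": 37, "MEGAN": 60, "MG-RAST": 1000}

# Hypothetical SUPER-FOCUS run time, used only to illustrate the arithmetic.
superfocus_minutes = 10

# A tool that is N times slower would need N times the SUPER-FOCUS run time.
implied_minutes = {
    tool: factor * superfocus_minutes for tool, factor in speedup.items()
}

for tool, minutes in implied_minutes.items():
    print(f"{tool}: {minutes} min (about {minutes / 60:.1f} h)")
```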
Here are three of my favorite Python bioinformatics books in case you want to learn more about the topic.
- Python for the Life Sciences: A Gentle Introduction to Python for Life Scientists by Alexander Lancaster
- Bioinformatics with Python Cookbook by Tiago Antao
- Bioinformatics Programming Using Python: Practical Programming for Biological Data by Mitchell L. Model
In summary, I showed here that with SUPER-FOCUS it is possible to estimate the functional profile of a metagenomic dataset up to 1,000 times faster than other tools out there, with little loss of sensitivity.