This tutorial shows how to download NGS data and metadata from repositories such as NCBI SRA, MG-RAST, and iMicrobe, which is especially handy for getting SRA records as FASTQ files. It uses grabseqs, installed here with pip, plus conda for its dependencies.
Bioinformaticians often need to download next-generation sequencing (NGS) data from repositories such as NCBI SRA, MG-RAST, and iMicrobe; however, each of these platforms has its own download method. This tutorial presents grabseqs, a simple tool that downloads data from any of these platforms with a single command.
Installing grabseqs
First, we need to get grabseqs installed. Luckily, it is available through pip, and its dependencies can be installed with conda. Use the commands below to install everything.
# install Python dependencies and grabseqs code
$ pip install grabseqs
# install fastq-dump dependency (parallel-fastq-dump is distributed on the bioconda channel)
$ conda install -c conda-forge -c bioconda parallel-fastq-dump
# install pigz dependency
$ conda install pigz
The grabseqs README states that the tool itself is available on Bioconda, but I was not able to install it that way. I will update this post when it works.
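Before moving on, it can help to confirm that the three executables installed above are visible on your PATH. The snippet below is a minimal sketch: check_tools is a hypothetical helper name, and it only tests that each command can be found, not that it runs correctly.

```shell
# Report whether each required executable is visible on PATH.
# check_tools is a hypothetical helper, not part of grabseqs.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools grabseqs parallel-fastq-dump pigz || echo "Install the missing tools before continuing."
```

If any tool is reported as missing, re-run the matching install command in the same environment you plan to use for downloading.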
Downloading NCBI SRA Data
Here I download data from the SRA experiment SRX8214442 (Metagenomic microbiome analysis in Type I and Type II Diabetes Patients). The command below should fetch the SRA data and convert it to FASTQ format.
# -t = number of threads
# -m = file for metadata output
# -o = output directory
# -r = number of retries
$ grabseqs sra -t 16 -m SRX8214442_metadata.csv -o sra_output/ -r 10 SRX8214442
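Once the download finishes, a quick sanity check is to count the reads in each output file. The sketch below assumes the results are gzip-compressed FASTQ files named *.fastq.gz inside sra_output/, and that every record spans exactly four lines; adjust the glob if your files are named differently. count_reads is a hypothetical helper name.

```shell
# Count reads in one gzip-compressed FASTQ file.
# Assumes the standard 4-lines-per-record FASTQ layout.
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Assumed layout: the download wrote *.fastq.gz files into sra_output/.
for fq in sra_output/*.fastq.gz; do
    [ -e "$fq" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
done
```

For paired-end data, the two files of a pair should report the same read count.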
You are not limited to single experiments: the same grabseqs sra command also accepts project-level accessions such as BioProjects (PRJNA prefix).
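If you have several accessions to fetch, one option is to loop over a list and run the same command for each. This is only a sketch: download_accessions and accessions.txt are hypothetical names, the flags are copied from the SRX8214442 command above, and each accession gets its own metadata file.

```shell
# Download every accession listed in a text file, one accession per line.
# Blank lines and lines starting with '#' are skipped.
# download_accessions is a hypothetical helper, not part of grabseqs.
download_accessions() {
    list_file=$1
    out_dir=$2
    while IFS= read -r acc || [ -n "$acc" ]; do
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        # Same flags as the single-accession command: threads, metadata file,
        # output directory, retries. stdin is redirected so the loop's input
        # is not consumed by the child process.
        grabseqs sra -t 16 -m "${acc}_metadata.csv" -o "$out_dir" -r 10 "$acc" </dev/null
    done < "$list_file"
}

# Example call (accessions.txt is a hypothetical file name):
# download_accessions accessions.txt sra_output/
```

Running the accessions one at a time keeps a failure on one accession from hiding which ones already succeeded.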
Downloading MG-RAST Data
Downloading data from MG-RAST is just as simple. Here, we download data from the Bbay metagenomic study, which has the MG-RAST project ID mgp5369.
# -t = number of threads
# -m = file for metadata output
# -o = output directory
# -r = number of retries
$ grabseqs mgrast -t 16 -m mg_rast_metadata.csv -o output/ -r 10 mgp5369
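To see what the metadata file contains without assuming any column names, you can list its header fields. The sketch below assumes the first line of the CSV is a header with no commas inside quoted names. I have not confirmed whether the -m path is resolved relative to the working directory or the output directory, so it checks both locations. list_columns is a hypothetical helper name.

```shell
# Print the column names of a CSV file, one per line, numbered.
# Assumes a header row whose field names contain no embedded commas.
list_columns() {
    head -n 1 "$1" | tr ',' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Assumption: the metadata file is in the working directory or in output/.
for f in mg_rast_metadata.csv output/mg_rast_metadata.csv; do
    if [ -f "$f" ]; then
        list_columns "$f"
    fi
done
```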
Conclusion
In summary, you learned how to download NGS data and metadata from NCBI SRA and MG-RAST using grabseqs. I hope this tool is as useful for you as it has been for me.
If you want to learn more about grabseqs, please check out the tool's GitHub repository and the paper.