This tutorial shows how to download NGS data and metadata from repositories such as NCBI SRA, MG-RAST, and iMicrobe, which is especially handy for getting SRA records as FASTQ files. It uses grabseqs, installed here with pip and conda.
Bioinformaticians often need to download next-generation sequencing data from repositories such as NCBI SRA, MG-RAST, and iMicrobe; however, each of these platforms has its own download method. This tutorial presents grabseqs, a simple tool that downloads data from any of these platforms with a single command.
First, we need to install grabseqs. Luckily it is available through pip, and its non-Python dependencies can be installed with conda. Use the commands below to install everything.
# install Python dependencies and grabseqs code
$ pip install grabseqs
# install fastq-dump dependency
$ conda install parallel-fastq-dump
# install pigz dependency
$ conda install pigz
The tool's README states that grabseqs is available on Bioconda, but I was not able to install it that way. I will update this post when it works.
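Before moving on, it can help to confirm that the three tools are actually on your PATH. The snippet below is a minimal sketch: `check_tools` is a helper name made up for this post, and it assumes the executables are called `grabseqs`, `parallel-fastq-dump`, and `pigz`, matching the packages installed above.

```shell
# check_tools is a hypothetical helper: it reports which commands are on PATH
# and returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumed to match the package names installed above.
check_tools grabseqs parallel-fastq-dump pigz \
    || echo "Install the missing tools before continuing."
```

If anything is reported as missing, rerun the matching install command and check that the right conda environment is active.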
Downloading NCBI SRA Data
Here I download data from SRX8214442 (Metagenomic microbiome analysis in Type I and Type II Diabetes Patients). The command below fetches the SRA records and writes them out as FASTQ files.
# -t = number of threads
# -m = file for metadata output
# -o = output directory
# -r = number of retries
$ grabseqs sra -t 16 -m SRX8214442_metadata.csv -o sra_output/ -r 10 SRX8214442
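Once the download finishes, a quick way to see what the metadata file contains is to list its header columns. This is only a sketch: `list_columns` is a helper invented here, the path in `meta` is an assumption (depending on your grabseqs version the file may be written inside the output directory or next to it), and the simple comma split does not handle quoted fields that contain commas.

```shell
# list_columns is a hypothetical helper: print the header fields of a CSV,
# one per line, each prefixed with its column number.
list_columns() {
    head -n 1 "$1" | tr ',' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Assumed location of the metadata file; adjust to where yours was written.
meta="sra_output/SRX8214442_metadata.csv"
if [ -f "$meta" ]; then
    list_columns "$meta"
else
    echo "metadata file not found: $meta"
fi
```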
You can also pass a project accession, such as a BioProject (PRJNA), instead of a single experiment.
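If you have several accessions to fetch, one option is to wrap the command in a small loop. The sketch below reuses exactly the flags shown earlier; `fetch_sra` and the `DRY_RUN` switch are names made up for this post, and the per-accession output directory is my own choice, not something grabseqs requires. By default it only prints the commands so you can check them first.

```shell
# Set DRY_RUN=0 to really download; by default the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# fetch_sra is a hypothetical wrapper around the grabseqs call shown earlier.
fetch_sra() {
    acc="$1"
    # Build the command as positional parameters so it can be printed or run.
    set -- grabseqs sra -t 16 -m "${acc}_metadata.csv" -o "sra_output/${acc}/" -r 10 "$acc"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add your own experiment or project accessions to this list.
for acc in SRX8214442; do
    fetch_sra "$acc"
done
```

Writing each accession to its own directory keeps the FASTQ files and metadata of different experiments from getting mixed up.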
Downloading MG-RAST Data
# -t = number of threads
# -m = file for metadata output
# -o = output directory
# -r = number of retries
$ grabseqs mgrast -t 16 -m mg_rast_metadata.csv -o output/ -r 10 mgp5369
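As a quick sanity check after either download, you can count the reads in each output file. This is a rough sketch under a few assumptions: that the files are gzip-compressed and end in `.fastq.gz`, that each record takes exactly four lines, and that the output directories are the ones used above. `count_reads` is a helper name invented for this post.

```shell
# count_reads is a hypothetical helper: a four-line FASTQ record means
# reads = total lines / 4.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Output directories from the commands above; file suffix is an assumption.
for dir in sra_output output; do
    for fq in "$dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # skip when the glob matched nothing
        printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
    done
done
```

A count of 0, or one that is not a whole number, usually points to an interrupted or truncated download.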
In summary, you learned how to download NGS data and metadata from NCBI SRA and MG-RAST using grabseqs. I hope this tool is as useful for you as it has been for me.