Downloading genomes from NCBI by hand can be a tedious job. This page shows how to use ncbi-genome-download to fetch NCBI genomes with a single command.
1. Installing ncbi-genome-download
# Install ncbi-genome-download using Bioconda
$ conda install -c bioconda ncbi-genome-download
# Or install ncbi-genome-download using pip
$ pip install ncbi-genome-download
You can use the command below to make sure that the tool was installed and is on your PATH.
$ which ncbi-genome-download
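If you want this check to fail loudly inside a setup script, a small wrapper around the shell's built-in `command -v` can help. This is only a minimal sketch: `check_installed` is a hypothetical helper name of my own, not something the tool provides.

```shell
# Hypothetical helper: succeed if the given command is on PATH, otherwise
# print a hint to stderr and return a non-zero status.
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; install it with conda or pip first" >&2
        return 1
    fi
}

# "|| true" keeps an interactive session alive if the tool is missing.
check_installed ncbi-genome-download || true
```

In a script that should stop when the tool is missing, drop the `|| true` and let the non-zero status propagate.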
2. Download RefSeq-NCBI Genomes by Kingdom
Downloading RefSeq genomes by kingdom using ncbi-genome-download is simple. The command below downloads the genomes in FASTA format and writes them to the “output_dir” directory. The “--parallel 16” option runs 16 downloads in parallel.
$ ncbi-genome-download bacteria -F fasta -o output_dir/ --parallel 16
More than one kingdom can be passed, separated by commas. The example below downloads all the bacterial and fungal genomes in the RefSeq database.
$ ncbi-genome-download bacteria,fungi -F fasta -o output_dir/ --parallel 16
If another output format is needed besides FASTA, the tool also provides the following options:
'genbank' (default), 'rm', 'features', 'gff', 'protein-fasta', 'genpept', 'wgs', 'cds-fasta', 'rna-fna', 'rna-fasta', 'assembly-report', 'assembly-stats', 'all'
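As far as I can tell from the tool's help text, “-F” also accepts a comma-separated list of formats, so several file types can be fetched in one run. The sketch below only builds and prints such a command. `join_formats` is a hypothetical helper of my own, and the comma-separated behaviour is an assumption you should confirm with `ncbi-genome-download --help` on your installed version.

```shell
# Hypothetical helper: join the arguments with commas,
# e.g. "fasta" "gff" becomes "fasta,gff".
join_formats() {
    (IFS=,; echo "$*")
}

formats=$(join_formats fasta gff assembly-report)

# Printed rather than executed; remove "echo" to start the real download.
# Assumption: -F accepts a comma-separated list of formats.
echo ncbi-genome-download bacteria -F "$formats" -o output_dir/ --parallel 16
```

Printing the command first is a cheap way to check the format list before starting a download that may run for hours.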
3. Download RefSeq-NCBI Genomes by Genus
Downloading RefSeq genomes by genus is simple: all you need to do is add the flag “--genera” and the genus name to the previous command.
The example below downloads all the Lactobacillus RefSeq genomes in FASTA format, running 16 downloads in parallel.
$ ncbi-genome-download bacteria --genera lactobacillus -F fasta -o output_dir/ --parallel 16
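A whole genus can mean thousands of genomes, so it can be worth reviewing the exact command before running it. Here is a minimal sketch of a wrapper that prints the command by default and only executes it when `RUN=1` is set. `genus_download` and the `RUN` variable are hypothetical names of my own. The versions of the tool I am aware of also have a built-in “--dry-run” option that lists what would be downloaded, but check `--help` before relying on it.

```shell
# Hypothetical helper: build the genus download command from the section
# above, then either print it (default) or execute it (RUN=1).
genus_download() {
    genus=$1
    set -- ncbi-genome-download bacteria --genera "$genus" \
        -F fasta -o output_dir/ --parallel 16
    if [ "${RUN:-0}" = 1 ]; then
        "$@"        # run the real download
    else
        echo "$@"   # only show what would be run
    fi
}

genus_download Lactobacillus
```

Running `RUN=1 genus_download Lactobacillus` would then start the actual download.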
4. Download RefSeq-NCBI Genomes by Taxid
You can use the flag “--species-taxids” to download all the genomes that belong to a species taxid. The example below downloads all the genomes for Lactobacillus iners, which is under taxid 147802.
$ ncbi-genome-download bacteria --species-taxids 147802 -F fasta -o output_dir/ --parallel 16
However, if there is a specific genome you want to download based on its own taxid, you can use the flag “--taxids”.
Here we download the Lactobacillus iners AB-1 genome, which is under taxid 713605.
$ ncbi-genome-download bacteria --taxids 713605 -F fasta -o output_dir/ --parallel 16
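After any of the downloads above, a quick sanity check is to count how many FASTA files actually arrived. The sketch below assumes the genomic FASTA files are gzipped and end in `.fna.gz`, which is the NCBI naming I have seen. `count_fasta` is a hypothetical helper of my own, and it searches recursively so it does not depend on the exact folder layout under “output_dir”.

```shell
# Hypothetical helper: count gzipped nucleotide FASTA files anywhere
# under the given directory.
# Assumption: downloaded genome FASTA files are named *.fna.gz.
count_fasta() {
    find "$1" -type f -name '*.fna.gz' | wc -l | tr -d ' '
}

# Only report if the output directory from the examples exists.
if [ -d output_dir ]; then
    echo "FASTA files downloaded: $(count_fasta output_dir)"
fi
```

If the count is lower than expected, rerunning the same download command is a reasonable first step.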
I hope you appreciate as much as I did how easy ncbi-genome-download makes it to download RefSeq genomes.
For more information on the tool and other parameters, please check out the tool documentation.
5. More Resources
- The Easiest Way to Download Genomic Data from NCBI SRA, MG-RAST, etc
- Multiple Sequence Alignment – Theory and Practice – Step-by-Step
- How to Simulate NGS reads – Step-by-Step