Easy NCBI Genome Download

onestop_databy:

Bioinformatics

Downloading genomes from NCBI by hand can be a tedious job. This page shows how to use ncbi-genome-download to fetch NCBI genomes with a single command.

Installing ncbi-genome-download

Installing the tool is simple: its developers have made it available through both bioconda and pip.

# install ncbi-genome-download using bioconda
$ conda install -c bioconda ncbi-genome-download

# install ncbi-genome-download using pip
$ pip install ncbi-genome-download

You can use the which command to confirm that the tool was installed and is on your PATH:

$ which ncbi-genome-download
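
If which prints nothing, the tool is not on your PATH. As a minimal sketch, the same check can be wrapped in a small shell function. Note that check_tool is a hypothetical helper name used here for illustration, not part of ncbi-genome-download.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Print a hint instead of failing when the tool is missing.
check_tool ncbi-genome-download || echo "install it with conda or pip first"
```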

Download RefSeq-NCBI Genomes by Kingdom

Downloading RefSeq-NCBI genomes by kingdom using ncbi-genome-download is simple. The command below downloads the genomes in FASTA format and writes them to the “output_dir” directory. The --parallel 16 option runs up to 16 downloads in parallel.

$ ncbi-genome-download bacteria -F fasta -o output_dir/ --parallel 16
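
Once the download finishes, it can be useful to confirm how many genomes actually arrived. The sketch below assumes the FASTA files are saved somewhere under the output directory with names ending in _genomic.fna.gz, which is how RefSeq names its genomic FASTA files. count_genomes is a hypothetical helper, so adjust the pattern if your files are named differently.

```shell
# count_genomes is a hypothetical helper: it counts genomic FASTA files under a directory.
# Assumption: genome files end in _genomic.fna.gz; CDS/RNA files (*_from_genomic.fna.gz) are skipped.
count_genomes() {
    find "$1" -type f -name '*_genomic.fna.gz' ! -name '*_from_genomic.fna.gz' \
        | wc -l | tr -d ' '
}

# Only count if the output directory from the command above exists.
if [ -d output_dir ]; then
    echo "genomes downloaded: $(count_genomes output_dir)"
fi
```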

More than one kingdom can be passed as a comma-separated list. The example below downloads all the bacterial and fungal genomes in the RefSeq database.

$ ncbi-genome-download bacteria,fungi -F fasta -o output_dir/ --parallel 16
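
Because a command like this can pull down a very large number of genomes, it may be worth previewing the selection first. The sketch below assumes your installed version supports the --dry-run option, which lists what would be downloaded without fetching the genome files. Run ncbi-genome-download --help to confirm. preview_download and the NGD variable are hypothetical names used only for illustration.

```shell
# NGD is a hypothetical override so the command name can be swapped out (e.g. for testing).
NGD="${NGD:-ncbi-genome-download}"

# preview_download is a hypothetical wrapper around the tool's dry-run mode.
# $1 = kingdom(s), e.g. "bacteria,fungi"; $2 = format, e.g. "fasta"
# Assumption: the installed version accepts --dry-run.
preview_download() {
    "$NGD" "$1" -F "$2" --dry-run
}

# Example (not run here because it needs the tool and network access):
#   preview_download bacteria,fungi fasta
```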

If you need an output format other than FASTA, the tool also provides the following options:

'genbank' (default), 'rm', 'features', 'gff', 'protein-fasta', 'genpept', 'wgs', 'cds-fasta', 'rna-fna', 'rna-fasta', 'assembly-report', 'assembly-stats', 'all'

Download RefSeq-NCBI Genomes by Genus

Downloading RefSeq-NCBI genomes by genus is just as simple: all you need to do is add the “--genera” flag and the genus name to the previous command.

The example below downloads all the Lactobacillus RefSeq genomes in FASTA format, using 16 parallel downloads.

$ ncbi-genome-download bacteria --genera lactobacillus -F fasta -o output_dir/ --parallel 16

Download RefSeq-NCBI Genomes by Taxid

You can use the “--species-taxids” flag to download all the genomes that belong to a species taxid. The example below downloads all the genomes for Lactobacillus iners, which has taxid 147802.

$ ncbi-genome-download bacteria --species-taxids 147802 -F fasta -o output_dir/ --parallel 16

However, if you want a specific genome based on its own taxid, use the “--taxids” flag instead.

Here we download the Lactobacillus iners AB-1 genome, which has taxid 713605:

$ ncbi-genome-download bacteria --taxids 713605 -F fasta -o output_dir/ --parallel 16
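
If you have many taxids, typing them by hand gets tedious. The sketch below assumes that --taxids and --species-taxids accept a comma-separated list, as the kingdom argument does (check --help for your version), and that you keep one taxid per line in a hypothetical file such as taxids.txt. taxids_from_file is an illustrative helper name.

```shell
# taxids_from_file is a hypothetical helper: it turns a one-taxid-per-line file
# into a single comma-separated string.
taxids_from_file() {
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example (not run here because it needs the tool and network access).
# Assumption: --taxids accepts a comma-separated list.
#   ncbi-genome-download bacteria --taxids "$(taxids_from_file taxids.txt)" -F fasta -o output_dir/
```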

Conclusion

I hope you appreciate, as much as I did, how easy ncbi-genome-download makes downloading RefSeq genomes.

For more information on the tool and other parameters, please check out the tool documentation.
