Easy NCBI Genome Download



Downloading genomes from NCBI by hand can be tedious. This page shows how to use ncbi-genome-download to fetch NCBI genomes with a single command line.

1. Installing ncbi-genome-download

Installing the tool is simple: its developers have made it available through both Bioconda and pip.

# install ncbi-genome-download using bioconda
$ conda install -c bioconda ncbi-genome-download

# install ncbi-genome-download using pip
$ pip install ncbi-genome-download

You can use the command below to confirm that the tool was installed and is on your PATH.

$ which ncbi-genome-download
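If you want the same check inside a script, a minimal sketch could look like the one below. It uses the shell built-in "command -v" rather than "which", and the helper name have_tool is made up for this example.

```shell
# Return 0 if the named program is on PATH, 1 otherwise.
# "have_tool" is a hypothetical helper name, not part of ncbi-genome-download.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool ncbi-genome-download; then
    echo "ncbi-genome-download found at: $(command -v ncbi-genome-download)"
else
    echo "ncbi-genome-download not found; install it with conda or pip first" >&2
fi
```

The script only reports the result; it does not stop, so you can decide yourself whether a missing tool should abort the rest of your pipeline.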

2. Download RefSeq-NCBI Genomes by Kingdom

Downloading RefSeq genomes by kingdom with ncbi-genome-download is simple. The command below downloads all the bacterial genomes in FASTA format and writes them to the “output_dir” directory. The “--parallel 16” option runs 16 downloads in parallel.

$ ncbi-genome-download bacteria -F fasta -o output_dir/ --parallel 16
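After a download like this finishes, it can be handy to check how many genomes actually arrived. The sketch below counts the FASTA files under a directory. It assumes the genome FASTA files are stored gzip-compressed with names ending in ".fna.gz", which is how RefSeq names them; the helper name count_fasta is made up for this example.

```shell
# Count compressed FASTA files anywhere below the given directory.
# Assumption: genome FASTA files end in ".fna.gz".
# "count_fasta" is a hypothetical helper name.
count_fasta() {
    find "$1" -type f -name '*.fna.gz' | wc -l | tr -d ' '
}

# Only report when the download directory from the example exists.
if [ -d output_dir ]; then
    echo "FASTA files in output_dir: $(count_fasta output_dir)"
fi
```

If the count is lower than you expected, rerunning the same download command is a reasonable first step before digging further.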

More than one kingdom can be passed as a comma-separated list. The example below downloads all the bacterial and fungal genomes in the RefSeq database.

$ ncbi-genome-download bacteria,fungi -F fasta -o output_dir/ --parallel 16

If another output format is needed besides FASTA, the tool also provides the following options:

'genbank' (default), 'rm', 'features', 'gff', 'protein-fasta', 'genpept', 'wgs', 'cds-fasta', 'rna-fna', 'rna-fasta', 'assembly-report', 'assembly-stats', 'all'
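When the kingdoms and formats come from a script rather than being typed by hand, you need to join them with commas first. Here is a small sketch of that. It assumes that “-F” accepts a comma-separated list in the same way the kingdom argument does (check "ncbi-genome-download --help" for your version), and the helper name join_by_comma is made up for this example. The command is only printed, not run.

```shell
# Join all arguments with commas: join_by_comma a b c -> a,b,c
# "join_by_comma" is a hypothetical helper name.
join_by_comma() {
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

groups=$(join_by_comma bacteria fungi)
# Assumption: -F takes a comma-separated list of formats.
formats=$(join_by_comma fasta assembly-report)

# Print the command for review instead of running it.
echo "ncbi-genome-download $groups -F $formats -o output_dir/ --parallel 16"
```

Printing the command first lets you check it before starting what may be a very large download.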

3. Download RefSeq-NCBI Genomes by Genus

Downloading RefSeq genomes by genus is just as simple: all you need to do is add the “--genera” flag and the genus name to the previous command.

The example below downloads all the Lactobacillus RefSeq genomes in FASTA format, running 16 downloads in parallel.

$ ncbi-genome-download bacteria --genera lactobacillus -F fasta -o output_dir/ --parallel 16

4. Download RefSeq-NCBI Genomes by Taxid

You can use the “--species-taxids” flag to download all the genomes that belong to a species taxid. The example below downloads all the genomes for Lactobacillus iners, which has taxid 147802.

$ ncbi-genome-download bacteria --species-taxids 147802 -F fasta -o output_dir/ --parallel 16

However, if you want to download one specific genome based on its own taxid, use the “--taxids” flag instead.

Here we download the Lactobacillus iners AB-1 genome, which has taxid 713605:

$ ncbi-genome-download bacteria --taxids 713605 -F fasta -o output_dir/ --parallel 16
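Taxids are easy to mistype, so a script that builds these commands may want to check that each one is purely numeric before using it. The sketch below does that for the two taxids used above and prints the matching command instead of running it. The helper names is_taxid and print_download_cmd are made up for this example, and the check only looks at the format of the taxid, not at whether it exists at NCBI.

```shell
# Return 0 when the argument consists only of digits, 1 otherwise.
# "is_taxid" is a hypothetical helper name.
is_taxid() {
    case "$1" in
        '' | *[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print (not run) the download command for one flag/taxid pair.
# "print_download_cmd" is a hypothetical helper name.
print_download_cmd() {
    flag=$1
    taxid=$2
    if is_taxid "$taxid"; then
        echo "ncbi-genome-download bacteria $flag $taxid -F fasta -o output_dir/ --parallel 16"
    else
        echo "skipping invalid taxid: '$taxid'" >&2
        return 1
    fi
}

print_download_cmd --species-taxids 147802
print_download_cmd --taxids 713605
```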

5. Conclusion

I hope you appreciate, as much as I did, how easy ncbi-genome-download makes downloading RefSeq genomes.

For more information on the tool and other parameters, please check out the tool documentation.

6. More Resources