Randomly Subsample Paired FASTQ or FASTA
Using seqtk, we can quickly downsample a paired set of FASTQ files. It is important to set the same seed (-s 123) for both files of a pair, so that the same reads are selected from each file and the mates stay in sync.
In the example below, we subsample 100k reads from each file of a FASTQ pair.
    # FASTQ R1
    $ seqtk sample -s 123 read1.fq 100000 > sub_read1.fq
    # FASTQ R2
    $ seqtk sample -s 123 read2.fq 100000 > sub_read2.fq
The same commands can be applied to paired FASTA files. seqtk also reads gzip-compressed input (for example read1.fq.gz); the subsampled output is written uncompressed.
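One way to confirm that the two subsampled files are still in sync is to compare their read names record by record. The sketch below is only an illustration: check_pairs is a hypothetical helper, not part of seqtk, and it assumes uncompressed four-line FASTQ records whose read names match once an optional trailing /1 or /2 is removed.

```shell
# Hypothetical helper (not part of seqtk): succeed only if two FASTQ files
# list the same read names in the same order.
# Assumes uncompressed 4-line FASTQ records; names may end in /1 or /2.
check_pairs() {
    r1_names=$(mktemp)
    r2_names=$(mktemp)
    # Header lines are lines 1, 5, 9, ...; keep the first field, minus /1 or /2.
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$1" > "$r1_names"
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$2" > "$r2_names"
    if cmp -s "$r1_names" "$r2_names"; then
        status=0
    else
        status=1
    fi
    rm -f "$r1_names" "$r2_names"
    return "$status"
}

# Example call on the files produced above:
#   check_pairs sub_read1.fq sub_read2.fq && echo "pairs in sync"
```

If the check fails, the most likely cause is that the two seqtk runs used different seeds or that the input files were not in matching order to begin with.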
Randomly Subsample FASTQ or FASTA
Similar to the previous section, here we subsample 100k reads from a single-end FASTQ or FASTA file. Because there is no mate file to keep in sync, the seed can be left at its default.
    # single-end FASTA
    $ seqtk sample sample.fasta 100000 > sub_sample.fasta
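To check that the output contains the expected number of records, the records can be counted directly. This is a minimal sketch with hypothetical helper names (count_fasta, count_fastq), neither of which is part of seqtk; the FASTQ version assumes uncompressed files with exactly four lines per record.

```shell
# Hypothetical helpers (not part of seqtk) for counting records.

# FASTA: every record has exactly one header line starting with '>'.
count_fasta() {
    grep -c '^>' "$1"
}

# FASTQ: assumes 4 lines per record. Counting lines that start with '@'
# would be unreliable, because quality strings can also start with '@'.
count_fastq() {
    awk 'END { print NR / 4 }' "$1"
}

# Example calls on the files produced above:
#   count_fasta sub_sample.fasta
#   count_fastq sub_read1.fq
```

If the input holds fewer records than the requested number, the count will be smaller than 100000.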