Big Binary Indexed (BBI) File Parsing with Python

This tutorial reviews some important information on Big Binary Indexed (BBI) files and how to parse them using Python.

1. What is the Big Binary Indexed (BBI) format?

Big Binary Indexed (BBI) is a family of binary file formats, bigWig and bigBed, used to store large collections of genomic data in a compact, indexed form. BBI files are commonly used to represent data derived from next-generation sequencing (NGS) experiments, such as RNA-seq, ChIP-seq, and whole-genome sequencing.

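The BBI formats share a common layout: a header, a B+ tree that maps chromosome names to numeric IDs and sizes, the data itself stored in optionally compressed blocks, and an R-tree index over those blocks. The R-tree lets a reader locate the blocks that overlap a genomic interval without scanning the whole file, which makes queries such as the coverage at a particular locus or the records overlapping an interval fast. BBI files also store precomputed summaries at several reduced resolutions (zoom levels), so that large regions can be summarized quickly. Each file holds a single track: a bigWig file stores one continuous-valued signal, such as read coverage or expression level, and a bigBed file stores one set of annotated intervals, such as peaks marking epigenetic features. Different kinds of data therefore go in separate files.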
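
BBI files store precomputed summary statistics at reduced resolutions so that large regions can be summarized without reading every record. The sketch below illustrates that idea in plain Python; it is not the on-disk layout. It assumes intervals are half-open (start, end, value) tuples on a single chromosome, and the names summarize_bins and bin_means are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: half-open [start, end) interval carrying one signal value.
Interval = Tuple[int, int, float]


def summarize_bins(
    intervals: Iterable[Interval],
    region_start: int,
    region_end: int,
    bin_size: int,
) -> List[Dict[str, Optional[float]]]:
    """Return one summary dict per fixed-width bin over [region_start, region_end)."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    bins: List[Dict[str, Optional[float]]] = [
        {"covered": 0, "min": None, "max": None, "sum": 0.0} for _ in range(n_bins)
    ]
    for start, end, value in intervals:
        # Clip the interval to the requested region.
        start = max(start, region_start)
        end = min(end, region_end)
        if start >= end:
            continue
        first = (start - region_start) // bin_size
        last = (end - 1 - region_start) // bin_size
        for i in range(first, last + 1):
            bin_start = region_start + i * bin_size
            bin_end = min(bin_start + bin_size, region_end)
            # Number of bases of this interval that fall inside bin i.
            overlap = min(end, bin_end) - max(start, bin_start)
            b = bins[i]
            b["covered"] += overlap
            b["sum"] += value * overlap
            b["min"] = value if b["min"] is None else min(b["min"], value)
            b["max"] = value if b["max"] is None else max(b["max"], value)
    return bins


def bin_means(bins: List[Dict[str, Optional[float]]]) -> List[Optional[float]]:
    """Mean signal over the covered bases of each bin (None if nothing is covered)."""
    return [b["sum"] / b["covered"] if b["covered"] else None for b in bins]


# Toy data, made up for illustration only.
toy_intervals = [(0, 10, 1.0), (10, 20, 3.0), (25, 30, 5.0)]
toy_bins = summarize_bins(toy_intervals, 0, 40, 20)
print(bin_means(toy_bins))
```

Keeping the covered-base count and the sum, rather than only the mean, is what allows coarser summaries to be combined from finer ones.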

In summary, the BBI file format is a useful tool for the storage, retrieval, and analysis of genomic data. It allows for efficient data compression and retrieval, making it a popular choice for the representation of large-scale genomic datasets.

2. Parsing a Big Binary Indexed (BBI) file

To parse a Big Binary Indexed (BBI) file, you will need to use a software library or tool that is capable of reading and processing BBI files. Some popular libraries and tools that can be used to parse BBI files include:

  1. libBigWig: This is an open-source C library that provides an API for reading and writing BigWig files, which are a specific type of BBI file that stores continuous-valued data over genomic intervals. libBigWig can be used in C and C++ programs to parse BigWig files and extract data such as read coverage, expression levels, and epigenetic marks.
  2. UCSC Genome Browser: The UCSC Genome Browser is a popular web-based tool for the visualization and analysis of genomic data. It supports the display of BBI files, and you can use it to extract data from a BBI file and download it for further analysis.
  3. bedtools: bedtools is a suite of command-line tools for the analysis of genomic intervals. It does not read BBI files directly, but it works well on text exports of them: the UCSC command-line utilities bigWigToBedGraph and bigBedToBed convert bigWig and bigBed files to the bedGraph and BED formats, respectively. Once the data is in BED or bedGraph format, you can use bedtools commands to perform further analysis on the data extracted from the BBI file.

In general, the specific steps involved in parsing a BBI file will depend on the library or tool you are using. However, most tools will require you to specify the BBI file you want to parse and the data you want to extract. You may also need to provide additional information, such as the genomic interval of interest or the data track you want to extract.
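
Before extracting anything, a reader has to recognize what kind of BBI file it has been given. As a minimal sketch using only the standard library, the function below inspects the first 8 bytes of a file. It assumes the header begins with a 4-byte magic number followed by a 2-byte version and a 2-byte zoom-level count, and that the magic numbers are 0x888FFC26 for bigWig and 0x8789F2EB for bigBed, as given in the UCSC format description. The function name identify_bbi is hypothetical.

```python
import struct
from typing import Tuple

# Assumed magic numbers, taken from the UCSC BBI format description.
BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB


def identify_bbi(header: bytes) -> Tuple[str, str, int, int]:
    """Return (file_type, byte_order, version, zoom_levels) from the first 8 header bytes.

    Raises ValueError if the bytes do not look like a BBI header.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    # A BBI file may be written in either byte order, so try both.
    for fmt, order_name in (("<", "little"), (">", "big")):
        magic, version, zoom_levels = struct.unpack(fmt + "IHH", header[:8])
        if magic == BIGWIG_MAGIC:
            return ("bigWig", order_name, version, zoom_levels)
        if magic == BIGBED_MAGIC:
            return ("bigBed", order_name, version, zoom_levels)
    raise ValueError("not a BBI file: unrecognized magic number")


# Usage with a real file (the path is a placeholder):
# with open("example.bw", "rb") as fh:
#     print(identify_bbi(fh.read(8)))
```

In practice a library such as libBigWig performs this check for you; the sketch only shows what "specifying the BBI file" involves at the byte level.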

3. Parsing a Big Binary Indexed (BBI) file with Python

For Python, the pybbi package is a convenient choice for reading BBI files, both bigWig and bigBed; you can install it with conda or pip. The example below opens a BBI file, reads the chromosome sizes, and then, for a list of selected chromosomes, fetches every interval between a given start and end position.

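# load library
import bbi

# path to a bigWig or bigBed file (placeholder; replace with your own file)
input_bbi = "example.bw"

# open the file
bbi_file = bbi.open(input_bbi)

# read chromosome sizes (chromosome name -> length)
chrom_sizes = bbi_file.chromsizes

# target chromosomes
for chrom in ["chr1", "chr2", "chr3"]:
    # skip chromosomes that are not present in this file
    if chrom not in chrom_sizes:
        continue
    # arguments: chromosome, start position, end position
    for row in bbi_file.fetch_intervals(chrom, 0, chrom_sizes[chrom], iterator=True):
        print(row)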
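
Printing every interval is rarely practical for whole chromosomes. As a hedged sketch, the helper below asks for a fixed number of summary bins per chromosome instead. It assumes the object returned by bbi.open exposes a chromsizes mapping and a fetch(chrom, start, end, bins=...) method, as described in the pybbi README; the function name binned_profiles is hypothetical. It takes an already opened file so that it does not depend on any particular path.

```python
from typing import Dict, Iterable, Sequence


def binned_profiles(bbi_file, chroms: Iterable[str], bins: int = 100) -> Dict[str, Sequence[float]]:
    """Return {chrom: binned signal} for each requested chromosome present in the file.

    Assumption: bbi_file behaves like a pybbi BBIFile, i.e. it has a
    `chromsizes` mapping and a `fetch(chrom, start, end, bins=...)` method.
    """
    profiles: Dict[str, Sequence[float]] = {}
    for chrom in chroms:
        size = bbi_file.chromsizes.get(chrom)
        if size is None:
            # Chromosome not present in this file; skip it rather than fail.
            continue
        # Summarize the whole chromosome into `bins` equal-width bins.
        profiles[chrom] = bbi_file.fetch(chrom, 0, size, bins=bins)
    return profiles


# Usage with pybbi (requires the package and a real file; the path is a placeholder):
# import bbi
# f = bbi.open("example.bw")
# for chrom, signal in binned_profiles(f, ["chr1", "chr2", "chr3"], bins=50).items():
#     print(chrom, signal[:5])
```

Skipping absent chromosomes avoids a KeyError when the list of targets does not match the assembly the file was built on.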

4. Conclusion

In conclusion, this tutorial has given a brief overview of Big Binary Indexed (BBI) files and shown how to parse them with Python. It described the purpose and structure of the format, listed libraries and tools that can read it, and walked through a pybbi example that opens a file, reads the chromosome sizes, and fetches the intervals for selected chromosomes. This is useful for bioinformaticians and researchers working with large genomic datasets stored as BBI files, since parsing them in Python makes it straightforward to integrate them into existing analysis workflows.