This short tutorial shows how to use Python to compute N50 for a genome assembly. What is N50? How is it calculated? Is a larger N50 better? These questions are answered below.
1. What is N50?
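In genomics, N50 is a metric used to assess the quality, and more specifically the contiguity, of an assembled genome (contigs or scaffolds). It is the length L such that contigs of length L or longer contain at least half of the total assembled bases.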
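To make the idea concrete, here is a minimal sketch that applies a common definition of N50 directly: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the assembly. The function name `n50_from_cumulative_sum` and the contig lengths are made up for illustration, and conventions differ slightly when the running total lands exactly on the halfway point.

```python
def n50_from_cumulative_sum(lengths):
    """Return N50 for a list of contig lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of all assembled bases

    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly defines N50.
        if running >= half_total:
            return length

    raise ValueError("lengths must contain at least one contig")


contigs = [80, 70, 50, 40, 30, 20]  # made-up lengths, 290 bases in total
print(n50_from_cumulative_sum(contigs))  # prints 70
```

Here half of the assembly is 145 bases. The longest contig covers 80 bases, and adding the second one brings the total to 150, so N50 is 70.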
2. How to calculate N50
Below is a simple Python script that computes N50 from a list of contig lengths.
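```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-


def calculate_N50(list_of_lengths):
    """Calculate N50 for a sequence of numbers.

    Args:
        list_of_lengths (list): List of numbers.

    Returns:
        float: N50 value.
    """
    # Repeat each length L once per base it contributes (L times per contig),
    # so that the median of the expanded list is the N50.
    tmp = []
    for tmp_number in set(list_of_lengths):
        tmp += [tmp_number] * list_of_lengths.count(tmp_number) * tmp_number
    tmp.sort()

    # With an even number of entries the median is the mean of the two middle
    # values, so the result can fall between two contig lengths.
    middle = len(tmp) // 2
    if (len(tmp) % 2) == 0:
        median = (tmp[middle - 1] + tmp[middle]) / 2
    else:
        median = tmp[middle]

    return median
```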
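The script expects a list of contig lengths as input. As a rough sketch of where such a list might come from, the hypothetical helper below, `contig_lengths_from_fasta`, reads an assembly in plain FASTA format and returns one length per sequence. It assumes an uncompressed text file in which header lines start with `>`. The sequence names and bases in the example are invented.

```python
import io


def contig_lengths_from_fasta(handle):
    """Return the length of each sequence in a FASTA text stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header

    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequence may span several lines

    if current is not None:
        lengths.append(current)  # close the last record
    return lengths


# Invented two-contig assembly, held in memory instead of a file on disk.
example = io.StringIO(">contig_1\nACGTACGT\nACGT\n>contig_2\nACG\n")
print(contig_lengths_from_fasta(example))  # prints [12, 3]
```

For a real assembly, the same function can be given an open file object, for example `open("assembly.fasta")`, where the file name is a placeholder. A dedicated parser such as Biopython's `SeqIO` is likely a more robust choice for production use.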
In Scaffold_builder, a tool that I published in graduate school, N50 was one of the metrics used to compare the assembled contigs with the scaffolded contigs.
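3. Is a larger N50 better?
Generally, yes: a larger N50 means that more of the assembly sits in long contigs, so the assembly is more contiguous. However, N50 measures contiguity only, not correctness, so it is best read alongside other quality metrics.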
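A toy illustration of why N50 should be read with care: joining two contigs raises N50 whether or not the join is biologically correct. The sketch below uses invented contig lengths and a hypothetical helper, `n50`, which repeats each length once per base and takes the median, in the same spirit as the script in this tutorial.

```python
from statistics import median


def n50(lengths):
    """N50 via the expanded list: repeat each length L exactly L times, take the median."""
    expanded = [length for length in lengths for _ in range(length)]
    return median(expanded)


before = [50, 40, 30, 20, 10]  # made-up contig lengths, 150 bases in total
after = [90, 30, 20, 10]       # same bases, with the 50 and 40 contigs joined

print(n50(before), n50(after))
```

With these numbers N50 rises from 40 to 90 although the total number of bases is unchanged, which is why a higher N50 alone does not show that an assembly is more accurate.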
Here are three of my favorite Python bioinformatics books, in case you want to learn more.
- Python for the Life Sciences: A Gentle Introduction to Python for Life Scientists by Alexander Lancaster
- Bioinformatics with Python Cookbook by Tiago Antao
- Bioinformatics Programming Using Python: Practical Programming for Biological Data by Mitchell L. Model
In summary, this tutorial explained what N50 is and how to compute it with Python.