In this illustration of haplotype assembly, published in Dr. Bansal’s 2008 Bioinformatics paper, each read originates from one of the two chromosomes.

With the availability of paired-end sequence data, it is now feasible to phase genotypes directly from sequence reads. As the read lengths of next-generation sequencing methods increase and the cost of sequencing falls, phasing from sequence data is likely to become the method of choice for generating phased “diploid genomes” spanning both common and rare variants.

HapCUT is a max-cut based algorithm for haplotype assembly that uses the mix of sequenced fragments from the two chromosomes of an individual. It was developed at UC San Diego by Vikas Bansal, Ph.D., to phase the genome of Craig Venter, Ph.D., which was sequenced using Sanger technology.

Because of recent improvements, the program can now be applied to sequence data generated by next-generation sequencing platforms. HapCUT takes as input the aligned SAM/BAM files for an individual diploid genome and the list of variants, and outputs the phased haplotype blocks that can be assembled from the paired-end sequence reads.
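To make the idea of haplotype assembly concrete, here is a minimal Python sketch of the underlying bookkeeping. It is not HapCUT's implementation and does not perform the max-cut optimization. It only shows how, given a candidate haplotype, each fragment can be assigned to whichever of the two chromosomes it matches best, and how the remaining mismatches can be counted. The fragment representation (a dictionary from variant index to observed allele), the function names, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a fragment maps variant index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def mismatches(fragment: Fragment, haplotype: List[int]) -> int:
    """Count the variant sites where the fragment disagrees with the haplotype."""
    return sum(1 for pos, allele in fragment.items() if haplotype[pos] != allele)


def assign_fragments(
    fragments: List[Fragment], haplotype: List[int]
) -> Tuple[List[int], int]:
    """Assign each fragment to the haplotype (0) or its complement (1).

    At heterozygous sites the second chromosome carries the opposite allele,
    so the complement of the candidate haplotype stands in for it.
    Returns the per-fragment assignment and the total number of mismatches.
    """
    complement = [1 - allele for allele in haplotype]
    assignment: List[int] = []
    total_errors = 0
    for fragment in fragments:
        d0 = mismatches(fragment, haplotype)
        d1 = mismatches(fragment, complement)
        if d0 <= d1:
            assignment.append(0)
            total_errors += d0
        else:
            assignment.append(1)
            total_errors += d1
    return assignment, total_errors


# Toy data (invented for illustration): five fragments covering four variant sites.
fragments = [
    {0: 0, 1: 1},
    {1: 1, 2: 0},
    {0: 1, 1: 0},
    {1: 0, 2: 1, 3: 1},
    {2: 0, 3: 0},
]
candidate = [0, 1, 0, 0]

assignment, errors = assign_fragments(fragments, candidate)
print("assignment:", assignment)
print("total mismatches:", errors)
```

A haplotype assembler searches over candidate haplotypes for one that keeps this mismatch count low; the sketch covers only the scoring of a single candidate, not the search.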

For more information:

HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bansal V, Bafna V. Bioinformatics. 24(16):i153-9. 2008 Aug 15. PMID: 18689818.