SeqHBase: a big data toolset for family-based sequencing data analysis

13 Jan, 15 | by hqqu

High-throughput sequencing technologies are increasingly used to find disease genes, but it is difficult to draw biological insights from massive amounts of data in a short period of time. We developed a software framework called SeqHBase to help identify disease genes quickly. SeqHBase is built on the Apache Hadoop and HBase infrastructure, which operates in a distributed, parallel manner across multiple data nodes. For each genome, its input includes coverage information for 3 billion sites and more than 3 million variants with their associated functional annotations. With 20 data nodes, SeqHBase took about 5 seconds to analyze whole-exome sequencing data for a family quartet and approximately 1 minute to analyze whole-genome sequencing data for a 10-member family. We demonstrated SeqHBase’s high efficiency and scalability with several real sequencing data sets. (By Min He, Ph.D., http://jmg.bmj.com/content/early/2015/01/13/jmedgenet-2014-102907 )