Xiao Su, Ph.D. Candidate
Division of Biostatistics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Recently, we published a paper that identifies genes with bimodal expression using next-generation RNAseq data [1]. Identifying bimodally expressed genes is important because such genes can be used to classify diseases and are therefore good candidate biomarkers [2-3]. We applied three mixture models (Negative Binomial, Generalized Poisson and Lognormal) to RNAseq data and found that the Lognormal model works well on both real and simulated data. Applied to The Cancer Genome Atlas (TCGA) [4] breast cancer RNAseq data, the method not only recovered well-known bimodal genes including HER2, ER and PR, but also discovered many novel bimodal genes. The method could be developed further using nonparametric mixture models, since the data might not be well characterized by any of the three distributions above. It is also possible that the data are a mixture of more than two distributions. A good way to implement these extended models could be Markov chain Monte Carlo (MCMC), which my ongoing research is addressing.
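To make the idea of a two-component Lognormal mixture concrete, here is a minimal sketch in Python. It is not the implementation from the paper [1]: the function names, the pseudo-count of 1 added before taking logs, the median-split initialization, and the use of BIC to compare one component against two are all assumptions made for illustration. Fitting a Lognormal mixture to expression values is treated here as fitting a Gaussian mixture to the log-transformed values with the EM algorithm.

```python
import numpy as np

def fit_two_component_lognormal(x, n_iter=200, tol=1e-8):
    """EM for a two-component Gaussian mixture on log(x + 1).

    Hypothetical helper for illustration; returns
    (weights, means, sds, log_likelihood) on the log scale.
    """
    # Pseudo-count of 1 is an assumption to avoid log(0).
    y = np.log(np.asarray(x, dtype=float) + 1.0)
    # Initialize by splitting the data at the median
    # (assumes the data are not all tied at one value).
    med = np.median(y)
    lo, hi = y[y <= med], y[y > med]
    w = np.array([0.5, 0.5])
    mu = np.array([lo.mean(), hi.mean()])
    sd = np.array([lo.std(), hi.std()]) + 1e-3
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted log density of each point under each component.
        log_p = (np.log(w) - np.log(sd) - 0.5 * np.log(2 * np.pi)
                 - 0.5 * ((y[:, None] - mu) / sd) ** 2)
        log_norm = np.logaddexp(log_p[:, 0], log_p[:, 1])
        ll = log_norm.sum()
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / y.size
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        sd = np.maximum(np.sqrt(var), 1e-6)  # floor to avoid collapse
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sd, ll

def bic_one_vs_two(x):
    """BIC of a single Gaussian versus a two-component mixture on log(x + 1)."""
    y = np.log(np.asarray(x, dtype=float) + 1.0)
    n = y.size
    # Closed-form maximized log-likelihood of a single Gaussian.
    ll1 = -0.5 * n * np.log(2 * np.pi * y.var()) - 0.5 * n
    _, _, _, ll2 = fit_two_component_lognormal(x)
    bic1 = -2 * ll1 + 2 * np.log(n)  # parameters: mean, sd
    bic2 = -2 * ll2 + 5 * np.log(n)  # parameters: 2 means, 2 sds, 1 weight
    return bic1, bic2

# Simulated expression for one gene: two equally sized groups of samples.
rng = np.random.default_rng(0)
expr = np.concatenate([rng.lognormal(mean=2.0, sigma=0.3, size=150),
                       rng.lognormal(mean=6.0, sigma=0.3, size=150)])
weights, means, sds, _ = fit_two_component_lognormal(expr)
bic1, bic2 = bic_one_vs_two(expr)
print("weights:", weights, "means:", means, "sds:", sds)
print("two components preferred by BIC:", bic2 < bic1)
```

A lower BIC for the two-component fit suggests that the gene's expression is better described by two groups of samples than by one. Extending the sketch to more than two components, or to a nonparametric mixture fitted by MCMC, would need a different fitting routine.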
References
[1] Tong, Pan, Yong Chen, Xiao Su, and Kevin R. Coombes. “SIBER: systematic identification of bimodally expressed genes using RNAseq data.” Bioinformatics 29, no. 5 (2013): 605-613.
[2] Teschendorff, Andrew E., Ahmad Miremadi, Sarah E. Pinder, Ian O. Ellis, and Carlos Caldas. “An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer.” Genome Biol 8, no. 8 (2007): R157.
[3] Teschendorff, Andrew E., Ahmad Miremadi, Sarah E. Pinder, Ian O. Ellis, and Carlos Caldas. “An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer.” Genome Biol 8, no. 8 (2007): R157.
[4] McLendon, Roger, Allan Friedman, Darrell Bigner, Erwin G. Van Meir, Daniel J. Brat, Gena M. Mastrogianakis, Jeffrey J. Olson et al. “Comprehensive genomic characterization defines human glioblastoma genes and core pathways.” Nature 455, no. 7216 (2008): 1061-1068.