
QLS Featured Seminar - David Rocke

Thursday, September 28, 2017, 12:00 to 13:00
McIntyre Medical Building room 908, 3655 promenade Sir William Osler, Montreal, QC, H3G 1Y6, CA

Excess False Positives in Negative-Binomial Based Analysis of Data from RNA-Seq Experiments

David M. Rocke, PhD (1, 2) and Yilun Zhang, MS (1)

(1) Division of Biostatistics, Department of Public Health Sciences, UC Davis
(2) Department of Biomedical Engineering, UC Davis

Key Words: RNA-Seq, Gene Expression, Negative Binomial, DESeq, edgeR, limma-voom

 

RNA-Seq data are increasingly used for whole-genome differential mRNA expression analysis in lieu of gene expression arrays such as those from Affymetrix and Illumina. Because the raw data in RNA-Seq consist of counts of fragments mapping to each gene or exon, and because these counts are over-dispersed relative to the Poisson, it is common to model their distribution as negative binomial. Yet, empirically, methods based on the negative binomial often generate massively inflated numbers of false positives, whether they are applied to real data or to simulated negative binomial data. This appears to be a consequence of two facts: the negative binomial with unknown scale is not an exponential family distribution, and, when it is used as a quasi-likelihood, the link function, and thus the natural parameter, are functions of the scale parameter. Consequently, a linear model with a negative binomial quasi-likelihood is not a proper generalized linear model unless the scale is known. We demonstrate that, even when the data are truly negative binomial, it is better to use transformation or weighting followed by standard linear models than to fit a version of a generalized linear model with estimated scale.
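A minimal sketch of the exponential-family point, not code from the talk. It assumes the common mean/size parameterization of the negative binomial (mean μ, size r equal to the inverse of the dispersion, variance μ + μ²/r). For a fixed r, the log-pmf can be rewritten as yθ − b(θ) + c(y, r), with natural parameter θ = log(μ/(μ + r)) and cumulant function b(θ) = −r log(1 − e^θ). The canonical link μ ↦ θ therefore contains r itself, so when r must be estimated, the link changes with the estimate. The function names below are hypothetical. The script checks that both forms of the log-pmf agree and that b′(θ) recovers μ.

```python
import math


def nb_logpmf(y: int, mu: float, r: float) -> float:
    """Log-pmf of a negative binomial with mean mu and size r (assumed
    parameterization), written in the usual gamma-function form."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def canonical_link(mu: float, r: float) -> float:
    """Natural parameter theta = log(mu / (mu + r)); it depends on r."""
    return math.log(mu / (mu + r))


def cumulant(theta: float, r: float) -> float:
    """Cumulant function b(theta) = -r * log(1 - exp(theta)), for theta < 0."""
    return -r * math.log(1.0 - math.exp(theta))


def nb_logpmf_expfam(y: int, mu: float, r: float) -> float:
    """Same log-pmf written as y*theta - b(theta) + c(y, r).
    This is an exponential-family form only when r is held fixed."""
    theta = canonical_link(mu, r)
    c = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    return y * theta - cumulant(theta, r) + c


def mean_from_theta(theta: float, r: float) -> float:
    """b'(theta) = r * e^theta / (1 - e^theta), which should recover mu."""
    return r * math.exp(theta) / (1.0 - math.exp(theta))


if __name__ == "__main__":
    mu = 20.0
    for r in (0.5, 5.0, 50.0):
        theta = canonical_link(mu, r)
        # Largest disagreement between the two forms over counts 0..199.
        gap = max(abs(nb_logpmf(y, mu, r) - nb_logpmf_expfam(y, mu, r))
                  for y in range(200))
        print(f"r={r:5.1f}  theta={theta:+.4f}  "
              f"mean from b'(theta)={mean_from_theta(theta, r):.4f}  "
              f"max |difference|={gap:.1e}")
```

Running it prints θ for three values of r at the same mean μ = 20. The θ values differ across the rows even though μ is fixed, which is the dependence of the natural parameter on r that makes the fitted model something other than a standard generalized linear model once r is estimated.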
