Statistical challenges in the analysis of single-cell gene expression data


Cell-to-cell gene expression variability in seemingly homogeneous cell populations plays a crucial role in tissue function and development. Single-cell RNA sequencing (scRNA-seq) can characterise this variability in a genome-wide manner. However, the promise of scRNA-seq comes at the cost of higher data complexity. In particular, a prominent feature of scRNA-seq experiments is strong measurement error, reflected by technical dropouts and poor correlations between technical replicates. These effects must be taken into account to reveal biological findings that are not confounded by technical variation.

In this talk, I will describe some of the statistical challenges that arise in scRNA-seq experiments, from experimental design to downstream inference. I will also introduce BASiCS (Bayesian Analysis of Single Cell Sequencing data), a Bayesian hierarchical model in which data normalization, technical noise quantification and downstream analyses are simultaneously performed. I will describe how BASiCS can robustly quantify cell-to-cell expression variability and perform differential variability analyses between cell populations (e.g. experimental conditions or cell types). I will then illustrate the use of our methods in the context of immune cells. Finally, I will discuss ongoing efforts to improve the scalability of our approach.

This is joint work with Nils Eling, Alan O’Callaghan, Arianne Richard, Sylvia Richardson and John Marioni.