Gene Set Analysis for time-to-event outcome with the Generalized Berk–Jones statistic


Gene set analysis makes it possible to evaluate the impact of groups of genes on an outcome of interest, such as the occurrence of a disease. Through the definition of the gene sets, it takes biological knowledge into account and makes the results easier to interpret, while improving statistical power compared with a gene-wise analysis. In the time-to-event context, few methods exist, and most of them do not account for the correlation within a gene set, which can be strong. Because the Generalized Berk–Jones statistic has shown great consistency and incorporates this correlation into the test statistic, we adapted it to the time-to-event context by using a Cox model. We compared our approach with other methods based on the Cox model and showed that the Generalized Berk–Jones statistic offers great adaptability, meaning that it can be applied to all kinds of data structures. We applied the different methods in two contexts: gliomas and breast cancer. Our approach achieved statistical power similar to that of the other Cox-model-based methods, but with greater accuracy. In the breast cancer application, it showed better statistical power than methods based on the kernel machine score.

bioRxiv 2021.09.07.459329