p.adjust()
functionLet’s start by looking at the help of the p.adjust
function
?p.adjust
voom-limma
analysisNow (re)compute the raw p-values from the empircial Bayes moderated t-test performed by the voom-limma
analysis pipeline on the RNA-seq data on Tuberculosis infection from the Berry cohort in London analyzed in Singhania et al..
library(dplyr)
library(edgeR)
load("SinghaniaTB_data.Rdata")
# removing genes never observed
gene_no0 <- rowSums(dplyr::select(expr_singhania_raw, -dplyr::starts_with("Gene"))) >
0
expr_singhania_raw_no0 <- dplyr::filter(expr_singhania_raw, gene_no0)
# filter only genes with enough cpm
mycpmfun <- function(x) {
return((x/sum(x)) * 10^6)
}
cpm <- apply(dplyr::select(expr_singhania_raw_no0, -starts_with("Gene")), MARGIN = 2,
FUN = mycpmfun)
expr_singhania_raw_no0_cpm2p <- dplyr::filter(expr_singhania_raw_no0, rowSums(cpm >
2) > 5)
# filter only protein coding genes
gene_pc <- expr_singhania_raw_no0_cpm2p$Gene_biotype == "protein_coding"
expr_singhania_raw_no0_cpm2p_prot <- filter(expr_singhania_raw_no0_cpm2p, gene_pc)
# normalize the data
singhaniaDGE <- DGEList(counts = select(expr_singhania_raw_no0_cpm2p_prot, -starts_with("Gene")),
samples = infos_singhania, genes = select(expr_singhania_raw_no0_cpm2p_prot,
starts_with("Gene")))
singhaniaDGE_normfact <- edgeR::calcNormFactors(singhaniaDGE, method = "TMM")
# select only the london samples
singhaniaLondonDGE_normfact <- singhaniaDGE_normfact[, singhaniaDGE_normfact$samples$Location ==
"London"]
# build the desired design matrix
design <- as.data.frame(model.matrix(~Status, data = infos_singhania[infos_singhania$Location ==
"London", ]))
design <- dplyr::mutate(design, ActiveTB = 1 - (StatusLTBI + StatusControl))
design <- dplyr::select(design, -starts_with("Status"))
library(limma)
# compute the precision weights necessary to account for heteroskedasticity
v <- voom(singhaniaLondonDGE_normfact, design, plot = FALSE)
# fit the gaussian linear model with precision weights v
fit <- lmFit(v, design)
# perform an Empirical Bayes modified t-test
fit <- eBayes(fit, robust = TRUE)
Retreive the raw p-values from the output of the limma::eBayes()
function, and sort them in ascending order using the sort()
function. Store them in a data frame alongside their rank.
Using the various multiple testing correction methods from the class with the p.adjust()
function, compare the number of significant genes.
Display the different thresholds on a graph with all the pvalues on log-scale.
Comment on the differences. Which correction would you chose and why ?