
Understanding q-value and FDR in Differential Expression Analysis

Date: 2024-01-08 09:45:27

 

Daqian

Introduction to q-value and FDR

In differential gene expression analysis, researchers must distinguish true signals (genes that are genuinely differentially expressed) from noise (genes that only appear differentially expressed because of random chance). The p-value is the usual statistic for this, but when thousands of genes are tested at once, even a small per-test false positive rate produces many false calls. The false discovery rate (FDR), the expected proportion of false positives among the genes called significant, then becomes the more relevant quantity. The q-value is the FDR analogue of the p-value: a gene's q-value is the minimum FDR at which that gene would be called significant. This makes it particularly useful in large-scale testing scenarios such as genomic studies.
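To see why raw p-values are not enough at this scale, here is a minimal simulation. It is not part of the original analysis, and the seed and the 0.05 cutoff are illustrative assumptions. It draws p-values for 3170 genes that are all truly null and counts how many fall below 0.05 purely by chance.

```r
# Simulate 3170 tests in which no gene is differentially expressed.
# Under the null hypothesis, p-values are uniform on [0, 1].
set.seed(1)                      # arbitrary seed, for reproducibility only
m <- 3170                        # number of genes, as in the hedenfalk data
null_p <- runif(m)

n_false <- sum(null_p < 0.05)    # every one of these calls is a false positive
expected <- m * 0.05             # expected count under the null: 158.5

n_false
expected
```

Roughly 160 genes would look "significant" at p < 0.05 even though none is differentially expressed. FDR-based methods are designed to account for this.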

The hedenfalk dataset

The hedenfalk dataset includes results from an analysis of gene expression related to BRCA1 and BRCA2 mutations in breast cancer patients. It contains p-values, test statistics, and empirical null statistics for 3170 genes.

Let’s load the dataset and inspect its structure:

library(qvalue)
data(hedenfalk)
class(hedenfalk)
## [1] "list"
names(hedenfalk)
## [1] "p"     "stat"  "stat0"

We can extract the p-values, the observed test statistics, and the null statistics as follows:

pvalues = hedenfalk$p 
obs_stats = hedenfalk$stat
null_stats = hedenfalk$stat0

length(obs_stats)
## [1] 3170
length(null_stats)
## [1] 317000
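There are 100 times as many null statistics as observed ones (317000 = 3170 x 100). One common way to turn such a pooled null set into p-values is sketched below on made-up data. Whether hedenfalk$p was computed exactly this way is an assumption. The names toy_stat and toy_stat0 are hypothetical, and the sketch assumes that larger statistics are more extreme. The qvalue package provides an empPvals function for this purpose; see its documentation for the exact definition.

```r
# Hypothetical toy data: 5 observed statistics and a pooled set of
# 1000 null statistics (e.g. from permutations). Larger = more extreme.
set.seed(2)
toy_stat  <- c(0.2, 1.1, 2.5, 3.0, 4.2)
toy_stat0 <- abs(rnorm(1000))

# Empirical p-value: fraction of pooled null statistics that are at
# least as extreme as the observed one, floored at 1 / (number of nulls)
# so that no p-value is exactly zero.
emp_p <- sapply(toy_stat, function(s) mean(toy_stat0 >= s))
emp_p <- pmax(emp_p, 1 / length(toy_stat0))

emp_p
```

Under this scheme, the smallest attainable p-value with 317000 pooled null statistics would be 1/317000, about 3.15e-06.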

Calculating q-values

Using the qvalue package, we can calculate q-values for each p-value:

qobj = qvalue(p = pvalues)

class(qobj)
## [1] "qvalue"
names(qobj)
## [1] "call"       "pi0"        "qvalues"    "pvalues"    "lfdr"      
## [6] "pi0.lambda" "lambda"     "pi0.smooth"
qvalues = qobj$qvalues
pi0 = qobj$pi0
lfdr = qobj$lfdr
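To make the relationship between p-values, pi0 and q-values concrete, the following base-R sketch implements the standard q-value formula for a given pi0. The function name qvalue_sketch and the toy p-values are hypothetical, and the package's internals may differ in detail (for example, it estimates pi0 from the data rather than taking it as an argument).

```r
# Sketch of the q-value formula, assuming pi0 is already known.
# For sorted p-values: q_(i) = min over j >= i of pi0 * m * p_(j) / j.
qvalue_sketch <- function(p, pi0 = 1) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)     # largest p-value first
  ro <- order(o)                       # permutation back to input order
  ranks <- m:1                         # rank of each p-value in 'o' order
  q_sorted <- pmin(1, cummin(p[o] * m / ranks))
  pi0 * q_sorted[ro]
}

toy_p <- c(0.001, 0.008, 0.039, 0.041, 0.27, 0.6)   # made-up p-values
qvalue_sketch(toy_p, pi0 = 1)     # same as p.adjust(toy_p, method = "BH")
qvalue_sketch(toy_p, pi0 = 0.67)  # the same values scaled by pi0
```

With pi0 = 1 this reduces to the Benjamini-Hochberg adjustment. A pi0 below 1 makes the q-values proportionally smaller, which is why estimating pi0 can yield more discoveries at the same FDR level.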

summary() gives a quick overview of the results, and hist() and plot() visualize them:

summary(qobj)
## 
## Call:
## qvalue(p = pvalues)
## 
## pi0: 0.669926    
## 
## Cumulative number of significant calls:
## 
##           <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1   <1
## p-value       15     76   265    424   605  868 3170
## q-value        0      0     1     73   162  319 3170
## local FDR      0      0     3     30    85  167 2241
hist(qobj)

plot(qobj)
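The pi0 value in the summary is the estimated proportion of genes that are truly null. The idea behind the estimate can be sketched at a single cutoff lambda on simulated data. The mixture proportions, the Beta parameters and lambda = 0.5 are all illustrative assumptions. By default the package evaluates a grid of lambda values and smooths across them (the pi0.lambda, lambda and pi0.smooth components listed earlier), so this is only the basic building block.

```r
# Toy mixture: 7000 null p-values (uniform) and 3000 from true effects
# (concentrated near zero). All numbers here are illustrative.
set.seed(3)
m_null <- 7000
m_alt  <- 3000
sim_p  <- c(runif(m_null), rbeta(m_alt, 0.2, 5))
m      <- length(sim_p)

# Above a cutoff lambda, almost all p-values come from null genes, so
# their count divided by m * (1 - lambda) estimates pi0.
lambda  <- 0.5
pi0_hat <- sum(sim_p > lambda) / (m * (1 - lambda))
pi0_hat   # should land close to the true proportion, 0.7
```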

Interpreting q-values and FDR

Running max(qvalues[qobj$pvalues <= 0.01]) returns the largest q-value among all genes with a p-value less than or equal to 0.01. Suppose this value is 0.07932. Then, if we call every gene with p-value <= 0.01 differentially expressed, we estimate that about 7.9% of those calls are false positives. In other words, the estimated FDR of that gene list is 0.07932.

Conversely, max(pvalues[qobj$qvalues <= 0.01]) returns the largest p-value among the genes whose q-value is less than or equal to 0.01. Suppose this p-value is 3.15e-06. This means that calling all genes with a p-value <= 3.15e-06 significant keeps the estimated FDR of that list at or below 1%.
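These two numbers can be roughly checked by hand from the summary table shown earlier. The arithmetic below is a back-of-the-envelope sketch, not the package's exact computation. It uses the 0.01 cutoff itself rather than the largest observed p-value below it, so a small discrepancy from 0.07932 is expected.

```r
# Values copied from the summary output shown earlier.
pi0      <- 0.669926   # estimated proportion of null genes
m        <- 3170       # number of genes tested
n_called <- 265        # genes with p-value < 0.01

# Expected false positives at a p-value cutoff t is roughly pi0 * m * t.
expected_false <- pi0 * m * 0.01            # about 21.2 genes
fdr_est        <- expected_false / n_called # about 0.080

# For the single smallest p-value (rank 1), taken here as 3.15e-06,
# the same formula gives an upper bound on its q-value:
q_smallest <- pi0 * m * 3.15e-06 / 1        # about 0.0067, below 0.01

fdr_est
q_smallest
```

The first value is close to the 0.07932 quoted above. The second is consistent with the summary table, where exactly one gene has a q-value below 0.01.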

Using a specific FDR level

When we run the qvalue function with fdr.level = 0.01, we get:

qobj_fdrlevel = qvalue(p = hedenfalk$p, fdr.level = 0.01)
head(qobj_fdrlevel$significant); length(qobj_fdrlevel$significant)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
## [1] 3170

With fdr.level set, the returned object also contains a logical vector, significant, with one entry per gene that is TRUE when the gene meets the FDR threshold of 0.01. To get the indices of these genes, we can use:

significant_genes = which(qobj_fdrlevel$significant)
head(significant_genes)
## [1] 1413

Conclusion

The q-value is a powerful tool for researchers performing large-scale hypothesis testing, as it provides a means to control the false discovery rate. With the qvalue package in R, we can identify genes that are likely to be differentially expressed while keeping the expected proportion of false positives among them at a chosen level. This is crucial in fields like genomics, where the sheer number of simultaneous tests can produce many false discoveries if not properly controlled.

 

 

From: https://www.cnblogs.com/res-daqian-lu/p/17951706
