Some questions in transcriptome analysis: effective length and logCPM (akka9981)

http://blog.sina.com.cn/u/1890760194


(2013-04-11 18:05:58)

Tags: trinity, miscellaneous

Category: bioinformatics

 1. effective length

The effective length is calculated by RSEM. I was looking over the
paper this morning and, in the context of computing transcript
fractions, it states: "The effective length can be thought of as the
mean number of positions from which a fragment may start within the
sequence of transcript i".



The effective length is roughly the transcript length minus the
average RNA-Seq fragment length (excluding adapters), plus one, and
represents the portion of the transcript from which fragments could
have been derived. This is described in a few papers; the Cufflinks
and RSEM papers are probably the best to explore for more details.
Because the FPKM value is so closely tied to the effective length,
different settings for the fragment length can have large consequences
for the observed fold variation in expression between different
transcripts, and the effect is more exaggerated for shorter
transcripts than for longer ones.
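To make the sensitivity concrete, here is a minimal Python sketch. It assumes the simple approximation (transcript length minus mean fragment length, plus one) rather than the fragment-length-distribution models that RSEM and Cufflinks actually use. The transcript lengths, counts, and library size are hypothetical, and the function names are made up for illustration.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Approximate number of positions at which a fragment could start."""
    # Clamp at 1 so a transcript shorter than the fragment length does not
    # give a zero or negative denominator (an assumption of this sketch).
    return max(transcript_length - mean_fragment_length + 1, 1)


def fpkm(fragment_count, eff_length, total_mapped_fragments):
    """Fragments per kilobase of effective length per million mapped fragments."""
    return fragment_count * 1e9 / (eff_length * total_mapped_fragments)


TOTAL_MAPPED = 20_000_000  # hypothetical library size

# Hypothetical transcripts, name: (length, fragment count).
# Counts are proportional to full length, so both have the same FPKM
# when the full transcript length is used as the denominator.
transcripts = {"short": (400, 200), "long": (4000, 2000)}

ratios = {}
for mean_frag in (1, 200, 300):
    values = {
        name: fpkm(count, effective_length(length, mean_frag), TOTAL_MAPPED)
        for name, (length, count) in transcripts.items()
    }
    # Ratio of short-transcript FPKM to long-transcript FPKM.
    ratios[mean_frag] = values["short"] / values["long"]
    print(mean_frag, values, round(ratios[mean_frag], 2))
```

In this toy case the short/long ratio grows as the assumed fragment length grows, because the subtraction removes a much larger share of the 400-base transcript than of the 4000-base one.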





If you want to minimize the effect of this 'effective length'
calculation on your FPKM values, set the fragment length value to 1.
In this case, the transcript length in the denominator of the FPKM
calculation will be equal to the full length of the target transcript,
as used in the earliest papers involving RPKM calculations.

Note that, in the way we are analyzing the data for differential
expression, only the raw counts (estimated by RSEM) are used for
statistical analysis (edgeR, DESeq). The FPKM measures are only used
in generating heatmaps and indicating intensity values. The
consequence of getting the FPKM calculation wrong is that you will not
be able to accurately compare the expression levels of different genes
to each other, but you will still be able to compare the expression of
an individual transcript among different conditions.
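The last point can be checked with a small Python sketch. It assumes the same fragment length setting is applied to both conditions, so the effective length cancels out of the ratio. All numbers are hypothetical.

```python
def fpkm(fragment_count, eff_length, total_mapped_fragments):
    """Fragments per kilobase of effective length per million mapped fragments."""
    return fragment_count * 1e9 / (eff_length * total_mapped_fragments)


LIBRARY_SIZE = 20_000_000  # hypothetical, identical in both conditions
TRANSCRIPT_LENGTH = 400    # hypothetical transcript
counts = {"control": 100, "treated": 400}  # hypothetical fragment counts

fold_changes = []
for mean_frag in (1, 200, 300):
    # Simple approximation of the effective length (an assumption of this sketch).
    eff_len = TRANSCRIPT_LENGTH - mean_frag + 1
    control = fpkm(counts["control"], eff_len, LIBRARY_SIZE)
    treated = fpkm(counts["treated"], eff_len, LIBRARY_SIZE)
    fold_changes.append(treated / control)
    print(mean_frag, round(control, 2), round(treated, 2), round(treated / control, 6))
```

The absolute FPKM values change with the fragment length setting, but the treated/control ratio for the same transcript stays at the ratio of the counts.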



By erring on the side of longer effective lengths (that is, shorter
assumed fragment lengths), you'll reduce the magnitude of these
length-dependent effects.

I certainly defer to the experts here (Pachter, Trapnell, Roberts,
Dewey, Li, etc.) on the best route to take.  I'm not particularly
pleased with our current simple method for doing it. eXpress, like
cufflinks, provides FPKM calculations that should be accurate, and you
might explore it. We're still evaluating it.



2. logCPM

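logCPM is the average log2 counts per million: the log2 of a gene's
counts per million mapped reads, averaged over all libraries. It
appears as a column in the edgeR results table.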
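As a rough illustration of the idea, here is a simplified Python sketch. It is not edgeR's exact computation, which handles prior counts and dispersion in its own way. The `prior_count` parameter, the function names, and the example counts are assumptions made for this sketch.

```python
import math


def log2_cpm(count, library_size, prior_count=0.5):
    """log2 counts per million for one gene in one library."""
    # The small prior count avoids taking the log of zero; this is a
    # simplification, not edgeR's exact procedure.
    return math.log2((count + prior_count) / library_size * 1e6)


def average_log_cpm(counts, library_sizes, prior_count=0.5):
    """Mean of the per-library log2 CPM values for one gene."""
    values = [
        log2_cpm(c, n, prior_count) for c, n in zip(counts, library_sizes)
    ]
    return sum(values) / len(values)


# Hypothetical gene counted in four libraries of one million reads each.
counts = [10, 20, 40, 80]
library_sizes = [1_000_000] * 4

print(round(average_log_cpm(counts, library_sizes), 3))
```

A gene with a high logCPM is abundant on average across the samples, regardless of whether it differs between conditions.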
