Some questions in transcriptome analysis: effective length and logCPM
(2013-04-11 18:05:58)
Tags: trinity, miscellaneous
Category: bioinformatics
1. effective length
The effective length is calculated by RSEM. I was looking over the paper this morning, and in the context of computing transcript fractions it states: "The effective length can be thought of as the mean number of positions from which a fragment may start within the sequence of transcript i."
The effective length is roughly the transcript length minus the average RNA-Seq fragment length (excluding adapters), plus one, and represents the portion of the transcript from which fragments could have been derived. This is described in a few papers; the cufflinks and RSEM papers are probably the best places to look for more details. Because the FPKM value is so closely tied to the effective length, different settings for the fragment length can have large consequences for the observed fold variation in expression between different transcripts, and the effect is more exaggerated for shorter transcripts than for longer ones.
If you want to minimize the effect of this 'effective length' calculation on your FPKM values, set the fragment length value to 1. In that case, the transcript length in the denominator of the FPKM calculation equals the full length of the target transcript, as in the earliest papers that used RPKM. Note that, in the way we analyze the data for differential expression, only the raw counts (as estimated by RSEM) are used for the statistical analysis (edgeR, DESeq). The FPKM values are used only for generating heatmaps and indicating intensity values. The consequence of getting the FPKM calculation wrong is that you will not be able to accurately compare the expression levels of different genes to each other, but you will still be able to compare the expression of an individual transcript among different conditions.
By erring toward longer effective lengths, you'll reduce the magnitude of these effects. I certainly defer to the experts here (Pachter, Trapnell, Roberts, Dewey, Li, etc.) on the best route to take. I'm not particularly pleased with our current simple method for doing it. eXpress, like cufflinks, provides FPKM calculations that should be accurate, and you might explore it. We're still evaluating it.
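To make the dependence on fragment length concrete, here is a minimal sketch of the simplified relationship discussed above. It assumes a single mean fragment length and the formula effective length = transcript length - mean fragment length + 1; RSEM itself works from a fragment length distribution, so this is an illustration rather than RSEM's actual computation. The function names (`effective_length`, `fpkm`, `fpkm_inflation`) are hypothetical and chosen only for this example.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Simplified effective length: L - mu + 1, floored at 1.

    Assumption: one mean fragment length stands in for the full
    fragment length distribution that a tool like RSEM would use.
    """
    return max(transcript_length - mean_fragment_length + 1, 1)


def fpkm(fragment_count, eff_length, total_mapped_fragments):
    """Fragments per kilobase of (effective) length per million mapped fragments."""
    return fragment_count * 1e9 / (eff_length * total_mapped_fragments)


def fpkm_inflation(transcript_length, mean_fragment_length):
    """Ratio of FPKM computed with the effective length to FPKM computed
    with the full transcript length (fragment length set to 1)."""
    return transcript_length / effective_length(transcript_length, mean_fragment_length)


if __name__ == "__main__":
    # Illustrative lengths only; not taken from any real dataset.
    for length in (300, 1000, 3000):
        ratio = fpkm_inflation(length, 200)
        print(f"length={length:5d}  eff={effective_length(length, 200):5d}  "
              f"FPKM ratio vs. full length={ratio:.2f}")
```

With a mean fragment length of 200, the 300-base transcript has its FPKM inflated almost threefold relative to the full-length calculation, while the 3000-base transcript changes by only a few percent, which is the short-versus-long asymmetry described above. Setting the fragment length to 1 makes the effective length equal to the transcript length.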
2. logCPM
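logCPM is the average log2 counts-per-million, i.e. the log2 of the counts per million mapped reads, averaged across the libraries (this is the logCPM column that edgeR reports).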
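As a rough illustration of what an average log2 counts-per-million means, the sketch below computes CPM per library, takes log2, and averages. This is a deliberately naive version: edgeR's own computation is more involved (it adds a prior count and takes dispersion into account), so the numbers will not match edgeR's output exactly. The function name `average_log_cpm` and the `pseudocount` parameter are assumptions made for this example.

```python
import math


def average_log_cpm(counts, library_sizes, pseudocount=0.5):
    """Naive average log2 counts-per-million for one gene.

    counts        -- raw counts for the gene, one per library
    library_sizes -- total mapped reads, one per library
    pseudocount   -- small value added to each count to avoid log2(0)
                     (an assumption of this sketch, not edgeR's prior)
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    if not counts:
        raise ValueError("at least one library is required")
    log_values = [
        math.log2((count + pseudocount) * 1e6 / size)
        for count, size in zip(counts, library_sizes)
    ]
    return sum(log_values) / len(log_values)


if __name__ == "__main__":
    # Illustrative counts and library sizes only.
    print(average_log_cpm([12, 30, 7], [2_000_000, 3_500_000, 1_800_000]))
```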