Mars-Zhan

mRNA sequence, cDNA sequence, ORF sequence, CDS sequence, Promoter, STS, EST, strand

(2015-08-21 12:27:51)
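Category: biology

    mRNA (messenger RNA) consists of a coding region (CDS) flanked by an upstream 5' untranslated region and a downstream 3' untranslated region. Eukaryotic mRNA carries a 7-methylguanosine triphosphate cap at the 5' end and a poly(A) tail at the 3' end. The "mRNA" sequences in NCBI, however, are really cDNA sequences: DNA obtained by reverse transcription of the RNA, shown in the same sense as the mRNA with T in place of U, and usually without the 3' poly(A) tail. One cDNA sequence corresponds to one transcript, and the position of its first base is the transcription start site (TSS). A cDNA is made up entirely of exons, but only part of that exonic sequence encodes protein: the CDS (coding sequence), which runs from the start codon to the stop codon and may span several exons. This segment is an ORF, and it is the ORF sequence of that cDNA.

    The sequence upstream of the TSS that takes part in transcribing and regulating a particular gene is called the promoter. In prokaryotes, a conserved TATAAT sequence near -10 helps local unwinding of the DNA, and a TTGACA sequence near -35 provides the recognition signal for RNA polymerase. In eukaryotes, the TATA box at about -25 to -30 determines the start site, and the CAAT box near -75 is involved in RNA polymerase binding. All of these are promoter elements. The promoter can cover a large range, up to about 2000 bp upstream of the TSS, and for some genes transcription factor binding sites inside the transcribed region are counted as part of the promoter as well.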
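To make the relationship between cDNA, CDS and ORF concrete, here is a minimal sketch that scans the three reading frames of a cDNA (given in the mRNA sense, with T instead of U) and returns the longest stretch from an ATG to the next in-frame stop codon. The function name `longest_orf` and the toy sequence are made up for illustration, and taking the longest ORF as the CDS is a simplifying assumption; real annotation uses more evidence than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(cdna):
    """Return (start, end) of the longest ATG...stop ORF on the given strand.

    Coordinates are 0-based and end-exclusive, and the stop codon is included.
    Returns (-1, -1) when no complete ORF is found.
    """
    seq = cdna.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # close this ORF and look for the next ATG
    return best

# Toy cDNA (hypothetical): 5' UTR + CDS + 3' UTR
toy_cdna = "GGCACC" + "ATGGCTTGGTAA" + "CTTAATAAA"
orf_start, orf_end = longest_orf(toy_cdna)
cds = toy_cdna[orf_start:orf_end]
```

Everything before `orf_start` plays the role of the 5' untranslated region and everything from `orf_end` onward the 3' untranslated region.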

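    A clone can be thought of simply as a copy. Suppose, for example, that you extract mRNA, reverse-transcribe it into cDNA, insert that sequence into a vector, and then propagate it repeatedly by streaking plates: you end up with many clones carrying that cDNA. For convenience, labs usually name such clones after the cDNA sequence, but a clone actually contains the vector DNA as well as the cDNA.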

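    An STS (sequence-tagged site) is a short, single-copy DNA sequence with a well-defined location in the genome that serves as a landmark and can be uniquely amplified by PCR. It is typically 200-500 bp long. For a DNA sequence to qualify as an STS, two conditions must hold: first, its sequence must be known so that it can be detected by PCR; second, it must map to a single location in the genome. STSs can be used to check the accuracy of DNA sequences obtained under different sequencing conditions.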
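The two STS conditions (known sequence detectable by PCR, and a single location in the genome) can be sketched as a toy in-silico PCR. This is only an illustration under strong assumptions: primers must match exactly, the "genome" is a plain string, and the names `sts_amplicons` and `is_sts` and all sequences are hypothetical. Real tools tolerate mismatches and work on indexed genomes.

```python
def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_all(haystack, needle):
    """All start positions of needle in haystack, overlapping hits included."""
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        i = haystack.find(needle, i + 1)
    return positions

def sts_amplicons(genome, fwd, rev, min_len=200, max_len=500):
    """List (start, end) of every product the primer pair would amplify.

    Both orientations are checked by trying (fwd, rev) and (rev, fwd)
    against the forward-strand string.
    """
    genome = genome.upper()
    hits = []
    for left, right in ((fwd, rev), (rev, fwd)):
        right_rc = reverse_complement(right)
        for start in find_all(genome, left.upper()):
            for right_start in find_all(genome, right_rc):
                end = right_start + len(right_rc)
                if min_len <= end - start <= max_len:
                    hits.append((start, end))
    return hits

def is_sts(genome, fwd, rev, **size_limits):
    """True when the primer pair gives exactly one product of allowed size."""
    return len(sts_amplicons(genome, fwd, rev, **size_limits)) == 1

# Toy example with shortened size limits so the sequence stays readable
toy_genome = "AAAA" + "ACGGTC" + "G" * 20 + "TGCTAA" + "CCCC"
unique = is_sts(toy_genome, "ACGGTC", "TTAGCA", min_len=20, max_len=50)
duplicated = is_sts(toy_genome * 2, "ACGGTC", "TTAGCA", min_len=20, max_len=50)
```

In the duplicated toy genome the same primer pair yields two products, so the site no longer counts as an STS.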

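    An EST (expressed sequence tag) is a short cDNA sequence obtained by single-pass sequencing of the 5' or 3' end of a randomly selected cDNA clone. Finding genes by whole-genome sequencing is both expensive and time-consuming, because only about 2% of the genome encodes protein. Instead, one can build a cDNA library from the mRNAs that are actually expressed, sequence the cDNAs to obtain ESTs, and discover new genes that way.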

Gene and transcript sequences can be looked up in Ensembl:

http://www.ensembl.org/info/website/tutorials/sequence.html

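However, if you extract a transcript sequence by hand, pay attention to which strand it is on: + or 1 means the forward strand; - or -1 means the reverse strand. For what this means in detail, see https://www.biostars.org/p/3423/, quoted here: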

  • DNA is double-stranded. By convention, for a reference chromosome, one whole strand is designated the "forward strand" and the other the "reverse strand". This designation is arbitrary. Sometimes the terms "plus strand" and "minus strand" are used instead.

  • Visually (I'm not talking about the transcription machinery yet), you would typically read the sequence of a strand in the 5'-3' direction. For the forward strand, this means reading left-to-right, and for the reverse strand it means right-to-left.

  • A gene can live on a DNA strand in one of two orientations. The gene is said to have a coding strand (also known as its sense strand), and a template strand (also known as its antisense strand). For 50% of genes, its coding strand will correspond to the chromosome's forward strand, and for the other 50% it will correspond to the reverse strand.

  • The mRNA (and protein) sequence of a gene corresponds to the DNA sequence as read (again, visually) from the gene's coding strand. So the mRNA sequence always corresponds to the 5'-3' coding sequence of a gene.

  • Now, the RNA polymerase machinery moves along the DNA in the 5'-3' orientation of the coding strand (e.g. left-to-right for a forward strand gene). It reads the bases from the template strand (so it is reading in the 3'-5' direction from the point-of-view of the template strand), and builds the mRNA as it goes. This means that the mRNA matches the coding sequence of the gene, not the template sequence. (This diagram from Wikipedia illustrates.)

  • Annotations such as Ensembl and UCSC are concerned with the coding sequences of genes, so when they say a gene is on the forward strand, it means the gene's coding sequence is on the forward strand. To follow through again, that means that during transcription of this forward-strand gene, the gene's template sequence is read from the reverse strand, producing an mRNA that matches the sequence on the forward strand.

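Note: strand is either 1 for the forward strand or -1 for the reverse strand. In other words, for strand 1 the spliced exon sequence read from the forward strand already matches the mRNA, so it only needs T->U to become the mRNA sequence. For strand -1, the spliced sequence must first be reverse-complemented (reversed and base-complemented) and then converted T->U to become the mRNA sequence.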
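The strand rule in the note can be sketched in a few lines. This is a minimal illustration, assuming exon coordinates are 1-based, inclusive, and given on the forward strand (the convention Ensembl displays); the function name `spliced_mrna` and the toy chromosome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def spliced_mrna(chrom_seq, exons, strand):
    """Build the mRNA sequence of a transcript from forward-strand exons.

    chrom_seq: forward-strand sequence of the chromosome (or region)
    exons:     list of (start, end), 1-based inclusive forward coordinates
    strand:    1 for a forward-strand gene, -1 for a reverse-strand gene
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Join exons in forward-strand order; introns are simply skipped.
    dna = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == -1:
        # The coding strand is the reverse strand, read 5'-3'.
        dna = reverse_complement(dna)
    return dna.upper().replace("T", "U")

# Toy region: exon 1 = ATGGCA, intron = GTTTTTAG, exon 2 = TGATAA
toy_chrom = "CCATGGCAGTTTTTAGTGATAAGG"
toy_exons = [(3, 8), (17, 22)]
forward_mrna = spliced_mrna(toy_chrom, toy_exons, 1)
reverse_mrna = spliced_mrna(toy_chrom, toy_exons, -1)
```

Sorting the exons by forward coordinate before joining means the caller can pass them in transcript order or genomic order; the reverse complement at the end puts a reverse-strand transcript back into 5'-3' order.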
