miRNA Target Gene Prediction Methods (Animals)_Nathan0703

http://blog.sina.com.cn/u/1706691033

miRNA Target Gene Prediction Methods (Animals)

(2013-08-04 21:35:39)

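To predict miRNA target genes, software applies the seed-region rule: nucleotides 2-8 from the 5' end of the miRNA must be perfectly complementary to a 7-nt stretch in the 3' UTR of the mRNA.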
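The seed rule can be sketched in a few lines of Python. This is only a minimal illustration of perfect 7-nt seed matching, not how any of the tools below are actually implemented; the function names are hypothetical, and the let-7a sequence and the short UTR in the usage note are example inputs.

```python
from typing import List

def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))

def find_seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in the UTR that perfectly match the miRNA seed.

    The seed is taken as nucleotides 2-8 (1-based) from the 5' end of the miRNA.
    DNA-style input (T) is converted to RNA (U) first.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                # positions 2-8, 1-based
    site = reverse_complement(seed)  # what the 3' UTR must contain, read 5'->3'
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits
```

For example, `find_seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAA")` looks for the site `CUACCUC` in the UTR. Real predictors add further criteria (site accessibility, conservation, free energy) on top of this rule.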

Table: Summary of commonly used tools

No.	Tool	URL	Speed	Notes
1	TargetScan	www.targetscan.org	Fast	Continuously updated; very large user base
2	miRanda	http://www.microrna.org/microrna/home.do	Fast
3	PITA	http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html	Slow	Highest journal impact factor among these tools; recommended choice
4	RNAhybrid	http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/	Slow	Higher false-positive rate than TargetScan
5	microTar	tiger.dbs.nus.edu.sg/microtar/	Slow
6	miRecords	mirecords.biolead.org		Database of validated human targets
7	PicTar	http://pictar.mdc-berlin.de/		Database version: miRBase 18.0

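How can you quickly find the intersection of the predictions from different tools? A simple way is to use an online tool, for example bioinformatics.psb.ugent.be/webtools/Venn/

Paste the result list from each tool into the page, submit, and the overlap is returned.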
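If you would rather compute the overlap locally, the same intersection can be sketched with Python sets. This assumes each tool's result has already been reduced to a plain list of target gene identifiers; the function name and the gene IDs in the usage note are hypothetical.

```python
def common_targets(*result_lists):
    """Return the sorted identifiers that appear in every result list.

    Each argument is an iterable of target gene identifiers from one tool.
    Identifiers are compared after stripping surrounding whitespace.
    """
    sets = [{item.strip() for item in results if item.strip()}
            for results in result_lists]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

For example, `common_targets(["GENE_A", "GENE_B"], ["GENE_B", "GENE_C"])` keeps only `GENE_B`. Make sure all tools report the same kind of identifier (gene symbol, RefSeq ID, and so on) before intersecting.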


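Most of the tools above can be used online. With PITA, for example, you can enter multiple gene sequences and multiple miRNAs and set the parameters yourself for target prediction. (URL: http://genie.weizmann.ac.il/pubs/mir07/mir07_prediction.html )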

Appendix: how to use PITA and RNAhybrid on Linux. (Reposted from another source.)

======PITA=====

The basic PITA parameters [4] are roughly as follows:

ΔΔG less than or equal to -10 kcal/mol
Seed length of 7-8 nt
No G:U pairing allowed

PITA's help text is shown below; I have annotated the points worth your attention:

syntax: pita_prediction.pl [OPTIONS]

Execute the PITA algorithm for identifying and scoring microRNA target sites.

options:

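-utr : fasta file containing the UTRs to be scanned (an mRNA file in FASTA format; it can be the full-length mRNA, although I usually extract the read peaks from sequencing and use those as the input to scan)

-mir : fasta file containing the microRNA sequences (your candidate miRNA database, e.g. the human miRNA set at http://www.mirbase.org/cgi-bin/mirna_summary.pl?org=hsa)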

-upstream <filename>: fasta file containing the upstream sequence for each UTR. The IDs in <filename> should match the IDs found int the UTR file. If less 200 bp are given (or if no file is given), it is padded with Poly-A.

-flank_up

-flank_down : Flank requirement in basepairs (default: zero for both)

-ddG_context : Number of bases upstream and downstream for target site that are taken into account when folding the UTR (default: 70)

-prefix : Add the string as a prefix to the output files (pita_results.tab and ext_utr.stab) (i.e. the prefix of the output file names, so you can tell your result files apart)

-gxp: Produce a gxp (Genomica project file) output file.

Seed matching parameters: (these are the parameters you need to pay the most attention to)

-l <num1-num2>: Search for seed lengths of num1,...,num2 to the MicroRNA (default: 6-8)

(This is the seed length. The default is 6-8; here it is changed to 7-8.)

-gu : Lengths for which G:U wobbles are allowed and number of allowed wobbles. Format of nums: <length;num G:U>,<length;num G:U>,... (default: 6;0,7;1,8;1)

(G:U pairing is not allowed here, so this is changed to 6;0,7;0,8;0.)

-m : Lengths for which mismatches are allowed and number of allowed mismatches. Format of nums: <length;num mismatches>,<length;num mismatches>,... (default: 6;0,7;0,8;1)

-loop : Lengths for which a single loop in either the target or the microrna is allowed. Format of nums: <length>,<length>,... (default: none)
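Putting the adjusted options together, a PITA run with 7-8 nt seeds and no G:U wobbles could be assembled as below. This is a sketch that uses only flags from the help text above; the function name, the file names, and the assumption that `pita_prediction.pl` is executable and on your PATH are mine.

```python
def pita_command(utr_fasta, mir_fasta, prefix):
    """Build the argument list for a PITA run with the settings described above.

    Passing the arguments as a list (no shell) avoids having to quote
    the semicolons in the -gu value.
    """
    return [
        "pita_prediction.pl",   # assumed to be on PATH
        "-utr", utr_fasta,      # FASTA file with the UTRs (or read peaks) to scan
        "-mir", mir_fasta,      # FASTA file with the candidate miRNAs
        "-prefix", prefix,      # prefix for pita_results.tab and ext_utr.stab
        "-l", "7-8",            # seed length 7-8 nt instead of the default 6-8
        "-gu", "6;0,7;0,8;0",   # no G:U wobbles at any seed length
    ]

# To actually launch it (requires a working PITA installation):
#   import subprocess
#   subprocess.run(pita_command("peaks.fa", "hsa_mirna.fa", "run1_"), check=True)
```

If you type the command into a shell instead, put the `-gu` value in quotes, because an unquoted semicolon ends the command.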

An example of PITA's standard output:

UTR microRNA Start End Seed Loop dGduplex dG5 dG3 dG0 dG1 dGopen ddG

chr1-32146379 hsa-miR-339-5p 82 74 8:1:1 0 -20.89 -10.5 -10.39 -39.82 -20.46 -19.35 -1.53

chr20-48173714 hsa-let-7a 39 31 8:1:1 0 -13.4 -5.7 -7.7 -19.48 -0.43 -19.04 5.64

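As the output above shows, PITA provides no cutoff option for the energy value, so you have to write your own script that reads the PITA output file and keeps only the hits with ΔΔG less than or equal to -10 kcal/mol.
PITA also does not show how the miRNA pairs with the mRNA; it reports only the positions. Being lazy, I let RNAhybrid draw the pairing diagrams for me, so RNAhybrid ends up as nothing more than a plotting tool in my workflow.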
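A filter script of that kind might look like the sketch below. It assumes the result file is a whitespace- or tab-separated table whose header row contains a `ddG` column, as in the example output above; the function name and the default threshold argument are my own choices.

```python
def filter_pita(lines, max_ddg=-10.0):
    """Keep PITA hits whose ddG is less than or equal to max_ddg.

    `lines` is any iterable of text lines (an open file works); the first
    non-blank line must be the header. Returns a list of dicts keyed by
    the header names, with all values left as strings.
    """
    rows = (line for line in lines if line.strip())
    header = next(rows).split()
    ddg_col = header.index("ddG")
    kept = []
    for line in rows:
        fields = line.split()
        if float(fields[ddg_col]) <= max_ddg:
            kept.append(dict(zip(header, fields)))
    return kept

# Typical use, with a hypothetical file name:
#   with open("run1_pita_results.tab") as handle:
#       for hit in filter_pita(handle):
#           print(hit["UTR"], hit["microRNA"], hit["ddG"])
```

Note that neither of the two example rows above would pass this filter, since their ddG values (-1.53 and 5.64) are both above -10.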

=====RNAhybrid=====

RNAhybrid has a cutoff parameter for the MFE (minimum free energy), so here I choose ΔG less than or equal to -20 kcal/mol [4].

Usage: RNAhybrid [options] [target sequence] [query sequence].

options:

-b <number of hits per target>

-c compact output

-d <xi>,<theta>

-f helix constraint

-h help

-m <max targetlength>

-n <max query length>

-u <max internal loop size (per side)>

-v <max bulge loop size>

-e <energy cut-off>

-p <p-value cut-off>

-s (3utr_fly|3utr_worm|3utr_human)

-g (ps|png|jpg|all)

-t <target file>

-q <query file>

Either a target file has to be given (FASTA format) or one target sequence directly.

Either a query file has to be given (FASTA format) or one query sequence directly.

The helix constraint format is "from,to", eg. -f 2,7 forces structures to have a helix from position 2 to 7 with respect to the query.

<xi> and <theta> are the position and shape parameters, respectively, of the extreme value distribution assumed for p-value calculation. If omitted, they are estimated from the maximal duplex energy of the query. In that case, a data set name has to be given with the -s flag.

PS graphical output not supported.

PNG and JPG graphical output not supported.

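The miRNA and mRNA inputs can be plain sequences, or FASTA files containing many sequences.
The output is printed straight to the terminal, so I suggest redirecting it to a file with ">". This is also why I treat RNAhybrid as a downstream plotting tool.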
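As a sketch, the run with the -20 kcal/mol cutoff and the output saved to a file could be scripted as follows. Only flags listed in the help text above are used; the function names, the file names, and the choice of the `3utr_human` data set are assumptions for illustration.

```python
import subprocess

def rnahybrid_command(target_fasta, query_fasta, energy_cutoff=-20, dataset="3utr_human"):
    """Build the RNAhybrid argument list for FASTA inputs with an MFE cutoff."""
    return [
        "RNAhybrid",                # assumed to be on PATH
        "-e", str(energy_cutoff),   # energy cut-off in kcal/mol
        "-s", dataset,              # data set used to estimate the p-value parameters
        "-t", target_fasta,         # target (UTR) FASTA file
        "-q", query_fasta,          # query (miRNA) FASTA file
    ]

def run_to_file(cmd, out_path):
    """Run a command and write its standard output to out_path (same effect as '>')."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

# Requires a working RNAhybrid installation:
#   run_to_file(rnahybrid_command("pita_hits.fa", "hsa_mirna.fa"), "rnahybrid_out.txt")
```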

RNAhybrid's standard output looks like this:

target: ***** (the name of the specific UTR)

length: 30

miRNA : ***** (the name of the specific miRNA)

length: 22

mfe: -24.4 kcal/mol (MFE = minimum free energy)

p-value: 0.001448

position 6

target 5' C G GG AU U 3'

GAU GA UAGG UGGUGCUG

UUG CU GUCU ACCACGAU

miRNA 3' A G AAA 5'

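This layout is not really suitable for publication, so I suggest writing your own code to read these output files and reformat them. The final result would look something like this: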

-24.4 kcal/mol

*** 5' CCUACCACUCACCCUAGCA 3'

| || |||| ||||||

******** 3' AGCGGGAGAGUUGGGUCGAAAA 5'
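The first half of such a reformatting script, reading the saved RNAhybrid output back in, can be sketched as below. It assumes the non-compact output format shown earlier (`target:`, `miRNA :`, `mfe:`, `p-value:` and `position` lines); the function name and dictionary keys are hypothetical, and the alignment rows themselves are deliberately not parsed here.

```python
import re

def parse_rnahybrid(text):
    """Extract target, miRNA, MFE, p-value and position from RNAhybrid output.

    Returns one dict per hit. A new record starts at every 'target:' line;
    the alignment lines (which start with "target 5'") are ignored.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"target:\s*(.+)", line)
        if match:
            current = {"target": match.group(1)}
            records.append(current)
            continue
        if current is None:
            continue
        match = re.match(r"miRNA\s*:\s*(.+)", line)
        if match:
            current["miRNA"] = match.group(1)
            continue
        match = re.match(r"mfe:\s*(-?\d+(?:\.\d+)?)", line)
        if match:
            current["mfe"] = float(match.group(1))
            continue
        match = re.match(r"p-value:\s*(\S+)", line)
        if match:
            current["p_value"] = float(match.group(1))
            continue
        match = re.match(r"position\s+(\d+)", line)
        if match:
            current["position"] = int(match.group(1))
    return records
```

From these records you can print the energy and the two names in whatever layout your figure or table needs.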

-- end && reference

[1] [Repost] miRNA databases: http://joseph.yy.blog.163.com/blog/static/50973959201192121757343/
[2] PITA: http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
[3] RNAhybrid: http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/
[4] Marín, R. M., & Vaníček, J. (2011). Efficient use of accessibility in microRNA target prediction. Nucleic Acids Research, 39(1), 19-29. doi:10.1093/nar/gkq768
