Hi-C Data Processing with SALSA2


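Analysis pipeline: SALSA2 (https://github.com/machinegun/SALSA)
Earlier pipelines for the same task include 3D-DNA and LACHESIS.
Features: it can correct misassemblies in the input assembly, and it can take an assembly graph in GFA format as input.
Input: the Hi-C reads aligned to the assembly, in BAM format. For the mapping protocol, see: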
http://github.com/ArimaGenomics/mapping_pipeline
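As a rough sketch of how a run might be put together once the alignment exists: as far as I recall from the SALSA README, the pipeline script takes the alignments as a BED file (converted from the BAM, for example with bedtools `bamToBed`, and sorted by read name) together with a contig length file such as the `.fai` index. The helper below only builds the argument list and runs nothing. All file names are placeholders, `salsa_command` is a hypothetical helper, and the flags shown (`-a`, `-l`, `-b`, `-e`, `-o`, `-m`, `-g`) are assumptions that should be checked against `python run_pipeline.py -h` for the installed version.

```python
def salsa_command(contigs_fasta, bed, enzyme_site, out_dir,
                  gfa=None, correct_misassemblies=True):
    """Build (but do not run) an argument list for SALSA's run_pipeline.py.

    Assumptions: flag names follow the SALSA README as recalled here, and
    the contig length file is the samtools faidx index next to the FASTA.
    """
    cmd = [
        "python", "run_pipeline.py",
        "-a", contigs_fasta,           # contig assembly (FASTA)
        "-l", contigs_fasta + ".fai",  # contig lengths (assumed .fai index)
        "-b", bed,                     # Hi-C alignments converted to BED
        "-e", enzyme_site,             # restriction site, e.g. GATC
        "-o", out_dir,                 # output directory
    ]
    if correct_misassemblies:
        cmd += ["-m", "yes"]           # ask SALSA to correct misassemblies
    if gfa is not None:
        cmd += ["-g", gfa]             # optional assembly graph in GFA format
    return cmd


# Placeholder file names, for illustration only.
example = salsa_command("contigs.fasta", "alignment.bed", "GATC", "scaffolds")
print(" ".join(example))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later without worrying about shell quoting.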
Recommended reading on Hi-C:
Lieberman-Aiden E, Van Berkum N L, Williams L, et al.
Comprehensive mapping of long-range interactions reveals folding
principles of the human genome[J]. Science, 2009, 326(5950):
289-293.
Durand N C, Shamim M S, Machol I, et al. Juicer provides
a one-click system for analyzing loop-resolution Hi-C
experiments[J]. Cell Systems, 2016, 3(1): 95-98.
Burton J N, Adey A, Patwardhan R P, et al.
Chromosome-scale scaffolding of de novo genome assemblies based on
chromatin interactions[J]. Nature Biotechnology, 2013, 31(12):
1119.
Dudchenko O, Batra S S, Omer A D, et al. De novo assembly
of the Aedes aegypti genome using Hi-C yields chromosome-length
scaffolds[J]. Science, 2017, 356(6333): 92-95.
Mascher M, Gundlach H, Himmelbach A, et al. A chromosome
conformation capture ordered sequence of the barley genome[J].
Nature, 2017, 544(7651): 427.
Ghurye J, Rhie A, Walenz B P, et al. Integrating Hi-C
links with assembly graphs for chromosome-scale assembly[J].
bioRxiv, 2018: 261149.
Meluzzi D, Arya G. Quantification of DNA cleavage
specificity in Hi-C experiments[J]. Nucleic Acids Research, 2015,
44(1): e4-e4.
+++++++++++++++++++++++++++
Some essential background notes on Hi-C:
1: (http://hicexplorer.readthedocs.io/en/latest/content/example_usage.html)
"Usually, only 25%-40% of the reads are valid and used to
build the Hi-C matrix mostly because of the reads that are on
repetitive regions that need to be discarded."
In other words, there is no need to sequence too deeply.
2: Good Hi-C libraries have fewer than 10% inter-chromosomal contacts.
3: Restriction sites: GATC (MboI, DpnII), GAATTC (EcoRI).
You generally need to know which restriction enzyme was used in the experiment.
4: HiCExplorer findRestSite (http://hicexplorer.readthedocs.io/en/latest/content/tools/findRestSite.html#findrestsite)
locates restriction sites in a genome, for example (AAGCTT is the HindIII site):
findRestSite --fasta mm10.fa --searchPattern AAGCTT -o rest_site_positions.bed
5: HiCtool (data processing tool): https://doc.genomegitar.org/index.html
Preprocessing guide: https://doc.genomegitar.org/preprocessing_data.html
6: HiCPlotter: https://github.com/kcakdemir/HiCPlotter
7: SALSA (assembly tool): https://github.com/machinegun/SALSA
8: Hi-C data filtering: http://bxlab-hifive.readthedocs.io/en/latest/filtering_data.html#hic-filtering
9: 3D-DNA (assembly tool): https://github.com/theaidenlab/3d-dna
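
Two of the notes above, the restriction-site search and the 10% inter-chromosomal guideline, are simple enough to sketch in plain Python. This is only an illustration of the idea, not how HiCExplorer implements findRestSite: the function names, the toy sequences, and the enzyme table are made up for the example, and the search assumes a palindromic site, so the reverse strand is not scanned separately.

```python
import re

# Recognition sequences for the enzymes mentioned in the notes
# (HindIII added because AAGCTT is its site).
ENZYME_SITES = {"MboI": "GATC", "DpnII": "GATC",
                "EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_rest_sites(fasta_lines, pattern):
    """Return BED-style (chrom, start, end) tuples, 0-based and half-open."""
    # A lookahead is used so that overlapping occurrences are reported too.
    regex = re.compile("(?=%s)" % re.escape(pattern.upper()))
    sites = []
    for name, seq in read_fasta(fasta_lines):
        for match in regex.finditer(seq):
            sites.append((name, match.start(), match.start() + len(pattern)))
    return sites


def inter_chromosomal_fraction(pairs):
    """Fraction of contacts whose two read ends fall on different chromosomes."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    inter = sum(1 for chrom_a, chrom_b in pairs if chrom_a != chrom_b)
    return inter / len(pairs)


# Toy input, invented for illustration.
toy_fasta = [">chr1 toy", "ACGAAGCTTGG", "GATCAAGCTT", ">chr2", "TTGATCGATC"]
for chrom, start, end in find_rest_sites(toy_fasta, ENZYME_SITES["HindIII"]):
    print(chrom, start, end, sep="\t")

toy_pairs = [("chr1", "chr1"), ("chr1", "chr2"), ("chr2", "chr2"), ("chr1", "chr1")]
print("inter-chromosomal fraction:", inter_chromosomal_fraction(toy_pairs))
```

On real data the pairs would come from the filtered alignments, and a fraction well above 0.10 would suggest, by the guideline quoted above, that the library is of poor quality.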