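seqtk: a sequence processing tool

(2015-06-15 20:33:33)
Category: Bioinformatics
1: seqtk is a small tool for processing second-generation (NGS) sequencing data. Download: https://github.com/lh3/seqtk

2: Examples: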

  • Convert FASTQ to FASTA:

    
    seqtk seq -a in.fq.gz > out.fa
    
    
  • Convert Illumina 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercase (the 1st command line) or to N (the 2nd):

    
    seqtk seq -aQ64 -q20 in.fq > out.fa
    seqtk seq -aQ64 -q20 -n N in.fq > out.fa
    
    
  • Fold long FASTA/Q lines and remove FASTA/Q comments:

    
    seqtk seq -Cl60 in.fa > out.fa
    
    
  • Convert multi-line FASTQ to 4-line FASTQ:

    
    seqtk seq -l0 in.fq > out.fq
    
    
  • Reverse complement FASTA/Q:

    
    seqtk seq -r in.fq > out.fq
    
    
  • Extract sequences with names in file name.lst, one sequence name per line:

    
    seqtk subseq in.fq name.lst > out.fq
    
    
  • Extract sequences in regions contained in file reg.bed:

    
    seqtk subseq in.fa reg.bed > out.fa
    
    
  • Mask regions in reg.bed to lowercase:

    
    seqtk seq -M reg.bed in.fa > out.fa
    
    
  • Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing). The key point is that this lets you draw sequences at random:

    
    seqtk sample -s100 read1.fq 10000 > sub1.fq
    seqtk sample -s100 read2.fq 10000 > sub2.fq
    
    
  • Trim low-quality bases from both ends using the Phred algorithm:

    
    seqtk trimfq in.fq > out.fq
    
    
  • Trim 5bp from the left end of each read and 10bp from the right end:

    
    seqtk trimfq -b 5 -e 10 in.fa > out.fa
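
For the paired subsampling example above, a quick sanity check is to compare the read names in the two output files. The following is a minimal sketch and not part of seqtk: the function names `fastq_names` and `check_pairing` are hypothetical, and it assumes 4-line FASTQ records whose mate names match once a trailing /1 or /2 and any comment after the first whitespace are stripped.

```shell
#!/bin/sh
# Print the read name of every record in a 4-line FASTQ file.
# Assumption: exactly 4 lines per record; a trailing /1 or /2 and
# anything after the first whitespace are dropped so mates compare equal.
fastq_names() {
    awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }' "$1"
}

# Succeed only if both files list the same read names in the same order.
check_pairing() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    fastq_names "$1" > "$tmp1"
    fastq_names "$2" > "$tmp2"
    cmp -s "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

With the files from the example, this could be used as `check_pairing sub1.fq sub2.fq && echo "pairs intact"`. If the check fails, the most likely cause is that the two `seqtk sample` runs did not use the same `-s` seed.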
    
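
The reg.bed examples above do not show what such a file looks like. As a hedged sketch, assuming reg.bed follows the standard BED convention (tab-separated chrom, start, end, with 0-based starts and exclusive ends), the snippet below writes a small file and reports how many bases each region covers. The sequence names chr1 and chr2 and the helper `bed_lengths` are placeholders for illustration, not part of seqtk.

```shell
#!/bin/sh
# Write a small example BED file: tab-separated chrom, start, end.
# In standard BED, starts are 0-based and ends are exclusive, so
# "chr1 0 10" covers the first 10 bases of chr1.
printf 'chr1\t0\t10\nchr1\t100\t150\nchr2\t20\t25\n' > reg.bed

# Print each region with its length, then the total bases covered.
bed_lengths() {
    awk -F '\t' '{ len = $3 - $2; total += len; print $1, $2, $3, len }
                 END { print "total", total }' "$1"
}

bed_lengths reg.bed
```

Checking the total this way before running `seqtk subseq` or `seqtk seq -M` is a cheap guard against off-by-one mistakes when regions were converted from 1-based coordinates.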
