Sequence processing tool seqtk (fanyucai)

(2015-06-15 20:33:33)

Category: Bioinformatics

1: seqtk is a small toolkit for processing next-generation sequencing data in FASTA/FASTQ format. Download: https://github.com/lh3/seqtk

2: Examples:

Convert FASTQ to FASTA:
```
seqtk seq -a in.fq.gz > out.fa
```
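To make the conversion concrete, here is a minimal Python sketch of what FASTQ-to-FASTA conversion amounts to: keep the header (swapping `@` for `>`) and the sequence, and drop the `+` line and the qualities. It assumes plain, uncompressed, strict 4-line records; `fastq_to_fasta` is a hypothetical helper, not part of seqtk, which also reads gzipped and multi-line input.

```python
from typing import Iterable, List


def fastq_to_fasta(lines: Iterable[str]) -> List[str]:
    """Convert strict 4-line FASTQ records to 2-line FASTA records.

    Assumption: every record is exactly header / sequence / '+' / quality.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    out: List[str] = []
    for i in range(0, len(rows), 4):
        header, seq = rows[i], rows[i + 1]
        out.append(">" + header[1:])  # '@name comment' -> '>name comment'
        out.append(seq)               # '+' line and quality line are dropped
    return out
```

For a file, `fastq_to_fasta(open("in.fq"))` returns the FASTA lines ready to be joined with newlines.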
Convert Illumina 1.3+ FASTQ (quality offset 64, hence -Q64) to FASTA and mask bases with quality lower than 20 to lowercase (the 1st command line) or to N (the 2nd):
```
seqtk seq -aQ64 -q20 in.fq > out.fa
seqtk seq -aQ64 -q20 -n N in.fq > out.fa
```
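The masking rule can be sketched as follows: each quality character is decoded as `ord(char) - offset` (offset 64 for Illumina 1.3+, 33 for Sanger and Illumina 1.8+), and any base below the threshold is lowercased or replaced with a mask character. `mask_low_quality` is a hypothetical helper that mirrors the idea of `-Q64 -q20` and `-n N`; it is not seqtk's source code.

```python
from typing import Optional


def mask_low_quality(seq: str, qual: str, offset: int = 64, min_q: int = 20,
                     mask_char: Optional[str] = None) -> str:
    """Mask bases whose Phred quality is below min_q.

    offset=64 corresponds to Illumina 1.3+ qualities (like -Q64);
    mask_char=None lowercases bases, mask_char="N" replaces them (like -n N).
    """
    out = []
    for base, q_char in zip(seq, qual):
        phred = ord(q_char) - offset  # decode ASCII character to a Phred score
        if phred < min_q:
            out.append(mask_char if mask_char is not None else base.lower())
        else:
            out.append(base)
    return "".join(out)
```

With offset 64, `h` decodes to Q40, `T` to Q20 and `S` to Q19, so `mask_low_quality("ACGT", "hSTh")` gives `"AcGT"`.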
Fold long FASTA/Q lines (here to 60 bases per line) and remove FASTA/Q comments:
```
seqtk seq -Cl60 in.fa > out.fa
```
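Roughly speaking, `-C` keeps only the name (the header up to the first whitespace) and `-l60` rewraps the sequence at 60 characters per line, with `-l0` meaning no wrapping. A minimal sketch for a single FASTA record, using a hypothetical `fold_fasta_record` helper:

```python
from typing import List


def fold_fasta_record(header: str, seq: str, width: int = 60,
                      drop_comment: bool = True) -> List[str]:
    """Return the lines of one FASTA record, folded to `width` columns.

    drop_comment=True mimics -C (keep '>name' only); width<=0 mimics -l0 (one line).
    """
    name = header.split(None, 1)[0] if drop_comment else header
    lines = [name]
    if width <= 0:
        lines.append(seq)
    else:
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return lines
```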
Convert multi-line FASTQ to 4-line FASTQ:
```
seqtk seq -l0 in.fq > out.fq
```
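The subtle part of reading multi-line FASTQ is that quality lines may themselves start with `@` or `+`, so a parser cannot rely on line prefixes there. A common approach, assumed in this sketch, is to keep reading quality lines until they are as long as the sequence. `read_multiline_fastq` and `to_four_line` are hypothetical helpers:

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header without '@', sequence, qualities)


def read_multiline_fastq(lines: Iterable[str]) -> List[Record]:
    rows = [line.rstrip("\n") for line in lines]
    records: List[Record] = []
    i = 0
    while i < len(rows):
        if not rows[i]:                # tolerate blank lines between records
            i += 1
            continue
        header = rows[i]
        i += 1
        seq_parts = []
        while i < len(rows) and not rows[i].startswith("+"):
            seq_parts.append(rows[i])  # sequence may be wrapped over several lines
            i += 1
        i += 1                         # skip the '+' separator line
        seq = "".join(seq_parts)
        qual = ""
        while i < len(rows) and len(qual) < len(seq):
            qual += rows[i]            # read quality lines until lengths match
            i += 1
        records.append((header[1:], seq, qual))
    return records


def to_four_line(records: Iterable[Record]) -> List[str]:
    out: List[str] = []
    for name, seq, qual in records:
        out += ["@" + name, seq, "+", qual]
    return out
```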
Reverse complement FASTA/Q:
```
seqtk seq -r in.fq > out.fq
```
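Reverse-complementing a FASTQ record means complementing and reversing the bases, and reversing (but not complementing) the quality string so each score stays with its base. A minimal sketch with hypothetical helper names; it only maps A/C/G/T/N in either case, and any other character (such as other IUPAC codes) passes through unchanged here:

```python
from typing import Tuple

# Complement table for A/C/G/T/N in both cases; other characters are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_record(seq: str, qual: str) -> Tuple[str, str]:
    # Qualities are only reversed, so each score stays attached to its base.
    return reverse_complement(seq), qual[::-1]
```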
Extract sequences with names in file name.lst, one sequence name per line:
```
seqtk subseq in.fq name.lst > out.fq
```
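Conceptually, name-based extraction is a set-membership filter over read names. In the sketch below, records are `(name, sequence)` pairs and the first whitespace-separated field of each line in the list file is taken as the name; both are simplifying assumptions, and `load_names` and `subseq_by_names` are hypothetical helpers:

```python
from typing import Iterable, List, Set, Tuple


def load_names(lines: Iterable[str]) -> Set[str]:
    """Read one sequence name per line (first field), ignoring blank lines."""
    return {line.split()[0] for line in lines if line.strip()}


def subseq_by_names(records: Iterable[Tuple[str, str]],
                    names: Set[str]) -> List[Tuple[str, str]]:
    # Set lookup keeps the filter O(1) per record; input order is preserved.
    return [rec for rec in records if rec[0] in names]
```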
Extract sequences in regions contained in file reg.bed:
```
seqtk subseq in.fa reg.bed > out.fa
```
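BED coordinates are 0-based and half-open, so a line `chr1 2 6` covers the 4 bases at Python indices 2 to 5 and maps directly onto a slice. A minimal sketch, assuming the reference has already been loaded into a dict from sequence name to sequence; `extract_bed_regions` is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Tuple


def extract_bed_regions(genome: Dict[str, str],
                        bed_lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    out = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom in genome:
            # 0-based, half-open BED interval == Python slice [start:end)
            out.append((chrom, start, end, genome[chrom][start:end]))
    return out
```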
Mask regions in reg.bed to lowercase:
```
seqtk seq -M reg.bed in.fa > out.fa
```
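Soft-masking is the same interval logic applied in place: bases inside each BED interval are lowercased and everything else is kept. A sketch for one sequence, assuming its intervals have already been picked out of the BED file as `(start, end)` pairs; `mask_regions_lower` is a hypothetical helper:

```python
from typing import Iterable, Tuple


def mask_regions_lower(seq: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Lowercase bases inside 0-based, half-open [start, end) intervals."""
    chars = list(seq)
    for start, end in regions:
        # Clip intervals to the sequence so out-of-range ends are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)
```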
Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing). Random subsampling of reads is the key feature here:
```
seqtk sample -s100 read1.fq 10000 > sub1.fq
seqtk sample -s100 read2.fq 10000 > sub2.fq
```
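Why the shared seed keeps pairs together: a standard way to draw a fixed number of reads from a stream of unknown length is reservoir sampling, and this sketch assumes that approach. If the generator is seeded identically and both files hold the same number of reads in the same order, every random draw, and therefore every kept position, is identical in both files. The sketch uses Python's `random.Random` rather than seqtk's own generator, so it illustrates the principle but will not pick the same reads as `seqtk sample -s100`; `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int) -> List[T]:
    """Keep k records chosen uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # same seed -> same sequence of draws
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec
    return reservoir
```

The draws depend only on the record index and `k`, never on the read content, which is exactly why running it on `read1.fq` and `read2.fq` with the same seed keeps mates aligned.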
Trim low-quality bases from both ends using the Phred algorithm:
```
seqtk trimfq in.fq > out.fq
```
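The trimming idea can be sketched with a modified Mott scheme: each base contributes `threshold - error_probability`, where the error probability is `10 ** (-Q / 10)`, and the read is cut down to its highest-scoring stretch. The 0.05 threshold, the Phred+33 offset and the `mott_trim` name are assumptions for illustration; seqtk's own implementation may differ in details, for example in how very short results are handled.

```python
from typing import Tuple


def mott_trim(qual: str, threshold: float = 0.05, offset: int = 33) -> Tuple[int, int]:
    """Return (start, end) of the best-scoring segment; keep seq[start:end].

    Assumptions: Phred+33 qualities and an error-probability threshold of 0.05.
    """
    score = best = 0.0
    start = end = 0
    candidate_start = 0
    for i, ch in enumerate(qual):
        error_prob = 10 ** (-(ord(ch) - offset) / 10)
        score += threshold - error_prob  # good bases add, bad bases subtract
        if score > best:                 # new best segment ends after base i
            best, start, end = score, candidate_start, i + 1
        if score < 0:                    # restart the running segment after base i
            score, candidate_start = 0.0, i + 1
    return start, end
```

For example, `start, end = mott_trim(qual)` followed by keeping `seq[start:end]` and `qual[start:end]` trims both ends of one read.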
Trim 5bp from the left end of each read and 10bp from the right end:
```
seqtk trimfq -b 5 -e 10 in.fa > out.fa
```
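Fixed-length trimming needs no quality information, which is why it also works on FASTA input as in the command above. A minimal sketch with a hypothetical `trim_fixed` helper; when a read is shorter than the total amount to trim, this sketch returns an empty string (how seqtk treats that case is not covered here):

```python
def trim_fixed(seq: str, left: int = 5, right: int = 10) -> str:
    """Drop `left` bases from the 5' end and `right` bases from the 3' end."""
    end = max(len(seq) - right, left)  # never let the slice end before its start
    return seq[left:end]
```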
