Keywords: Iso-Seq
Reference:
Tardaguila M, de la Fuente L, Marti C, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification[J]. Genome Research, 2018.
SQANTI is a pipeline for the in-depth characterization of isoforms obtained by full-length transcript sequencing. These isoforms are commonly returned as a FASTA file without any extra information about gene/transcript annotation or attribute description. SQANTI provides a wide range of descriptors of transcript quality and generates a graphical report to aid in the interpretation of the sequencing results.
Although SQANTI is designed to characterize isoforms generated by the PacBio Iso-Seq pipeline, it can be used for any set of transcript isoforms in FASTA format, and it can be applied to any organism.
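To illustrate what "FASTA without any extra information" means in practice, here is a minimal sketch of a FASTA reader in plain Python. It is not part of SQANTI; the function name `read_fasta` and the two example records (including their identifiers) are made up for illustration. All that can be recovered from such a file is an identifier and a sequence per isoform.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        yield name, "".join(chunks)


# Made-up toy input: two isoforms, no gene or annotation information attached.
example = StringIO(">PB.1.1\nACGTAC\nGTTA\n>PB.1.2\nGGCATT\n")
records = list(read_fasta(example))
for identifier, sequence in records:
    print(identifier, len(sequence))
```

Everything beyond the identifier and the sequence (gene assignment, structural category, coding potential) has to be derived by comparing the sequences against a reference, which is the gap SQANTI is meant to fill.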
In addition, SQANTI provides a filtering function that removes artifact transcripts by combining SQANTI-provided attributes with machine learning methods.
SQANTI pipeline steps:
First, because long-read sequencing usually has a high error rate along the sequences, it performs a reference-based correction of the sequences.
Second, it generates gene models and classifies transcripts based on their splice junctions.
Third, it predicts ORFs for each transcript, obtaining information about the coding potential of each sequence.
Finally, it carries out a deep characterization of isoforms at both the transcript and the junction level and generates a report with several plots describing in detail the major attributes that characterize your set of sequenced isoforms.
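The second step, classification by splice junctions, can be sketched roughly as follows. This is a simplified illustration, not SQANTI's implementation: the function `classify_by_junctions`, the single-gene setup, the coordinates, and the "mono-exon" shortcut are all assumptions. The category labels follow the terminology of the SQANTI paper (full-splice match, incomplete-splice match, novel in catalog, novel not in catalog).

```python
def classify_by_junctions(query, references):
    """Classify one transcript by comparing its junction chain to reference chains.

    query      -- list of (donor, acceptor) coordinates in transcript order
    references -- dict: reference transcript name -> list of (donor, acceptor)
    Assumes all transcripts belong to the same gene and strand (simplification).
    """
    query = tuple(query)
    if not query:
        return "mono-exon"  # no junctions to compare; handled separately here

    ref_chains = [tuple(chain) for chain in references.values()]

    # Identical junction chain to a reference transcript.
    if query in ref_chains:
        return "full-splice match"

    # Consecutive subset of a reference junction chain.
    n = len(query)
    for chain in ref_chains:
        for start in range(len(chain) - n + 1):
            if chain[start:start + n] == query:
                return "incomplete-splice match"

    # Novel chain: check whether it only reuses known splice sites.
    known_donors = {donor for chain in ref_chains for donor, _ in chain}
    known_acceptors = {acc for chain in ref_chains for _, acc in chain}
    if all(d in known_donors and a in known_acceptors for d, a in query):
        return "novel in catalog"
    return "novel not in catalog"


# Made-up reference annotation for a single gene.
references = {
    "ref_A": [(100, 200), (300, 400), (500, 600)],
    "ref_B": [(100, 200), (300, 600)],
}

print(classify_by_junctions([(100, 200), (300, 400), (500, 600)], references))
print(classify_by_junctions([(300, 400), (500, 600)], references))
print(classify_by_junctions([(100, 400), (500, 600)], references))
print(classify_by_junctions([(100, 250), (300, 400)], references))
```

The four calls illustrate, in order, a full-splice match, an incomplete-splice match, a novel combination of known splice sites, and a transcript with a splice site absent from the reference.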
In combination with the SQANTI_qc function, the user can apply the SQANTI filter function to remove isoforms that are likely to be artifacts. To obtain this curated transcriptome, SQANTI filtering uses machine learning methods together with the SQANTI_qc attributes to build a classifier of artifacts.
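The idea of learning an artifact classifier from per-isoform attributes can be sketched as below. This is only an illustration under loud assumptions: a small hand-written logistic regression stands in for whatever model SQANTI actually trains, the two features (fraction of canonical junctions, normalized junction support) are hypothetical stand-ins for SQANTI_qc attributes, and the training rows are made-up toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit logistic regression by full-batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def is_artifact(x, weights, bias):
    """Return True when the predicted artifact probability is at least 0.5."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= 0.5


# Toy, made-up training set. Columns (both hypothetical, scaled to [0, 1]):
#   fraction of canonical junctions, normalized junction support
rows = [
    (1.0, 0.9), (1.0, 0.7), (1.0, 0.8), (0.9, 0.6),   # trusted isoforms
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.0, 0.2),   # known artifacts
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = artifact

weights, bias = train_logistic(rows, labels)
kept = [x for x in rows if not is_artifact(x, weights, bias)]
print(len(kept), "isoforms kept after filtering")
```

In a real setting the labelled examples, the feature set, and the model choice all matter far more than in this toy case; the sketch only shows the shape of the workflow: attributes in, artifact call out, filtered transcriptome as the result.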