Package org.snpeff.snpEffect.factory
Class SnpEffPredictorFactory
java.lang.Object
org.snpeff.snpEffect.factory.SnpEffPredictorFactory
- Direct Known Subclasses:
SnpEffPredictorFactoryFeatures, SnpEffPredictorFactoryGenesFile, SnpEffPredictorFactoryGff, SnpEffPredictorFactoryKnownGene, SnpEffPredictorFactoryRefSeq
This class creates a SnpEffectPredictor from a file (or a set of files) and a configuration.
- Author:
- pcingola
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
protected void | add |
protected void | add(Chromosome chromo) |
protected Exon | add | Add an exon
protected void | add | Add a Gene
protected void | add | Add a generic Marker
protected void | add(Transcript tr) | Add a transcript
protected void | addMarker | Add a marker to the collection
protected void | addSequences(String chr, String chrSeq) | Add genomic reference sequences
protected void | adjustChromosomes() | Adjust chromosome length using gene information. This is used when the sequence is not available (which makes sense only for test cases and debugging)
protected void | adjustTranscripts() | Adjust transcripts: recalculate start, end, strand, etc.
protected void | beforeExonSequences() | Perform some actions before reading sequences
protected void | codingFromCds() | Only coding transcripts have CDS: make sure that transcripts having CDS are protein coding. It might not always be "precise" though.
protected void | collapseZeroLenIntrons() | Collapse exons having zero size introns between them
abstract SnpEffectPredictor | create() |
protected void | createRandSequences() | Create random sequences for exons. Note: This is only used for test cases!
protected void | deleteRedundant() | Consolidate transcripts: if two exons are right next to each other, join them.
protected void | exonsFromCds() | Create exons from CDS info
protected void | exonsFromCds | Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process
protected Gene | findGene |
protected Gene | findGene |
protected Marker | findMarker(String id) |
protected Transcript | findTranscript(String id) |
protected Transcript | findTranscript(String trId, String id) |
protected Chromosome | getOrCreateChromosome(String chromoName) | Get a chromosome.
protected int | parsePosition(String posStr) | Parse a string as a 'position'.
protected void | readExonSequences() | Read exon sequences from a FASTA file
protected void | replaceTranscript(Transcript trOld, Transcript trNew) |
void | setCircularCorrectLargeGap(boolean circularCorrectLargeGap) |
void | setCreateRandSequences(boolean createRandSequences) |
void | setDebug(boolean debug) |
void | setFastaFile(String fastaFile) |
void | setFileName(String fileName) |
void | setRandom |
void | setReadSequences(boolean readSequences) | Read sequences? Note: This is only used for debugging and testing
void | setStoreSequences(boolean storeSequences) |
void | setVerbose(boolean verbose) |
protected String | showChromoNamesDifferences | Show differences in chromosome names
-
Field Details
-
MARK
public static final int MARK
- See Also:
-
MIN_TOTAL_FRAME_COUNT
public static int MIN_TOTAL_FRAME_COUNT
-
-
Constructor Details
-
SnpEffPredictorFactory
-
-
Method Details
-
add
-
add
-
add
Add an exon
- Parameters:
exon -
- Returns:
exon added. Note: If an exon with the same ID already exists, the old exon is returned. If an exon exists with the same ID and the same coordinates, a new exon is added with a different ID.
-
add
Add a Gene
-
add
Add a generic Marker
-
add
Add a transcript
-
addMarker
Add a marker to the collection
-
addSequences
Add genomic reference sequences
-
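To illustrate what a chromosome sequence passed to addSequences(String chr, String chrSeq) can be used for, the following sketch slices the bases of one exon out of a chromosome sequence. The class `ExonSeqSketch`, the inclusive end coordinate, and the reverse-complement step for minus-strand exons are assumptions made for this example; they are not taken from the SnpEff source.

```java
final class ExonSeqSketch {
    /** Reverse-complement a DNA string; characters other than A, C, G, T are kept as-is. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Extract the bases start..end of a chromosome sequence.
     * Assumption: coordinates are zero-based and 'end' is inclusive.
     */
    static String exonSequence(String chrSeq, int start, int end, boolean strandMinus) {
        String bases = chrSeq.substring(start, end + 1);
        return strandMinus ? reverseComplement(bases) : bases;
    }
}
```

With the chromosome sequence "AACGT", the exon at 1..3 yields "ACG" on the plus strand and "CGT" on the minus strand.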
adjustChromosomes
protected void adjustChromosomes()
Adjust chromosome length using gene information. This is used when the sequence is not available (which makes sense only for test cases and debugging).
-
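One plausible reading of the adjustChromosomes() description is that, without a reference sequence, each chromosome is made just long enough to contain all of its genes. The sketch below shows that idea only; `ChromoLengthSketch`, `GeneSpan` and the "largest end + 1" rule (zero-based, inclusive ends) are assumptions for illustration, not SnpEff's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChromoLengthSketch {
    /** Hypothetical gene record: chromosome name and zero-based, inclusive end position. */
    static final class GeneSpan {
        final String chr;
        final int end;

        GeneSpan(String chr, int end) {
            this.chr = chr;
            this.end = end;
        }
    }

    /** For each chromosome, the smallest length that still contains every gene on it. */
    static Map<String, Integer> lengthsFromGenes(List<GeneSpan> genes) {
        Map<String, Integer> lengths = new HashMap<>();
        for (GeneSpan g : genes) {
            // Keep the maximum of the stored length and this gene's end + 1
            lengths.merge(g.chr, g.end + 1, Integer::max);
        }
        return lengths;
    }
}
```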
adjustTranscripts
protected void adjustTranscripts()
Adjust transcripts: recalculate start, end, strand, etc.
-
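As a rough illustration of recalculating a transcript's start, end and strand in adjustTranscripts(), the sketch below derives them from the transcript's exons. `TranscriptBoundsSketch` and `Span` are hypothetical names, and deciding the strand by majority vote among exons is an assumption of this example; the documentation above does not say how the strand is chosen.

```java
import java.util.List;

final class TranscriptBoundsSketch {
    /** Hypothetical interval: start, end and a strand flag. */
    static final class Span {
        final int start;
        final int end;
        final boolean strandMinus;

        Span(int start, int end, boolean strandMinus) {
            this.start = start;
            this.end = end;
            this.strandMinus = strandMinus;
        }
    }

    /** Smallest interval covering all exons; strand by majority vote (assumption). */
    static Span fromExons(List<Span> exons) {
        if (exons.isEmpty()) throw new IllegalArgumentException("transcript has no exons");
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        int minusCount = 0;
        for (Span e : exons) {
            start = Math.min(start, e.start);
            end = Math.max(end, e.end);
            if (e.strandMinus) minusCount++;
        }
        // Minus strand only if more than half of the exons are on the minus strand
        return new Span(start, end, 2 * minusCount > exons.size());
    }
}
```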
beforeExonSequences
protected void beforeExonSequences()
Perform some actions before reading sequences
-
codingFromCds
protected void codingFromCds()
Only coding transcripts have CDS: Make sure that transcripts having CDS are protein coding. It might not always be "precise" though:
$ grep CDS genes.gtf | cut -f 2 | ~/snpEff/scripts/uniqCount.pl
113 IG_C_gene
64 IG_D_gene
24 IG_J_gene
366 IG_V_gene
21 TR_C_gene
3 TR_D_gene
82 TR_J_gene
296 TR_V_gene
461 non_stop_decay
63322 nonsense_mediated_decay
905 polymorphic_pseudogene
34 processed_transcript
1340112 protein_coding
-
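The rule described for codingFromCds() can be pictured as a single pass that flags every transcript owning at least one CDS as protein coding. The sketch below assumes a hypothetical `Tr` record with a CDS count and a mutable flag; it shows the rule as stated above, not the actual SnpEff data model.

```java
import java.util.List;

final class CodingFromCdsSketch {
    /** Hypothetical transcript record used only for this example. */
    static final class Tr {
        final String id;
        final int cdsCount;
        boolean proteinCoding;

        Tr(String id, int cdsCount, boolean proteinCoding) {
            this.id = id;
            this.cdsCount = cdsCount;
            this.proteinCoding = proteinCoding;
        }
    }

    /** Flag transcripts that have CDS records as protein coding; returns how many flags changed. */
    static int markCoding(List<Tr> transcripts) {
        int changed = 0;
        for (Tr tr : transcripts) {
            if (tr.cdsCount > 0 && !tr.proteinCoding) {
                tr.proteinCoding = true;
                changed++;
            }
        }
        return changed;
    }
}
```

As the counts above suggest, such a rule would also flag transcripts whose annotated type is not protein_coding (for example nonsense_mediated_decay), which is why the description calls it not always "precise".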
collapseZeroLenIntrons
protected void collapseZeroLenIntrons()
Collapse exons having zero size introns between them
-
create
-
createRandSequences
protected void createRandSequences()
Create random sequences for exons. Note: This is only used for test cases!
-
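For test cases, a random exon sequence as in createRandSequences() only needs the exon length and a random number generator. The sketch below is a minimal illustration; `RandSeqSketch` is a hypothetical name, and using a seeded java.util.Random for reproducible tests is an assumption of this example.

```java
import java.util.Random;

final class RandSeqSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Build a random DNA string of the given length, drawing each base uniformly. */
    static String randomBases(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(BASES[random.nextInt(BASES.length)]);
        }
        return sb.toString();
    }
}
```

Passing a Random created with a fixed seed gives the same sequence on every run, which keeps test results stable.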
deleteRedundant
protected void deleteRedundant()
Consolidate transcripts: if two exons are right next to each other, join them. E.g. exon1:1234-2345, exon2:2346-2400 => exon:1234-2400. This happens mostly in GTF files, where the stop codon is specified separately from the exon info.
-
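The joining rule in the deleteRedundant() example can be sketched as a merge of sorted intervals in which an exon is absorbed when it starts exactly one base after the previous exon ends. `JoinAdjacentExonsSketch` and the {start, end} array representation are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinAdjacentExonsSketch {
    /**
     * Join exons given as {start, end} pairs (end inclusive) whenever
     * the next exon starts right after the previous one ends.
     */
    static List<int[]> join(List<int[]> exons) {
        List<int[]> sorted = new ArrayList<>(exons);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));

        List<int[]> joined = new ArrayList<>();
        for (int[] exon : sorted) {
            int[] last = joined.isEmpty() ? null : joined.get(joined.size() - 1);
            if (last != null && exon[0] == last[1] + 1) {
                last[1] = exon[1]; // Extend the previous exon instead of adding a new one
            } else {
                joined.add(new int[] {exon[0], exon[1]}); // Copy, so the input is not modified
            }
        }
        return joined;
    }
}
```

Applied to 1234-2345 and 2346-2400, this returns the single interval 1234-2400; exons separated by at least one base are left untouched.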
exonsFromCds
protected void exonsFromCds()
Create exons from CDS info
-
exonsFromCds
Create exons from CDS info. WARNING: We might end up with redundant exons if some exons existed before this process.
- Parameters:
tr - Transcript with CDS info, but no exons
-
findGene
-
findGene
-
findMarker
-
findTranscript
-
findTranscript
-
getOrCreateChromosome
Get a chromosome. If it doesn't exist, create it.
-
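The get-or-create behaviour of getOrCreateChromosome can be sketched with a map keyed by chromosome name. `ChromoRegistrySketch` and its nested `Chromo` type are hypothetical stand-ins, not SnpEff classes.

```java
import java.util.HashMap;
import java.util.Map;

final class ChromoRegistrySketch {
    /** Hypothetical stand-in for a chromosome object. */
    static final class Chromo {
        final String name;

        Chromo(String name) {
            this.name = name;
        }
    }

    private final Map<String, Chromo> byName = new HashMap<>();

    /** Return the chromosome registered under this name, creating it on first use. */
    Chromo getOrCreate(String chromoName) {
        return byName.computeIfAbsent(chromoName, Chromo::new);
    }

    int size() {
        return byName.size();
    }
}
```

Repeated calls with the same name return the same object, so features read from different lines of an annotation file end up attached to one chromosome instance.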
getProteinByTrId
-
parsePosition
Parse a string as a 'position'. Note: It subtracts 'inOffset' so that all coordinates are zero-based.
-
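The offset arithmetic described for parsePosition is small enough to show directly. In this sketch 'inOffset' is passed as a parameter so the example is self-contained; how the factory actually stores that value is not shown in this page, and `PositionSketch` is a hypothetical name.

```java
final class PositionSketch {
    /**
     * Parse a position and shift it by the input offset.
     * With inOffset = 1 (one-based input), the result is zero-based.
     */
    static int parsePosition(String posStr, int inOffset) {
        return Integer.parseInt(posStr) - inOffset;
    }
}
```

For a one-based input file, parsePosition("100", 1) gives 99; for input that is already zero-based, an offset of 0 leaves the value unchanged.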
readExonSequences
protected void readExonSequences()
Read exon sequences from a FASTA file
-
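As background for readExonSequences(), a FASTA file consists of header lines starting with '>' followed by one or more sequence lines. The sketch below parses FASTA text into a name-to-sequence map; `FastaSketch` is a hypothetical helper that works on an in-memory string, and taking the first whitespace-delimited token of the header as the sequence name is an assumption of this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FastaSketch {
    /** Parse FASTA text into a map from sequence name to concatenated sequence. */
    static Map<String, String> parse(String fastaText) {
        Map<String, StringBuilder> builders = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : fastaText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // Header line: start a new sequence named by the first token after '>'
                String name = line.substring(1).trim().split("\\s+")[0];
                current = new StringBuilder();
                builders.put(name, current);
            } else if (current != null) {
                current.append(line); // Sequence line: append to the current sequence
            }
        }
        Map<String, String> sequences = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : builders.entrySet()) {
            sequences.put(e.getKey(), e.getValue().toString());
        }
        return sequences;
    }
}
```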
replaceTranscript
-
setCircularCorrectLargeGap
public void setCircularCorrectLargeGap(boolean circularCorrectLargeGap)
-
setCreateRandSequences
public void setCreateRandSequences(boolean createRandSequences)
-
setDebug
public void setDebug(boolean debug)
-
setFastaFile
-
setFileName
-
setRandom
-
setReadSequences
public void setReadSequences(boolean readSequences)
Read sequences? Note: This is only used for debugging and testing
-
setStoreSequences
public void setStoreSequences(boolean storeSequences)
-
setVerbose
public void setVerbose(boolean verbose)
-
showChromoNamesDifferences
Show differences in chromosome names
-
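A report like the one named by showChromoNamesDifferences can be built from two set differences. The sketch below assumes the comparison is between chromosome names found in the genes file and names found in the FASTA file; that pairing, the message wording and the class `ChromoNameDiffSketch` are assumptions for illustration, since this page does not describe the method's inputs.

```java
import java.util.Set;
import java.util.TreeSet;

final class ChromoNameDiffSketch {
    /** Describe which chromosome names appear in only one of the two sources. */
    static String differences(Set<String> genesChromos, Set<String> fastaChromos) {
        // TreeSet copies keep the output sorted and leave the inputs unmodified
        Set<String> onlyInGenes = new TreeSet<>(genesChromos);
        onlyInGenes.removeAll(fastaChromos);

        Set<String> onlyInFasta = new TreeSet<>(fastaChromos);
        onlyInFasta.removeAll(genesChromos);

        return "Only in genes file: " + onlyInGenes + "\n"
             + "Only in FASTA file: " + onlyInFasta;
    }
}
```

A typical mismatch this would surface is one source using "1" while the other uses "chr1" for the same chromosome.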