Package org.snpeff.vcf
Class VcfEntry
java.lang.Object
org.snpeff.interval.Interval
org.snpeff.interval.Marker
org.snpeff.vcf.VcfEntry
- All Implemented Interfaces:
Serializable, Cloneable, Comparable<Interval>, Iterable<VcfGenotype>, TxtSerializable
A VCF entry (a line) in a VCF file
- Author:
- pablocingolani
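Field Summary
Fields
- static final double ALLELE_FEQUENCY_COMMON
- static final double ALLELE_FEQUENCY_LOW
- protected String[] alts
- protected String altStr
- protected String chromosomeName
- static final String[] EMPTY_STRING_ARRAY
- protected String filter
- static final String FILTER_PASS
- protected String format
- protected String[] formatFields
- protected String[] genotypeFields
- protected String genotypeFieldsStr
- protected byte[] genotypeScores
- static final Pattern INFO_KEY_PATTERN
- protected String infoStr
- protected String line
- protected int lineNum
- protected Double quality
- protected String ref
- static final String SUB_FIELD_SEP
- protected LinkedList<Variant> variants
- static final String VCF_ALT_MISSING_REF
- static final String[] VCF_ALT_MISSING_REF_ARRAY
- static final String VCF_ALT_NON_REF
- static final String[] VCF_ALT_NON_REF_ARRAY
- static final String VCF_ALT_NON_REF_gVCF
- static final String[] VCF_ALT_NON_REF_gVCF_ARRAY
- static final String VCF_INFO_END
- static final String VCF_INFO_HETS
- static final String VCF_INFO_HOMS
- static final String VCF_INFO_NAS
- static final String VCF_INFO_PRIVATE
- protected VcfFileIterator vcfFileIterator
- protected ArrayList<VcfGenotype> vcfGenotypes
- static final char WITHIN_FIELD_SEP
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus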
Constructor Summary
Constructors
- VcfEntry(VcfFileIterator vcfFileIterator, String line, int lineNum, boolean parseNow): Create a line from a file iterator.
- VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, String chromosomeName, int start, String id, String ref, String altsStr, double quality, String filterPass, String infoStr, String format)
Method Summary
- void addFilter: Add a string to the FILTER field.
- void addFormat: Add a 'FORMAT' field.
- void addGenotype(String vcfGenotypeStr): Add a genotype as a string.
- void addInfo: Add a "key=value" tuple to the INFO field.
- alleleFrequencyType: Categorization by allele frequency.
- calcHetero: Is this entry heterozygous? Hom/Het is inferred only if there is one sample in the file.
- check(): Perform several simple checks and report problems (if any).
- static String cleanUnderscores: Return a string without leading, trailing and duplicated underscores.
- cloneShallow: Perform a shallow clone.
- boolean compressGenotypes(): Compress genotypes into "HO/HE/NA" INFO fields.
- boolean delFilter: Remove a string from the FILTER field.
- int getAltIndex(String alt): Get the index of the matching ALT entry.
- String[] getAlts()
- getAltsStr: Create a comma-separated ALTS string.
- getChromosomeNameOri: Original chromosome name (as it appeared in the VCF file).
- getFilter
- getFormat
- String[] getFormatFields
- byte[] getGenotypesScores(): Return genotypes parsed as an array of codes.
- getInfo: Get the info string.
- getInfo: Get the info string for a specific allele.
- getInfo: Get an INFO field matching a variant.
- boolean getInfoFlag(String key): Does the entry exist?
- double getInfoFloat(String key): Get an INFO field as a 'double' number. The VCF specification names this data type 'Float', which is why the method name might not be intuitive.
- long getInfoInt(String key): Get an INFO field as a 'long' number. The VCF specification names this data type 'Integer', which is why the method name might not be intuitive.
- getInfoKeys: Get all keys available in the INFO field.
- getInfoStr: Get the full (unparsed) INFO field.
- getLine(): Original VCF line (from file).
- int getLineNum()
- int getNumberOfSamples(): Number of samples in this VCF file.
- double getQuality()
- getRef()
- getStr()
- getVcfEffects
- getVcfEffects(EffFormatVersion formatVersion): Parse the 'EFF' INFO field and get a list of effects.
- getVcfFileIterator
- getVcfGenotype(int index)
- getVcfGenotypes
- getVcfInfo(String id): Get the VcfInfo type for a given ID.
- getVcfInfoNumber: Get the INFO number for a given ID.
- boolean hasField
- boolean hasGenotypes()
- boolean hasInfo
- boolean hasQuality()
- boolean isBiAllelic(): Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
- boolean isCompressedGenotypes(): Do we have compressed genotypes in "HO,HE,NA" INFO fields?
- static boolean isEmpty: Does 'value' represent an EMPTY / MISSING value in a VCF field (or several comma-separated MISSING values)?
- boolean isFilterPass()
- boolean isMultiallelic(): Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
- protected boolean isShowWarningIfParentDoesNotInclude(): Show an error if parent does not include child?
- boolean isSingleSnp(): Is this a VCF entry with a single SNP?
- boolean isSingleton(): Is this variant a singleton (appears in only one genotype)?
- static boolean isValidInfoKey(String key): Check that the INFO key matches the regular expression specified in VCF spec 4.3.
- static boolean isValidInfoValue(String value): Check that this value can be added to an INFO field.
- boolean isVariant(): Is this a change, or are the ALTs actually the same as the reference?
- boolean isVariant: Is this ALT string a variant?
- iterator()
- int mac(): Calculate the minor allele count.
- double maf(): Calculate the minor allele frequency.
- void parse(): Parse a 'line' from a 'vcfFileIterator'.
- parseLof(): Parse LOF from VcfEntry.
- parseNmd(): Parse NMD from VcfEntry.
- void removeInfo(String key): Remove an INFO field.
- boolean rmInfo: Parse INFO fields.
- void setFilter
- void setFormat
- void setGenotypeStr(String genotypeFieldsStr)
- void setLineNum(int lineNum)
- toStr(): To string in a simple "CHR:START_REF/ALTs" format.
- toString()
- toStringNoGt: Show only the first eight fields (no genotype entries).
- uncompressGenotypes: Uncompress a VCF entry having genotypes in "HO,HE,NA" fields.
- variants(): Create a list of variants from this VcfEntry.
- static String vcfInfoDecode(String str): Decode an INFO value.
- static String vcfInfoEncode(String str): Encode a string to be used in an 'INFO' field value, using the capitalized percent encoding from the VCF 4.3 specification.
- static String vcfInfoKeySafe(String str): Return a string safe to be used in an 'INFO' field key.
- static String vcfInfoValueSafe(String str): Return a string safe to be used in an 'INFO' field value.
Methods inherited from class org.snpeff.interval.Marker
adjust, apply, applyDel, applyDup, applyIns, applyMixed, clone, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isAdjustIfParentDoesNotInclude, isDeferredAnalysis, minus, query, query, readTxt, serializeParse, serializeSave, shouldApply, union, variantEffect, variantEffectNonRef
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, setStrandMinus, shiftCoordinates, size, toStringAsciiArt, toStrPos
Methods inherited from class java.lang.Object
equals, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Field Details
FILTER_PASS
WITHIN_FIELD_SEP
public static final char WITHIN_FIELD_SEP
SUB_FIELD_SEP
EMPTY_STRING_ARRAY
ALLELE_FEQUENCY_COMMON
public static final double ALLELE_FEQUENCY_COMMON
ALLELE_FEQUENCY_LOW
public static final double ALLELE_FEQUENCY_LOW
INFO_KEY_PATTERN
VCF_INFO_END
VCF_ALT_NON_REF
VCF_ALT_NON_REF_gVCF
VCF_ALT_MISSING_REF
VCF_ALT_NON_REF_gVCF_ARRAY
VCF_ALT_NON_REF_ARRAY
VCF_ALT_MISSING_REF_ARRAY
VCF_INFO_HOMS
VCF_INFO_HETS
VCF_INFO_NAS
VCF_INFO_PRIVATE
alts
altStr
chromosomeName
filter
format
formatFields
genotypeFields
genotypeFieldsStr
genotypeScores
protected byte[] genotypeScores
info
infoStr
line
lineNum
protected int lineNum
quality
ref
variants
vcfEffects
vcfFileIterator
vcfGenotypes
Constructor Details
VcfEntry
VcfEntry
Create a line from a file iterator.
Method Details
cleanUnderscores
Return a string without leading, trailing and duplicated underscores.
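The following is a minimal sketch of the behaviour described for cleanUnderscores, written with regular expressions. It assumes that "duplicated" means any run of two or more consecutive underscores; the class UnderscoreCleaner is hypothetical and is not the snpEff source.

```java
/** Hypothetical helper, not snpEff's code: one reading of "cleanUnderscores". */
final class UnderscoreCleaner {
    static String cleanUnderscores(String str) {
        // Assumption: a run of two or more '_' counts as "duplicated" and collapses to one
        String collapsed = str.replaceAll("_{2,}", "_");
        // Strip a single leading and a single trailing '_' (runs are already collapsed)
        return collapsed.replaceAll("^_|_$", "");
    }

    public static void main(String[] args) {
        System.out.println(cleanUnderscores("__gene__name_")); // prints gene_name
    }
}
```

Collapsing before stripping means a single anchored replacement is enough to remove any leading or trailing run.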
isEmpty
Does 'value' represent an EMPTY / MISSING value in a VCF field (or several comma-separated MISSING values)?
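A sketch of such a missing-value test, assuming that the VCF placeholder is "." and that null or empty strings also count as missing; VcfMissing is a hypothetical name, and snpEff's exact rules may differ.

```java
/** Hypothetical helper: one way to test for VCF missing values. */
final class VcfMissing {
    static boolean isEmpty(String value) {
        // Assumption: null or "" is treated as missing, like the VCF placeholder "."
        if (value == null || value.isEmpty()) return true;
        // Limit -1 keeps trailing empty items, so ".," is NOT treated as all-missing
        for (String item : value.split(",", -1)) {
            if (!item.equals(".")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(".,.,."));  // prints true
        System.out.println(isEmpty("0.25,.")); // prints false
    }
}
```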
isValidInfoKey
Check that the INFO key matches the regular expression specified in the VCF 4.3 specification.
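The VCF 4.3 specification gives the INFO key rule as the regular expression ^([A-Za-z_][0-9A-Za-z_.]*|1000G)$. A minimal check built on that expression could look like this; InfoKeyCheck is hypothetical, and the actual value of INFO_KEY_PATTERN is not shown on this page.

```java
import java.util.regex.Pattern;

/** Hypothetical INFO key validator based on the VCF 4.3 regular expression. */
final class InfoKeyCheck {
    // Keys start with a letter or '_', followed by letters, digits, '_' or '.'; "1000G" is a special case
    private static final Pattern INFO_KEY = Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidInfoKey(String key) {
        // matches() requires the whole string to match the pattern
        return key != null && INFO_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidInfoKey("AF"));  // prints true
        System.out.println(isValidInfoKey("1kg")); // prints false
    }
}
```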
isValidInfoValue
Check that this value can be added to an INFO field.
- Returns:
- true if OK, false if the value is invalid
vcfInfoDecode
Decode an INFO value.
vcfInfoEncode
Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as the field delimiter ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding:
- %3A : (colon)
- %3B ; (semicolon)
- %3D = (equal sign)
- %25 % (percent sign)
- %2C , (comma)
- %0D CR
- %0A LF
- %09 TAB
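The encoding table above maps directly onto a per-character substitution. This sketch encodes exactly those eight characters and decodes only those eight codes, copying any other '%' unchanged; VcfInfoCodec is a hypothetical class, not snpEff's vcfInfoEncode/vcfInfoDecode.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of VCF 4.3 INFO percent-encoding; not the snpEff implementation. */
final class VcfInfoCodec {
    // The eight characters listed in the VCF 4.3 spec and their capitalized codes
    private static final Map<Character, String> ENCODE = Map.of(
            ':', "%3A", ';', "%3B", '=', "%3D", '%', "%25",
            ',', "%2C", '\r', "%0D", '\n', "%0A", '\t', "%09");
    private static final Map<String, Character> DECODE = new HashMap<>();

    static {
        // Reverse table: code -> character
        ENCODE.forEach((ch, code) -> DECODE.put(code, ch));
    }

    static String encode(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (char c : str.toCharArray()) {
            String code = ENCODE.get(c);
            if (code != null) sb.append(code);
            else sb.append(c);
        }
        return sb.toString();
    }

    static String decode(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            // Only the eight listed codes are decoded; any other '%' is copied as-is
            Character ch = (str.charAt(i) == '%' && i + 3 <= str.length())
                    ? DECODE.get(str.substring(i, i + 3)) : null;
            if (ch != null) {
                sb.append(ch.charValue());
                i += 3;
            } else {
                sb.append(str.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String enc = encode("a=b;c");
        System.out.println(enc);         // prints a%3Db%3Bc
        System.out.println(decode(enc)); // prints a=b;c
    }
}
```

Because '%' itself is always encoded as %25, decoding the output of encode restores the original string.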
vcfInfoKeySafe
Return a string safe to be used in an 'INFO' field key.
vcfInfoValueSafe
Return a string safe to be used in an 'INFO' field value.
addFilter
Add a string to the FILTER field.
addFormat
Add a 'FORMAT' field.
addGenotype
Add a genotype as a string.
addInfo
Add a "key=value" tuple to the INFO field.
- Parameters:
- key: INFO key name
- value: can be null if it is a boolean (flag) field
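To illustrate the "key=value" versus boolean-flag distinction, here is a sketch of an INFO builder in which a null value produces a bare flag key. InfoBuilder is hypothetical; keeping insertion order with a LinkedHashMap and writing "." for an empty INFO column are assumptions of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical INFO field builder; not the snpEff implementation. */
final class InfoBuilder {
    // Insertion-ordered, so keys are written in the order they were first added
    private final Map<String, String> info = new LinkedHashMap<>();

    /** Adds key=value; a null value marks a boolean (flag) field written as just "key". */
    void addInfo(String key, String value) {
        info.put(key, value);
    }

    /** Renders the INFO column; "." is the VCF placeholder for an empty field. */
    String toInfoString() {
        if (info.isEmpty()) return ".";
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, String> e : info.entrySet()) {
            joiner.add(e.getValue() == null ? e.getKey() : e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        InfoBuilder b = new InfoBuilder();
        b.addInfo("DP", "14");
        b.addInfo("DB", null);
        System.out.println(b.toInfoString()); // prints DP=14;DB
    }
}
```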
alleleFrequencyType
Categorization by allele frequency.
calcHetero
Is this entry heterozygous? Hom/Het is inferred only if there is one sample in the file; otherwise the result is null.
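A sketch of the single-sample inference described above, assuming the GT value uses '/' or '|' as the allele separator and that a missing allele ('.') makes the call undecidable. HeteroCheck is hypothetical and returns a boxed Boolean so that null can mean "cannot be inferred".

```java
/** Hypothetical sketch of single-sample Hom/Het inference; not snpEff's calcHetero. */
final class HeteroCheck {
    /**
     * Returns TRUE for heterozygous, FALSE for homozygous, or null when it cannot be
     * inferred (more than one sample, or a missing allele in the genotype).
     */
    static Boolean calcHetero(String[] sampleGenotypes) {
        if (sampleGenotypes.length != 1) return null;
        // Split the GT value on either separator: '/' (unphased) or '|' (phased)
        String[] alleles = sampleGenotypes[0].split("[/|]");
        for (String a : alleles) {
            if (a.equals(".")) return null;
        }
        // Heterozygous if any allele differs from the first one
        for (int i = 1; i < alleles.length; i++) {
            if (!alleles[i].equals(alleles[0])) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(calcHetero(new String[] {"0/1"})); // prints true
    }
}
```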
check
Perform several simple checks and report problems (if any).
cloneShallow
Description copied from class: Marker
Perform a shallow clone.
- Overrides:
- cloneShallow in class Marker
compressGenotypes
public boolean compressGenotypes()
Compress genotypes into "HO/HE/NA" INFO fields.
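The exact layout of the HO/HE/NA fields is not documented on this page. Assuming each field lists the 0-based indices of homozygous non-reference (HO), heterozygous (HE) and missing (NA) samples, with homozygous-reference samples left implicit, a compression step could be sketched as follows; GenotypeCompressor and its output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of HO/HE/NA genotype compression; the format is an assumption. */
final class GenotypeCompressor {
    /** Returns INFO key -> comma-separated 0-based sample indices (assumed format). */
    static Map<String, String> compress(String[] gts) {
        List<String> ho = new ArrayList<>(), he = new ArrayList<>(), na = new ArrayList<>();
        for (int i = 0; i < gts.length; i++) {
            String[] alleles = gts[i].split("[/|]");
            boolean missing = false, allRef = true, allSame = true;
            for (String a : alleles) {
                if (a.equals(".")) missing = true;
                if (!a.equals("0")) allRef = false;
                if (!a.equals(alleles[0])) allSame = false;
            }
            String idx = Integer.toString(i);
            if (missing) na.add(idx);
            else if (!allSame) he.add(idx);
            else if (!allRef) ho.add(idx);
            // else: homozygous reference, left implicit
        }
        Map<String, String> out = new LinkedHashMap<>();
        if (!ho.isEmpty()) out.put("HO", String.join(",", ho));
        if (!he.isEmpty()) out.put("HE", String.join(",", he));
        if (!na.isEmpty()) out.put("NA", String.join(",", na));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(compress(new String[] {"0/0", "0/1", "1/1", "./.", "1|2"}));
        // prints {HO=2, HE=1,4, NA=3}
    }
}
```

Leaving homozygous-reference samples implicit is what makes this compact for large cohorts, where most samples usually match the reference.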
delFilter
Remove a string from the FILTER field.
getAltIndex
Get the index of the matching ALT entry.
- Returns:
- -1 if not found
getAlts
getAltsStr
Create a comma-separated ALTS string.
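getAltsStr and getAltIndex amount to joining and searching the ALT array. A minimal sketch over a plain String[] follows; AltUtil is hypothetical, and exact, case-sensitive matching is an assumption.

```java
/** Hypothetical ALT helpers; not the snpEff implementation. */
final class AltUtil {
    /** Joins ALT alleles with commas, as they appear in the VCF ALT column. */
    static String getAltsStr(String[] alts) {
        return String.join(",", alts);
    }

    /** Returns the position of alt in alts, or -1 if no entry matches (exact match assumed). */
    static int getAltIndex(String[] alts, String alt) {
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(alt)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] alts = {"A", "TG"};
        System.out.println(getAltsStr(alts));        // prints A,TG
        System.out.println(getAltIndex(alts, "TG")); // prints 1
    }
}
```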
getChromosomeNameOri
Original chromosome name (as it appeared in the VCF file).
- Overrides:
- getChromosomeNameOri in class Interval
getFilter
getFormat
getFormatFields
getGenotypesScores
public byte[] getGenotypesScores()
Return genotypes parsed as an array of codes.
getInfo
Get the info string.
getInfo
Get the info string for a specific allele.
getInfo
Get an INFO field matching a variant.
getInfoFlag
Does the entry exist?
getInfoFloat
Get an INFO field as a 'double' number. The VCF specification names this data type 'Float', which is why the name of this method might not be intuitive.
getInfoInt
Get an INFO field as a 'long' number. The VCF specification names this data type 'Integer', which is why the name of this method might not be intuitive.
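The Float/Integer naming note is easiest to see in a small parser that reads VCF Float values as a Java double and Integer values as a Java long. InfoParser is hypothetical; returning NaN or 0 for absent keys is an assumption of the sketch, not documented snpEff behaviour.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical INFO parser illustrating Flag / Float / Integer access. */
final class InfoParser {
    private final Map<String, String> info = new HashMap<>();

    InfoParser(String infoStr) {
        if (infoStr.equals(".")) return; // "." means the INFO column is empty
        for (String entry : infoStr.split(";")) {
            int eq = entry.indexOf('=');
            // Flags have no '=', so store them with an empty value
            if (eq < 0) info.put(entry, "");
            else info.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
    }

    boolean getInfoFlag(String key) {
        return info.containsKey(key);
    }

    /** VCF 'Float' read as a Java double; NaN if absent or missing (assumption). */
    double getInfoFloat(String key) {
        String v = info.get(key);
        if (v == null || v.isEmpty() || v.equals(".")) return Double.NaN;
        return Double.parseDouble(v);
    }

    /** VCF 'Integer' read as a Java long; 0 if absent or missing (assumption). */
    long getInfoInt(String key) {
        String v = info.get(key);
        if (v == null || v.isEmpty() || v.equals(".")) return 0;
        return Long.parseLong(v);
    }

    public static void main(String[] args) {
        InfoParser p = new InfoParser("DP=14;DB;AF=0.5");
        System.out.println(p.getInfoInt("DP") + " " + p.getInfoFloat("AF") + " " + p.getInfoFlag("DB"));
        // prints 14 0.5 true
    }
}
```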
getInfoKeys
Get all keys available in the INFO field.
getInfoStr
Get the full (unparsed) INFO field.
getLine
Original VCF line (from file).
getLineNum
public int getLineNum()
getNumberOfSamples
public int getNumberOfSamples()
Number of samples in this VCF file.
getQuality
public double getQuality()
getRef
getStr
getVcfEffects
getVcfEffects
Parse the 'EFF' INFO field and get a list of effects.
getVcfFileIterator
getVcfGenotype
getVcfGenotypes
getVcfInfo
Get the VcfInfo type for a given ID.
getVcfInfoNumber
Get the INFO number for a given ID.
hasField
hasGenotypes
public boolean hasGenotypes()
hasInfo
hasQuality
public boolean hasQuality()
isBiAllelic
public boolean isBiAllelic()
Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
isCompressedGenotypes
public boolean isCompressedGenotypes()
Do we have compressed genotypes in "HO,HE,NA" INFO fields?
isFilterPass
public boolean isFilterPass()
isMultiallelic
public boolean isMultiallelic()
Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
isShowWarningIfParentDoesNotInclude
protected boolean isShowWarningIfParentDoesNotInclude()
Description copied from class: Marker
Show an error if parent does not include child?
- Overrides:
- isShowWarningIfParentDoesNotInclude in class Marker
isSingleSnp
public boolean isSingleSnp()
Is this a VCF entry with a single SNP?
isSingleton
public boolean isSingleton()
Is this variant a singleton (appears in only one genotype)?
isVariant
public boolean isVariant()
Is this a change, or are the ALTs actually the same as the reference?
isVariant
Is this ALT string a variant?
iterator
- Specified by:
- iterator in interface Iterable<VcfGenotype>
mac
public int mac()
Calculate the minor allele count.
maf
public double maf()
Calculate the minor allele frequency.
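One common way to compute minor allele count and frequency from GT strings is sketched below. It assumes '/'- or '|'-separated GT values, pools all non-reference alleles together, and ignores missing alleles; MinorAllele is a hypothetical class, and snpEff may count differently (for example, at multi-allelic sites).

```java
/** Hypothetical MAC / MAF sketch over GT strings; not snpEff's mac()/maf(). */
final class MinorAllele {
    /** Returns {non-reference allele count, total called alleles}; ALT alleles are pooled. */
    private static int[] counts(String[] gts) {
        int ac = 0, an = 0;
        for (String gt : gts) {
            for (String a : gt.split("[/|]")) {
                if (a.equals(".")) continue; // missing allele: not counted
                an++;
                if (!a.equals("0")) ac++;
            }
        }
        return new int[] {ac, an};
    }

    /** Minor allele count: the smaller of the reference and non-reference counts. */
    static int mac(String[] gts) {
        int[] c = counts(gts);
        return Math.min(c[0], c[1] - c[0]);
    }

    /** Minor allele frequency: MAC divided by called alleles (0 if nothing was called). */
    static double maf(String[] gts) {
        int[] c = counts(gts);
        return c[1] == 0 ? 0.0 : (double) Math.min(c[0], c[1] - c[0]) / c[1];
    }

    public static void main(String[] args) {
        String[] gts = {"0/0", "0/1", "1/1", "./."};
        System.out.println(mac(gts) + " " + maf(gts)); // prints 3 0.5
    }
}
```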
parse
public void parse()
Parse a 'line' from a 'vcfFileIterator'.
parseLof
Parse LOF from VcfEntry.
parseNmd
Parse NMD from VcfEntry.
removeInfo
Remove an INFO field.
rmInfo
Parse INFO fields.
setFilter
setFormat
setGenotypeStr
setLineNum
public void setLineNum(int lineNum)
toStr
To string in a simple "CHR:START_REF/ALTs" format.
toString
toStringNoGt
Show only the first eight fields (no genotype entries).
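Keeping only the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) amounts to truncating the tab-split line. VcfLineUtil is a hypothetical helper that operates on a raw line, not on a VcfEntry, and is not the snpEff implementation of toStringNoGt.

```java
import java.util.Arrays;

/** Hypothetical helper: drop FORMAT and sample columns from a VCF data line. */
final class VcfLineUtil {
    static String firstEightFields(String line) {
        // Limit -1 keeps empty trailing columns instead of discarding them
        String[] fields = line.split("\t", -1);
        int n = Math.min(8, fields.length);
        return String.join("\t", Arrays.copyOfRange(fields, 0, n));
    }

    public static void main(String[] args) {
        System.out.println(firstEightFields("1\t100\trs1\tA\tG\t50\tPASS\tDP=3\tGT\t0/1"));
    }
}
```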
uncompressGenotypes
Uncompress a VCF entry having genotypes in "HO,HE,NA" fields.
variants
Create a list of variants from this VcfEntry.