Package htsjdk.variant.vcf
Class AbstractVCFCodec
java.lang.Object
  htsjdk.tribble.AbstractFeatureCodec<T,LineIterator>
    htsjdk.tribble.AsciiFeatureCodec<VariantContext>
      htsjdk.variant.vcf.AbstractVCFCodec
- All Implemented Interfaces:
FeatureCodec<VariantContext,LineIterator>, NameAwareCodec
public abstract class AbstractVCFCodec extends AsciiFeatureCodec<VariantContext> implements NameAwareCodec
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected Map<String,List<Allele>> alleleMap
protected boolean doOnTheFlyModifications
    If true, then we'll magically fix up VCF headers on the fly when we read them in
protected HashMap<String,List<String>> filterHash
protected String[] genotypeParts
protected VCFHeader header
protected int lineNo
protected String[] locParts
static int MAX_ALLELE_SIZE_BEFORE_WARNING
protected String name
protected static int NUM_STANDARD_FIELDS
protected String[] parts
protected String remappedSampleName
    If non-null, we will replace the sample name read from the VCF header with this sample name.
protected Map<String,String> stringCache
static boolean validate
protected VCFHeaderVersion version
protected boolean warnedAboutNoEqualsForNonFlag
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected AbstractVCFCodec()
-
Method Summary
Modifier and Type, Method, Description
static boolean canDecodeFile(String potentialInput, String MAGIC_HEADER_LINE)
LazyGenotypesContext.LazyData createGenotypeMap(String str, List<Allele> alleles, String chr, int pos)
    create a genotype map
VariantContext decode(String line)
    decode the line into a feature (VariantContext)
Feature decodeLoc(String line)
    the fast decode function
void disableOnTheFlyModifications()
    Forces all VCFCodecs to not perform any on the fly modifications to the VCF header of VCF records.
protected void generateException(String message)
protected static void generateException(String message, int lineNo)
VCFAltHeaderLine getAltHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
    Create and return a VCFAltHeaderLine object from a header line string that conforms to the sourceVersion
protected String getCachedString(String str)
    Return a cached copy of the supplied string.
VCFHeader getHeader()
VCFMetaHeaderLine getMetaHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
    Create and return a VCFMetaHeaderLine object from a header line string that conforms to the sourceVersion
String getName()
    get the name of this codec
VCFPedigreeHeaderLine getPedigreeHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
    Create and return a VCFPedigreeHeaderLine object from a header line string that conforms to the sourceVersion
VCFSampleHeaderLine getSampleHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
    Create and return a VCFSampleHeaderLine object from a header line string that conforms to the sourceVersion
TabixFormat getTabixFormat()
    Define the tabix format for the feature, used for indexing.
VCFHeaderVersion getVersion()
protected static Allele oneAllele(String index, List<Allele> alleles)
    create an allele from an index and an array of alleles
protected static List<Allele> parseAlleles(String ref, String alts, int lineNo)
    parse out the alleles
protected abstract List<String> parseFilters(String filterString)
    parse the filter string, first checking to see if we already have parsed it in a previous attempt
protected static List<Allele> parseGenotypeAlleles(String GT, List<Allele> alleles, Map<String,List<Allele>> cache)
    parse genotype alleles from the genotype string
protected VCFHeader parseHeaderFromLines(List<String> headerStrings, VCFHeaderVersion version)
    create a VCF header from a set of header record lines
protected static Double parseQual(String qualString)
    parse out the qual value
void setName(String name)
    set the name of this codec
void setRemappedSampleName(String remappedSampleName)
    Replaces the sample name read from the VCF header with the remappedSampleName.
VCFHeader setVCFHeader(VCFHeader newHeader, VCFHeaderVersion newVersion)
    Explicitly set the VCFHeader on this codec.
-
Methods inherited from class htsjdk.tribble.AsciiFeatureCodec
close, decode, isDone, makeIndexableSourceFromStream, makeSourceFromStream, readActualHeader, readHeader
-
Methods inherited from class htsjdk.tribble.AbstractFeatureCodec
decodeLoc, getFeatureType
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface htsjdk.tribble.FeatureCodec
canDecode, getPathToDataFile
-
-
-
-
Field Detail
-
MAX_ALLELE_SIZE_BEFORE_WARNING
public static final int MAX_ALLELE_SIZE_BEFORE_WARNING
-
NUM_STANDARD_FIELDS
protected static final int NUM_STANDARD_FIELDS
- See Also:
- Constant Field Values
-
header
protected VCFHeader header
-
version
protected VCFHeaderVersion version
-
validate
public static boolean validate
-
parts
protected String[] parts
-
genotypeParts
protected String[] genotypeParts
-
locParts
protected final String[] locParts
-
name
protected String name
-
lineNo
protected int lineNo
-
warnedAboutNoEqualsForNonFlag
protected boolean warnedAboutNoEqualsForNonFlag
-
doOnTheFlyModifications
protected boolean doOnTheFlyModifications
If true, then we'll magically fix up VCF headers on the fly when we read them in
-
remappedSampleName
protected String remappedSampleName
If non-null, we will replace the sample name read from the VCF header with this sample name. This feature works only for single-sample VCFs.
-
-
Method Detail
-
parseFilters
protected abstract List<String> parseFilters(String filterString)
parse the filter string, first checking to see if we already have parsed it in a previous attempt
- Parameters:
filterString - the string to parse
- Returns:
- a list of the filters applied
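The "check whether it was parsed before" behaviour described for parseFilters can be illustrated with a small standalone sketch. This is not htsjdk code: the class name FilterCacheSketch is hypothetical, and the handling of "." (modeled as null) and "PASS" (modeled as an empty list) is an assumption made for the example, not a statement of what a concrete codec returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a concrete parseFilters implementation; not htsjdk code.
class FilterCacheSketch {
    // Cache of previously parsed FILTER strings, keyed by the raw column text.
    private final Map<String, List<String>> filterCache = new HashMap<>();

    List<String> parseFilters(String filterString) {
        // Assumption: "." means filters were not applied; modeled here as null.
        if (filterString.equals(".")) {
            return null;
        }
        // Assumption: "PASS" means no filter failed; modeled here as an empty list.
        if (filterString.equals("PASS")) {
            return Collections.emptyList();
        }
        // Reuse the earlier result when this exact string was parsed before.
        List<String> cached = filterCache.get(filterString);
        if (cached != null) {
            return cached;
        }
        // Multiple filter names are separated by semicolons in the FILTER column.
        List<String> parsed = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList(filterString.split(";"))));
        filterCache.put(filterString, parsed);
        return parsed;
    }

    public static void main(String[] args) {
        FilterCacheSketch sketch = new FilterCacheSketch();
        List<String> first = sketch.parseFilters("q10;s50");
        List<String> second = sketch.parseFilters("q10;s50");
        System.out.println(first);           // [q10, s50]
        System.out.println(first == second); // true
    }
}
```

Because the cached list is shared between calls, the sketch wraps it as unmodifiable so one caller cannot alter what another receives.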
-
parseHeaderFromLines
protected VCFHeader parseHeaderFromLines(List<String> headerStrings, VCFHeaderVersion version)
create a VCF header from a set of header record lines
- Parameters:
headerStrings - a list of strings that represent all the ## and # entries
- Returns:
- a VCFHeader object
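To make the shape of the headerStrings argument concrete, the following standalone sketch separates the "##" entries from the single "#" column line and pulls sample names out of the latter. It assumes the usual tab-delimited VCF column line with eight fixed columns followed by FORMAT; HeaderLinesSketch is a hypothetical class and does not build a real VCFHeader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the input to header parsing; not htsjdk code.
class HeaderLinesSketch {
    final List<String> metaLines = new ArrayList<>();   // the "##" entries
    final List<String> sampleNames = new ArrayList<>(); // taken from the "#" column line

    HeaderLinesSketch(List<String> headerStrings) {
        for (String line : headerStrings) {
            if (line.startsWith("##")) {
                metaLines.add(line);
            } else if (line.startsWith("#")) {
                String[] columns = line.split("\t");
                // Assumption: columns 0-7 are the fixed fields and column 8 is FORMAT,
                // so sample names start at column 9.
                for (int i = 9; i < columns.length; i++) {
                    sampleNames.add(columns[i]);
                }
            } else {
                throw new IllegalArgumentException("Not a header line: " + line);
            }
        }
    }
}
```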
-
getHeader
public VCFHeader getHeader()
- Returns:
- the header that was either explicitly set on this codec, or read from the file. May be null. The returned value should not be modified.
-
getVersion
public VCFHeaderVersion getVersion()
- Returns:
- the version number that was either explicitly set on this codec, or read from the file. May be null.
-
setVCFHeader
public VCFHeader setVCFHeader(VCFHeader newHeader, VCFHeaderVersion newVersion)
Explicitly set the VCFHeader on this codec. This will overwrite the header read from the file and the version state stored in this instance; conversely, reading the header from a file will overwrite whatever is set here.
- Parameters:
newHeader - the header to set on this codec
newVersion - the header version to set on this codec
- Returns:
- the actual header for this codec. The returned header may not be identical to the header argument since the header lines may be "repaired" (i.e., rewritten) if doOnTheFlyModifications is set.
- Throws:
TribbleException- if the requested header version is not compatible with the existing version
-
getAltHeaderLine
public VCFAltHeaderLine getAltHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
Create and return a VCFAltHeaderLine object from a header line string that conforms to the sourceVersion
- Parameters:
headerLineString - VCF header line being parsed without the leading "##ALT="
sourceVersion - the VCF header version of the source from which the header line was retrieved. The resulting header line object should be valid for this header version.
- Returns:
- a VCFAltHeaderLine object
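The headerLineString argument is the part of the line after the "##ALT=" prefix, typically of the form <ID=...,Description="...">. A rough standalone sketch of splitting such a body into key/value pairs follows. StructuredLineSketch is a hypothetical helper, it does not perform any version-specific validation, and it does not handle escaped quotes inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the <key=value,...> body of a structured header line; not htsjdk code.
class StructuredLineSketch {
    static Map<String, String> parse(String body) {
        if (!body.startsWith("<") || !body.endsWith(">")) {
            throw new IllegalArgumentException("Expected <key=value,...>: " + body);
        }
        String inner = body.substring(1, body.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        boolean inValue = false;
        boolean inQuotes = false;
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // quotes delimit a value and are not stored
            } else if (c == '=' && !inValue) {
                inValue = true;         // the first '=' ends the key
            } else if (c == ',' && !inQuotes) {
                // An unquoted comma ends the current key/value pair.
                fields.put(key.toString(), value.toString());
                key.setLength(0);
                value.setLength(0);
                inValue = false;
            } else if (inValue) {
                value.append(c);
            } else {
                key.append(c);
            }
        }
        if (key.length() > 0) {
            fields.put(key.toString(), value.toString());
        }
        return fields;
    }
}
```

Tracking the quote state is what lets a comma inside a quoted Description survive as part of the value instead of starting a new field.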
-
getPedigreeHeaderLine
public VCFPedigreeHeaderLine getPedigreeHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
Create and return a VCFPedigreeHeaderLine object from a header line string that conforms to the sourceVersion
- Parameters:
headerLineString - VCF header line being parsed without the leading "##PEDIGREE="
sourceVersion - the VCF header version of the source from which the header line was retrieved. The resulting header line object should be valid for this header version.
- Returns:
- a VCFPedigreeHeaderLine object
-
getMetaHeaderLine
public VCFMetaHeaderLine getMetaHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
Create and return a VCFMetaHeaderLine object from a header line string that conforms to the sourceVersion
- Parameters:
headerLineString - VCF header line being parsed without the leading "##META="
sourceVersion - the VCF header version of the source from which the header line was retrieved. The resulting header line object should be valid for this header version.
- Returns:
- a VCFMetaHeaderLine object
-
getSampleHeaderLine
public VCFSampleHeaderLine getSampleHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
Create and return a VCFSampleHeaderLine object from a header line string that conforms to the sourceVersion
- Parameters:
headerLineString - VCF header line being parsed without the leading "##SAMPLE="
sourceVersion - the VCF header version of the source from which the header line was retrieved. The resulting header line object should be valid for this header version.
- Returns:
- a VCFSampleHeaderLine object
-
decodeLoc
public Feature decodeLoc(String line)
the fast decode function
- Parameters:
line - the line of text for the record
- Returns:
- a feature (not guaranteed to be complete) that has the correct start and stop
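The idea of a fast, location-only decode can be sketched without the library: split the record only as far as the REF column and derive the stop from the REF length. LocSketch is a hypothetical class, and the sketch assumes the span is determined by REF alone (it ignores any END key in the INFO column), which may differ from what the codec does.

```java
// Hypothetical location-only decode of a VCF record line; not htsjdk code.
class LocSketch {
    final String contig;
    final int start;
    final int end;

    LocSketch(String contig, int start, int end) {
        this.contig = contig;
        this.start = start;
        this.end = end;
    }

    static LocSketch decodeLoc(String line) {
        // Split only far enough to isolate CHROM, POS, ID and REF;
        // the fifth piece keeps the rest of the line unparsed.
        String[] parts = line.split("\t", 5);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        int start = Integer.parseInt(parts[1]);
        String ref = parts[3];
        // Assumption: the feature spans the REF allele; INFO END is ignored here.
        int end = start + ref.length() - 1;
        return new LocSketch(parts[0], start, end);
    }
}
```

Limiting the split avoids allocating strings for the INFO and genotype columns, which is where most of a record's text usually sits.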
-
decode
public VariantContext decode(String line)
decode the line into a feature (VariantContext)
- Specified by:
decode in class AsciiFeatureCodec<VariantContext>
- Parameters:
line - the line
- Returns:
- a VariantContext
- See Also:
AsciiFeatureCodec.decode(htsjdk.tribble.readers.LineIterator)
-
getName
public String getName()
get the name of this codec
- Specified by:
getName in interface NameAwareCodec
- Returns:
- our set name
-
setName
public void setName(String name)
set the name of this codec
- Specified by:
setName in interface NameAwareCodec
- Parameters:
name - new name
-
getCachedString
protected String getCachedString(String str)
Return a cached copy of the supplied string.
- Parameters:
str - string
- Returns:
- interned string
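A cache of this kind can be sketched with a plain map from each string to its first-seen instance, mirroring the Map<String,String> type of the stringCache field. StringCacheSketch is a hypothetical class; whether the codec's cache behaves exactly like this is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-instance string cache; not htsjdk code.
class StringCacheSketch {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> stringCache = new HashMap<>();

    String getCachedString(String str) {
        String cached = stringCache.get(str);
        if (cached == null) {
            // First time this value is seen: remember this instance.
            stringCache.put(str, str);
            cached = str;
        }
        return cached;
    }
}
```

Values such as contig names and genotype keys repeat on nearly every record, so returning one shared instance lets the duplicates be garbage collected.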
-
oneAllele
protected static Allele oneAllele(String index, List<Allele> alleles)
create an allele from an index and an array of alleles
- Parameters:
index - the index
alleles - the alleles
- Returns:
- an Allele
-
parseGenotypeAlleles
protected static List<Allele> parseGenotypeAlleles(String GT, List<Allele> alleles, Map<String,List<Allele>> cache)
parse genotype alleles from the genotype string
- Parameters:
GT - GT string
alleles - list of possible alleles
cache - cache of alleles for GT
- Returns:
- the allele list for the GT string
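The relationship between oneAllele and parseGenotypeAlleles can be sketched as follows, with alleles represented as plain strings instead of Allele objects. GenotypeAllelesSketch is hypothetical; representing a no-call as "." and accepting a null cache are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical GT parsing over string alleles; not htsjdk code.
class GenotypeAllelesSketch {
    static final String NO_CALL = "."; // assumption: a no-call is represented as "."

    // Resolve one GT index against the record's alleles (REF at 0, ALTs after it).
    static String oneAllele(String index, List<String> alleles) {
        if (index.equals(".")) {
            return NO_CALL;
        }
        int i = Integer.parseInt(index);
        if (i < 0 || i >= alleles.size()) {
            throw new IllegalArgumentException("Allele index out of range: " + index);
        }
        return alleles.get(i);
    }

    // Parse a GT string such as "0/1" or "1|2". The cache is only meaningful for
    // one record, since the same GT string maps to different alleles elsewhere.
    static List<String> parseGenotypeAlleles(String gt, List<String> alleles,
                                             Map<String, List<String>> cache) {
        List<String> cached = (cache == null) ? null : cache.get(gt);
        if (cached != null) {
            return cached;
        }
        List<String> result = new ArrayList<>();
        for (String index : gt.split("[/|]")) { // '/' is unphased, '|' is phased
            result.add(oneAllele(index, alleles));
        }
        if (cache != null) {
            cache.put(gt, result);
        }
        return result;
    }
}
```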
-
parseQual
protected static Double parseQual(String qualString)
parse out the qual value
- Parameters:
qualString - the quality string
- Returns:
- return a double
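As a minimal sketch of what parsing the QUAL column involves, the snippet below treats "." as a missing value and otherwise parses the text as a number. QualSketch is hypothetical, and modeling the missing value as null is an assumption; the codec may use a sentinel value or rescale the number instead.

```java
// Hypothetical QUAL parsing; not htsjdk code.
class QualSketch {
    static Double parseQual(String qualString) {
        // Assumption: the missing-value marker "." is modeled as null.
        if (qualString.equals(".")) {
            return null;
        }
        // Throws NumberFormatException when the column is not numeric.
        return Double.valueOf(qualString);
    }
}
```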
-
parseAlleles
protected static List<Allele> parseAlleles(String ref, String alts, int lineNo)
parse out the alleles
- Parameters:
ref - the reference base
alts - a string of alternates to break into alleles
lineNo - the line number for this record
- Returns:
- a list of alleles
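Breaking the ALT column into alleles can be sketched with plain strings: the reference goes first, followed by each comma-separated alternate. AllelesSketch is hypothetical, it performs no base validation, and treating "." as "no alternates" is an assumption for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical REF/ALT splitting over strings; not htsjdk code.
class AllelesSketch {
    static List<String> parseAlleles(String ref, String alts, int lineNo) {
        if (ref.isEmpty()) {
            throw new IllegalArgumentException("Empty REF at line " + lineNo);
        }
        List<String> alleles = new ArrayList<>();
        alleles.add(ref); // the reference allele always comes first
        // Assumption: "." in the ALT column means there are no alternate alleles.
        if (!alts.equals(".")) {
            // A limit of -1 keeps trailing empty pieces so "A," is rejected below.
            for (String alt : alts.split(",", -1)) {
                if (alt.isEmpty()) {
                    throw new IllegalArgumentException("Empty ALT allele at line " + lineNo);
                }
                alleles.add(alt);
            }
        }
        return alleles;
    }
}
```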
-
createGenotypeMap
public LazyGenotypesContext.LazyData createGenotypeMap(String str, List<Allele> alleles, String chr, int pos)
create a genotype map
- Parameters:
str - the string
alleles - the list of alleles
- Returns:
- a mapping of sample name to genotype object
-
disableOnTheFlyModifications
public final void disableOnTheFlyModifications()
Forces all VCFCodecs to not perform any on-the-fly modifications to the VCF header of VCF records. Useful primarily for raw comparisons, such as when comparing raw VCF records.
-
setRemappedSampleName
public void setRemappedSampleName(String remappedSampleName)
Replaces the sample name read from the VCF header with the remappedSampleName. Works only for single-sample VCFs; attempting to perform sample name remapping for multi-sample VCFs will produce an Exception.
- Parameters:
remappedSampleName - replacement sample name for the sample specified in the VCF header
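The single-sample restriction can be illustrated with a standalone sketch over a list of sample names. SampleRemapSketch is hypothetical; the exception type used here (IllegalStateException) and the choice to leave a header with no samples unchanged are assumptions, not the codec's documented behaviour.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sample-name remapping rule; not htsjdk code.
class SampleRemapSketch {
    static List<String> remap(List<String> headerSamples, String remappedSampleName) {
        // A null replacement means remapping is not requested.
        if (remappedSampleName == null) {
            return headerSamples;
        }
        // Remapping is only defined for single-sample input.
        if (headerSamples.size() > 1) {
            throw new IllegalStateException(
                    "Cannot remap sample names for a multi-sample VCF: " + headerSamples);
        }
        // Assumption: a header without samples is returned unchanged.
        if (headerSamples.isEmpty()) {
            return headerSamples;
        }
        return Collections.singletonList(remappedSampleName);
    }
}
```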
-
generateException
protected void generateException(String message)
-
generateException
protected static void generateException(String message, int lineNo)
-
getTabixFormat
public TabixFormat getTabixFormat()
Description copied from interface: FeatureCodec
Define the tabix format for the feature, used for indexing. Default implementation throws an exception. Note that only AsciiFeatureCodec can read tabix files, as defined in AbstractFeatureReader.getFeatureReader(String, String, FeatureCodec, boolean, java.util.function.Function, java.util.function.Function)
- Specified by:
getTabixFormat in interface FeatureCodec<VariantContext,LineIterator>
- Returns:
- the format to use with tabix
-
-