Package picard.illumina
Class ExtractIlluminaBarcodes
java.lang.Object
  picard.cmdline.CommandLineProgram
    picard.illumina.ExtractIlluminaBarcodes
@DocumentedFeature public class ExtractIlluminaBarcodes extends CommandLineProgram
Determine the barcode for each read in an Illumina lane. For each tile, a file of the form s_<lane>_<tile>_barcode.txt is written to the basecalls directory. Each output file contains one line per read in the tile, aligned with the regular basecall output.
The output file contains the following tab-separated columns:
- read subsequence at barcode position
- Y or N indicating if there was a barcode match
- matched barcode sequence (empty if the read did not match one of the barcodes). If there is no match but the read is close to the threshold for calling a match, the barcode that would have been matched is output in lower case.
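As an illustration of the column layout described above, the following is a minimal sketch of how a downstream consumer might parse one line of a per-tile barcode file. The class `BarcodeLineSketch` and its fields are hypothetical names and are not part of Picard. The sketch assumes the three documented columns come first and ignores any further columns.

```java
import java.util.Locale;

/** Hypothetical holder for one line of a per-tile barcode file (not a Picard class). */
final class BarcodeLineSketch {
    final String readSubsequence; // column 1: read bases at the barcode position
    final boolean matched;        // column 2: Y or N
    final String barcode;         // column 3: may be empty
    final boolean nearMiss;       // N plus a lower-case barcode in column 3

    private BarcodeLineSketch(String readSubsequence, boolean matched,
                              String barcode, boolean nearMiss) {
        this.readSubsequence = readSubsequence;
        this.matched = matched;
        this.barcode = barcode;
        this.nearMiss = nearMiss;
    }

    static BarcodeLineSketch parse(String line) {
        // A limit of -1 keeps a trailing empty column (no barcode reported).
        String[] cols = line.split("\t", -1);
        if (cols.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 columns: " + line);
        }
        String flag = cols[1];
        if (!flag.equals("Y") && !flag.equals("N")) {
            throw new IllegalArgumentException("Match flag must be Y or N: " + flag);
        }
        boolean matched = flag.equals("Y");
        String barcode = cols.length > 2 ? cols[2] : "";
        // Assumption: a near-miss is signalled only by an all-lower-case barcode.
        boolean nearMiss = !matched && !barcode.isEmpty()
                && barcode.equals(barcode.toLowerCase(Locale.ROOT));
        return new BarcodeLineSketch(cols[0], matched, barcode, nearMiss);
    }
}
```

For example, `BarcodeLineSketch.parse("ACGTACGA\tN\tacgtacgt")` would be flagged as a near-miss under these assumptions.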
Nested Class Summary
Nested Classes
Modifier and Type    Class                                             Description
static class         ExtractIlluminaBarcodes.BarcodeMetric             Metrics produced by the ExtractIlluminaBarcodes program that is used to parse data in the basecalls directory and determine to which barcode each read should be assigned.
static class         ExtractIlluminaBarcodes.PerTileBarcodeExtractor   Extracts barcodes and accumulates metrics for an entire tile.
-
Field Summary
Fields
Modifier and Type    Field                        Description
List<String>         BARCODE
File                 BARCODE_FILE
static String        BARCODE_NAME_COLUMN          Column header for the barcode name.
static String        BARCODE_SEQUENCE_1_COLUMN    Column header for the first barcode sequence.
static String        BARCODE_SEQUENCE_COLUMN      Column header for the first barcode sequence (preferred).
File                 BASECALLS_DIR
boolean              COMPRESS_OUTPUTS
Integer              LANE
static String        LIBRARY_NAME_COLUMN          Column header for the library name.
int                  MAX_MISMATCHES
int                  MAX_NO_CALLS
File                 METRICS_FILE
int                  MIN_MISMATCH_DELTA
int                  MINIMUM_BASE_QUALITY
int                  MINIMUM_QUALITY
int                  NUM_PROCESSORS
File                 OUTPUT_DIR
String               READ_STRUCTURE
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
Constructor Summary
Constructors
Constructor                  Description
ExtractIlluminaBarcodes()
-
Method Summary
Methods
Modifier and Type     Method                           Description
protected String[]    customCommandLineValidation()    Validate that POSITION >= 1, and that all BARCODEs are the same length and unique
protected int         doWork()                         Do the work after command line has been parsed.
static void           finalizeMetrics(Map<String,ExtractIlluminaBarcodes.BarcodeMetric> barcodeToMetrics, ExtractIlluminaBarcodes.BarcodeMetric noMatchMetric)
static void           main(String[] argv)
Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
Field Detail
-
BARCODE_SEQUENCE_COLUMN
public static final String BARCODE_SEQUENCE_COLUMN
Column header for the first barcode sequence (preferred).
See Also: Constant Field Values
-
BARCODE_SEQUENCE_1_COLUMN
public static final String BARCODE_SEQUENCE_1_COLUMN
Column header for the first barcode sequence.
See Also: Constant Field Values
-
BARCODE_NAME_COLUMN
public static final String BARCODE_NAME_COLUMN
Column header for the barcode name.
See Also: Constant Field Values
-
LIBRARY_NAME_COLUMN
public static final String LIBRARY_NAME_COLUMN
Column header for the library name.
See Also: Constant Field Values
-
BASECALLS_DIR
@Argument(doc="The Illumina basecalls directory. ", shortName="B") public File BASECALLS_DIR
-
OUTPUT_DIR
@Argument(doc="Where to write _barcode.txt files. By default, these are written to BASECALLS_DIR.", optional=true) public File OUTPUT_DIR
-
LANE
@Argument(doc="Lane number. ", shortName="L") public Integer LANE
-
READ_STRUCTURE
@Argument(doc="A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Sample Barcode, M for molecular barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of \"28T8M8B8S28T\" then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (ex. unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped.\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.", shortName="RS") public String READ_STRUCTURE
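To make the read structure notation concrete, here is a minimal sketch that splits a string such as "28T8M8B8S28T" into (cycle count, type) segments. `ReadStructureSketch` and `Segment` are hypothetical names used only for illustration. Picard has its own read structure parser, which this sketch does not attempt to reproduce. It accepts only the four type characters named above (T, B, M, S).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for read structure strings (hypothetical, not Picard's implementation). */
final class ReadStructureSketch {
    static final class Segment {
        final int cycles;
        final char type; // T = template, B = sample barcode, M = molecular barcode, S = skip

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }

        @Override
        public String toString() {
            return cycles + String.valueOf(type);
        }
    }

    private static final Pattern SEGMENT = Pattern.compile("(\\d+)([TBMS])");

    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(readStructure);
        int pos = 0;
        while (pos < readStructure.length()) {
            m.region(pos, readStructure.length());
            // Each segment must start exactly where the previous one ended.
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Bad read structure at offset " + pos);
            }
            int cycles = Integer.parseInt(m.group(1));
            if (cycles < 1) {
                throw new IllegalArgumentException("Cycle counts must be positive");
            }
            segments.add(new Segment(cycles, m.group(2).charAt(0)));
            pos = m.end();
        }
        if (segments.isEmpty()) {
            throw new IllegalArgumentException("Empty read structure");
        }
        return segments;
    }

    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.cycles;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = parse("28T8M8B8S28T");
        System.out.println(segments + " total cycles = " + totalCycles(segments));
    }
}
```

For the example in the documentation string, the five segments sum to the 80 cycles of the cluster, and the single B segment marks the 8 cycles this tool would compare against the barcode list.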
-
BARCODE
@Argument(doc="Barcode sequence. These must be unique, and all the same length. This cannot be used with reads that have more than one barcode; use BARCODE_FILE in that case. ", mutex="BARCODE_FILE") public List<String> BARCODE
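The uniqueness and equal-length requirements can be checked before invoking the tool. The following is a minimal sketch of such a check, not Picard's own validation code. `BarcodeListCheckSketch` is a hypothetical name, and the convention of returning null when there are no errors simply mirrors the contract documented for customCommandLineValidation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative pre-flight check for a barcode list (hypothetical, not Picard's code). */
final class BarcodeListCheckSketch {
    /** Returns null if the list is acceptable, otherwise an array of error messages. */
    static String[] validate(List<String> barcodes) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // The first barcode sets the expected length for all others.
        int expectedLength = barcodes.isEmpty() ? 0 : barcodes.get(0).length();
        for (String barcode : barcodes) {
            if (barcode.length() != expectedLength) {
                errors.add("Barcode " + barcode + " has length " + barcode.length()
                        + ", expected " + expectedLength);
            }
            // Set.add returns false when the value was already present.
            if (!seen.add(barcode)) {
                errors.add("Barcode " + barcode + " is specified more than once");
            }
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```

This sketch compares barcodes exactly as given. Whether comparison should be case-insensitive is left open here.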
-
BARCODE_FILE
@Argument(doc="Tab-delimited file of barcode sequences, barcode name and, optionally, library name. Barcodes must be unique and all the same length. Column headers must be \'barcode_sequence\' (or \'barcode_sequence_1\'), \'barcode_sequence_2\' (optional), \'barcode_name\', and \'library_name\'.", mutex="BARCODE") public File BARCODE_FILE
-
METRICS_FILE
@Argument(doc="Per-barcode and per-lane metrics written to this file.", shortName="M") public File METRICS_FILE
-
MAX_MISMATCHES
@Argument(doc="Maximum mismatches for a barcode to be considered a match.") public int MAX_MISMATCHES
-
MIN_MISMATCH_DELTA
@Argument(doc="Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match.") public int MIN_MISMATCH_DELTA
-
MAX_NO_CALLS
@Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.") public int MAX_NO_CALLS
-
MINIMUM_BASE_QUALITY
@Argument(shortName="Q", doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.") public int MINIMUM_BASE_QUALITY
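The way MAX_MISMATCHES, MIN_MISMATCH_DELTA, MAX_NO_CALLS and MINIMUM_BASE_QUALITY might combine into a match decision can be sketched as follows. This is an illustration built only from the argument descriptions above, not Picard's actual implementation. `BarcodeMatchSketch` and its members are hypothetical names. The sketch assumes the read and every barcode have the same length, that 'N' and '.' denote no-calls, that no-call positions are excluded from the mismatch count, and that with a single barcode the second-best mismatch count is taken to be the read length.

```java
import java.util.List;

/** Illustrative barcode matcher derived from the argument docs (hypothetical, not Picard's code). */
final class BarcodeMatchSketch {
    static final class Result {
        final String bestBarcode;
        final boolean matched;
        final int mismatches;
        final int mismatchesToSecondBest;

        Result(String bestBarcode, boolean matched, int mismatches, int mismatchesToSecondBest) {
            this.bestBarcode = bestBarcode;
            this.matched = matched;
            this.mismatches = mismatches;
            this.mismatchesToSecondBest = mismatchesToSecondBest;
        }
    }

    private static boolean isNoCall(char base) {
        return base == 'N' || base == '.';
    }

    static int countNoCalls(String read) {
        int n = 0;
        for (int i = 0; i < read.length(); i++) {
            if (isNoCall(read.charAt(i))) n++;
        }
        return n;
    }

    static int countMismatches(String barcode, String read, int[] quals, int minBaseQuality) {
        int mismatches = 0;
        for (int i = 0; i < barcode.length(); i++) {
            char base = read.charAt(i);
            if (isNoCall(base)) continue; // assumption: no-calls are judged separately
            // A low-quality base counts as a mismatch even if the bases agree.
            if (base != barcode.charAt(i) || quals[i] < minBaseQuality) mismatches++;
        }
        return mismatches;
    }

    static Result match(List<String> barcodes, String read, int[] quals, int maxMismatches,
                        int minMismatchDelta, int maxNoCalls, int minBaseQuality) {
        String bestBarcode = null;
        int best = Integer.MAX_VALUE;
        int second = Integer.MAX_VALUE;
        for (String barcode : barcodes) {
            int mm = countMismatches(barcode, read, quals, minBaseQuality);
            if (mm < best) {
                second = best;
                best = mm;
                bestBarcode = barcode;
            } else if (mm < second) {
                second = mm;
            }
        }
        if (second == Integer.MAX_VALUE) second = read.length(); // assumption for a single barcode
        boolean matched = bestBarcode != null
                && countNoCalls(read) <= maxNoCalls
                && best <= maxMismatches
                && second - best >= minMismatchDelta;
        return new Result(bestBarcode, matched, best, second);
    }
}
```

With barcodes ACGTACGT and TTTTGGGG, a read of ACGTACGA at uniformly high quality has one mismatch to the first barcode and six to the second, so it would be called a match under thresholds of one mismatch and a delta of one.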
-
MINIMUM_QUALITY
@Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina\'s spec describes as the minimum, but in practice lower values have been observed.") public int MINIMUM_QUALITY
-
COMPRESS_OUTPUTS
@Argument(shortName="GZIP", doc="Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the file names.") public boolean COMPRESS_OUTPUTS
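Because outputs may or may not carry a .gz extension depending on this flag, a consumer of the per-tile files may want to handle both cases. The following minimal sketch opens either kind of file by inspecting the file name. `BarcodeFileReaderSketch` is a hypothetical helper, not part of Picard, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Illustrative opener for plain or gzipped barcode files (hypothetical, not Picard's code). */
final class BarcodeFileReaderSketch {
    static BufferedReader open(Path file) throws IOException {
        boolean gzipped = file.getFileName().toString().endsWith(".gz");
        InputStream raw = Files.newInputStream(file);
        try {
            // Wrap in a gzip decoder only when the name carries the .gz extension.
            InputStream in = gzipped ? new GZIPInputStream(raw) : raw;
            return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }
}
```

The caller is expected to close the returned reader, for instance with try-with-resources.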
-
NUM_PROCESSORS
@Argument(doc="Run this many PerTileBarcodeExtractors in parallel. If NUM_PROCESSORS = 0, the number of cores used is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0, the number of cores used is the number available on the machine less the absolute value of NUM_PROCESSORS.") public int NUM_PROCESSORS
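The three cases for NUM_PROCESSORS can be sketched as a small function. This is an illustration of the rule as documented, with `ProcessorCountSketch` as a hypothetical name. Clamping the result to at least one thread is an assumption added here and is not stated in the documentation.

```java
/** Illustrative resolution of the NUM_PROCESSORS setting (hypothetical, not Picard's code). */
final class ProcessorCountSketch {
    static int resolve(int requested, int available) {
        if (requested == 0) {
            return available; // use every available core
        }
        if (requested < 0) {
            // A negative value leaves that many cores free; the clamp to 1 is an assumption.
            return Math.max(1, available + requested);
        }
        return requested; // a positive value is used as given
    }

    public static void main(String[] args) {
        int available = Runtime.getRuntime().availableProcessors();
        System.out.println("NUM_PROCESSORS=-1 resolves to " + resolve(-1, available));
    }
}
```

On an 8-core machine, a setting of -2 would therefore run 6 extractors in parallel.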
Method Detail
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
Specified by: doWork in class CommandLineProgram
Returns: program exit status.
-
finalizeMetrics
public static void finalizeMetrics(Map<String,ExtractIlluminaBarcodes.BarcodeMetric> barcodeToMetrics, ExtractIlluminaBarcodes.BarcodeMetric noMatchMetric)
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Validate that POSITION >= 1, and that all BARCODEs are the same length and unique.
Overrides: customCommandLineValidation in class CommandLineProgram
Returns: null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
-
main
public static void main(String[] argv)