Package picard.sam.util
Class ReadNameParser
java.lang.Object
    picard.sam.util.ReadNameParser
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
OpticalDuplicateFinder
public class ReadNameParser extends Object implements Serializable
Provides access to the physical location information about a cluster. All values should default to -1 if unavailable. ReadGroup and Tile should only allow positive (non-zero) integers; x and y coordinates may be negative.
See Also:
Serialized Form
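As an illustration of the conventions described above (every value defaults to -1, read group and tile must be positive, x and y may be negative), here is a minimal sketch of a location holder. The class name ClusterLocation and its getters and setters are hypothetical and are not Picard's PhysicalLocation API; the sketch assumes plain int fields.

```java
/**
 * Hypothetical holder for a cluster's physical location (not Picard's
 * PhysicalLocation interface). Every value defaults to -1, meaning "unavailable".
 */
class ClusterLocation {
    static final int UNAVAILABLE = -1;

    private int readGroup = UNAVAILABLE;
    private int tile = UNAVAILABLE;
    private int x = UNAVAILABLE;
    private int y = UNAVAILABLE;

    // Read group and tile must be positive (non-zero) integers.
    void setReadGroup(int readGroup) {
        requirePositive("readGroup", readGroup);
        this.readGroup = readGroup;
    }

    void setTile(int tile) {
        requirePositive("tile", tile);
        this.tile = tile;
    }

    // x and y may legitimately be negative, so they are not range-checked.
    void setX(int x) { this.x = x; }
    void setY(int y) { this.y = y; }

    int getReadGroup() { return readGroup; }
    int getTile() { return tile; }
    int getX() { return x; }
    int getY() { return y; }

    private static void requirePositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be a positive integer, got " + value);
        }
    }
}
```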
Field Summary
Fields
Modifier and Type   Field                     Description
static String       DEFAULT_READ_NAME_REGEX   The read name regular expression (regex) is used to extract three pieces of information from the read name: tile, x location, and y location.
protected String    readNameRegex
Constructor Summary
Constructors
Constructor                                                         Description
ReadNameParser()                                                    Creates a read name parser using the default read name regex.
ReadNameParser(String readNameRegex)                                Creates a read name parser using the given read name regex.
ReadNameParser(String readNameRegex, htsjdk.samtools.util.Log log)  Creates a read name parser using the given read name regex and log.
Method Summary
Modifier and Type   Method                                                          Description
boolean             addLocationInformation(String readName, PhysicalLocation loc)   Method used to extract tile/x/y from the read name and add it to the given PhysicalLocation so that it can be used later to determine optical duplication.
static int          getLastThreeFields(String readName, char delim, int[] tokens)   Given a string, splits the string by the delimiter, and returns the last three fields parsed as integers.
static int          rapidParseInt(String input)                                     Very specialized method to rapidly parse a sequence of digits from a String up until the first non-digit character.
Field Detail
DEFAULT_READ_NAME_REGEX
public static final String DEFAULT_READ_NAME_REGEX
The read name regular expression (regex) is used to extract three pieces of information from the read name: tile, x location, and y location. Any read name regex should parse the read name to produce these and only these values. An example regex is: (?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$ which assumes that fields in the read name are delimited by ':' and that the last three fields correspond to the tile, x, and y locations, ignoring any trailing non-digit characters. The default regex is optimized for fast parsing (see getLastThreeFields(String, char, int[])) by searching for the last three fields, ignoring any trailing non-digit characters, and assuming the delimiter ':'. This should correctly handle read names with 5 or 7 fields whose last three fields are tile, x, and y, as is the case for the majority of read names produced by Illumina technology.
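A minimal sketch, using only java.util.regex, of how the example regex above yields tile, x, and y as its three capture groups. The class RegexTileXY and its parse method are hypothetical names for illustration; this is not Picard's own parsing code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTileXY {
    // The example regex from the DEFAULT_READ_NAME_REGEX documentation:
    // three ':'-delimited trailing fields, each starting with digits.
    static final Pattern EXAMPLE =
            Pattern.compile("(?:.*:)?([0-9]+)[^:]*:([0-9]+)[^:]*:([0-9]+)[^:]*$");

    /** Returns {tile, x, y}, or null if the read name does not match. */
    static int[] parse(String readName) {
        Matcher m = EXAMPLE.matcher(readName);
        if (!m.matches()) {
            return null;
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(m.group(i + 1));  // groups 1..3 = tile, x, y
        }
        return out;
    }
}
```

For a made-up read name such as HWI-ST123:4:1101:15589:1922/1, the groups are 1101, 15589, and 1922; the trailing /1 is absorbed by the final [^:]*.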
readNameRegex
protected final String readNameRegex
Constructor Detail
ReadNameParser
public ReadNameParser()
Creates a read name parser using the default read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how the read name is parsed.
ReadNameParser
public ReadNameParser(String readNameRegex)
Creates a read name parser using the given read name regex. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
Parameters:
readNameRegex - the read name regular expression string used to parse read names, or null to never parse location information.
ReadNameParser
public ReadNameParser(String readNameRegex, htsjdk.samtools.util.Log log)
Creates a read name parser using the given read name regex and log. See DEFAULT_READ_NAME_REGEX for an explanation of how to format the regular expression (regex) string.
Parameters:
readNameRegex - the read name regular expression string used to parse read names, or null to never parse location information.
log - the log to which to write messages.
Method Detail
addLocationInformation
public boolean addLocationInformation(String readName, PhysicalLocation loc)
Method used to extract tile/x/y from the read name and add it to the given PhysicalLocation so that it can be used later to determine optical duplication.
Parameters:
readName - the name of the read/cluster
loc - the object to add tile/x/y to
Returns:
true if the read name contained the information in parsable form, false otherwise
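The sketch below shows one way the documented behaviour could fit together: a null regex disables parsing, the default takes a fast path over the last three ':'-delimited fields, and any other regex must produce exactly three numeric capture groups. SketchReadNameParser, Loc, and the DEFAULT sentinel are hypothetical stand-ins, not Picard classes, and the fast path uses String.split for brevity; it is not meant to reproduce Picard's optimized implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a PhysicalLocation; only tile/x/y are modelled. */
class Loc {
    int tile = -1, x = -1, y = -1;
}

class SketchReadNameParser {
    // Hypothetical sentinel standing in for DEFAULT_READ_NAME_REGEX.
    static final String DEFAULT = "<default>";

    private final String readNameRegex;  // null => never parse location information
    private final Pattern pattern;       // compiled only for a custom regex

    SketchReadNameParser(String readNameRegex) {
        this.readNameRegex = readNameRegex;
        this.pattern = (readNameRegex == null || DEFAULT.equals(readNameRegex))
                ? null : Pattern.compile(readNameRegex);
    }

    /** Returns true and fills loc if tile/x/y could be extracted from readName. */
    boolean addLocationInformation(String readName, Loc loc) {
        if (readNameRegex == null) {
            return false;  // location parsing is disabled
        }
        try {
            int[] f = new int[3];
            if (pattern == null) {
                // Fast path: last three ':'-delimited fields, leading digits only.
                String[] parts = readName.split(":");
                if (parts.length < 3) return false;
                for (int i = 0; i < 3; i++) {
                    f[i] = leadingDigits(parts[parts.length - 3 + i]);
                }
            } else {
                // Custom regex: must match and yield exactly tile, x, y.
                Matcher m = pattern.matcher(readName);
                if (!m.matches() || m.groupCount() != 3) return false;
                for (int i = 0; i < 3; i++) f[i] = Integer.parseInt(m.group(i + 1));
            }
            loc.tile = f[0];
            loc.x = f[1];
            loc.y = f[2];
            return true;
        } catch (NumberFormatException e) {
            return false;  // a field did not start with a parsable number
        }
    }

    // Parses the run of ASCII digits at the start of s; throws NumberFormatException if there is none.
    private static int leadingDigits(String s) {
        int end = 0;
        while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') end++;
        return Integer.parseInt(s.substring(0, end));
    }
}
```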
getLastThreeFields
public static int getLastThreeFields(String readName, char delim, int[] tokens) throws NumberFormatException
Given a string, splits the string by the delimiter, and returns the last three fields parsed as integers. Parsing a field considers only a sequence of digits up until the first non-digit character. The three values are stored in the passed-in array.
Throws:
NumberFormatException - if any of the tokens that should contain numbers do not start with parsable numbers
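A minimal sketch of the documented contract, scanning from the end of the string so that only the last three fields are examined. LastThreeFields and leadingInt are hypothetical names. The documentation does not say what the int return value means; this sketch assumes it is the number of fields found, and leaves unfilled slots of tokens unchanged when the name has fewer than three fields.

```java
class LastThreeFields {
    /**
     * Hypothetical sketch: stores the last three delim-separated fields of
     * readName into tokens[0..2] (tile, x, y) and returns how many were found.
     */
    static int getLastThreeFields(String readName, char delim, int[] tokens) {
        int found = 0;
        int end = readName.length();  // exclusive end of the current field
        for (int slot = 2; slot >= 0 && end >= 0; slot--) {
            // Start of the field: one past the previous delimiter, or 0 if none remains.
            int start = readName.lastIndexOf(delim, end - 1) + 1;
            tokens[slot] = leadingInt(readName.substring(start, end));
            found++;
            end = start - 1;  // step over the delimiter to the previous field
        }
        return found;
    }

    // Parses an optional '-' followed by ASCII digits, stopping at the first non-digit.
    private static int leadingInt(String s) {
        int i = 0;
        boolean negative = s.startsWith("-");
        if (negative) i++;
        int firstDigit = i;
        int value = 0;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            value = value * 10 + (s.charAt(i) - '0');
            i++;
        }
        if (i == firstDigit) {
            throw new NumberFormatException("Field does not start with a number: \"" + s + "\"");
        }
        return negative ? -value : value;
    }
}
```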
rapidParseInt
public static int rapidParseInt(String input) throws NumberFormatException
Very specialized method to rapidly parse a sequence of digits from a String up until the first non-digit character.
Throws:
NumberFormatException - if the String does not start with an optional '-' followed by at least one digit
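A minimal sketch of the documented behaviour: an optional '-', then at least one ASCII digit, stopping at the first non-digit. RapidParse is a hypothetical class name, and this sketch does not detect int overflow, so it assumes inputs such as tile numbers and coordinates are well within int range.

```java
class RapidParse {
    /**
     * Hypothetical re-implementation for illustration: parses an optional
     * leading '-' followed by ASCII digits, stopping at the first non-digit.
     * Overflow is not checked.
     */
    static int rapidParseInt(String input) {
        final int len = input.length();
        int i = 0;
        boolean negative = false;
        if (len > 0 && input.charAt(0) == '-') {
            negative = true;
            i = 1;
        }
        final int firstDigit = i;
        int value = 0;
        while (i < len) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') break;  // stop at the first non-digit
            value = value * 10 + (c - '0');
            i++;
        }
        if (i == firstDigit) {
            throw new NumberFormatException("Expected an optional '-' followed by digits: \"" + input + "\"");
        }
        return negative ? -value : value;
    }
}
```

Unlike Integer.parseInt, which rejects "1922/1" outright, this sketch returns 1922 and ignores the rest of the string.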