Package picard.sam.markduplicates.util
Class OpticalDuplicateFinder
java.lang.Object
    picard.sam.util.ReadNameParser
        picard.sam.markduplicates.util.OpticalDuplicateFinder
- All Implemented Interfaces:
Serializable
public class OpticalDuplicateFinder extends ReadNameParser implements Serializable
Contains methods for finding optical/co-localized/sequencing duplicates.
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type    Field
static int           DEFAULT_BIG_DUPLICATE_SET_SIZE
static int           DEFAULT_MAX_DUPLICATE_SET_SIZE
static int           DEFAULT_OPTICAL_DUPLICATE_DISTANCE
int                  opticalDuplicatePixelDistance
-
Fields inherited from class picard.sam.util.ReadNameParser
DEFAULT_READ_NAME_REGEX, readNameRegex
-
-
Constructor Summary
Constructors
Constructor and Description
OpticalDuplicateFinder()
    Uses the default duplicate distance DEFAULT_OPTICAL_DUPLICATE_DISTANCE (100) and the default read name regex ReadNameParser.DEFAULT_READ_NAME_REGEX.
OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, long maxDuplicateSetSize, htsjdk.samtools.util.Log log)
OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, htsjdk.samtools.util.Log log)
-
Method Summary
Modifier and Type    Method and Description
boolean[]    findOpticalDuplicates(List<? extends PhysicalLocation> list, PhysicalLocation keeper)
    Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another.
void    setBigDuplicateSetSize(int bigDuplicateSetSize)
    Sets the size of a set that is big enough to log progress about.
void    setMaxDuplicateSetSize(long maxDuplicateSetSize)
    Sets the size of a set that is too big to process.
Methods inherited from class picard.sam.util.ReadNameParser
addLocationInformation, getLastThreeFields, rapidParseInt
-
-
-
-
Field Detail
-
opticalDuplicatePixelDistance
public int opticalDuplicatePixelDistance
-
DEFAULT_OPTICAL_DUPLICATE_DISTANCE
public static final int DEFAULT_OPTICAL_DUPLICATE_DISTANCE
- See Also:
- Constant Field Values
-
DEFAULT_BIG_DUPLICATE_SET_SIZE
public static final int DEFAULT_BIG_DUPLICATE_SET_SIZE
- See Also:
- Constant Field Values
-
DEFAULT_MAX_DUPLICATE_SET_SIZE
public static final int DEFAULT_MAX_DUPLICATE_SET_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
OpticalDuplicateFinder
public OpticalDuplicateFinder()
Uses the default duplicate distance DEFAULT_OPTICAL_DUPLICATE_DISTANCE (100) and the default read name regex ReadNameParser.DEFAULT_READ_NAME_REGEX.
-
OpticalDuplicateFinder
public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, htsjdk.samtools.util.Log log)
- Parameters:
readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
opticalDuplicatePixelDistance - the optical duplicate pixel distance
log - the log to which to write messages.
-
OpticalDuplicateFinder
public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, long maxDuplicateSetSize, htsjdk.samtools.util.Log log)
- Parameters:
readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
opticalDuplicatePixelDistance - the optical duplicate pixel distance
maxDuplicateSetSize - the size of a set that is too big to process
log - the log to which to write messages.
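The readNameRegex parameter governs how a physical location is recovered from a read name. As a minimal sketch, assuming a colon-separated read name whose last three fields are the tile, x and y values (the layout that the inherited getLastThreeFields method suggests), the extraction could look like the following. The class ReadNameFieldsSketch and its method are hypothetical stand-ins written for illustration; they are not part of Picard and do not reproduce its parser.

```java
final class ReadNameFieldsSketch {
    /**
     * Hypothetical helper (not a Picard method): returns the last three
     * ':'-separated fields of a read name as ints, in order, or null when
     * the name has fewer than three fields or any of the three is not numeric.
     */
    static int[] lastThreeFields(String readName) {
        String[] parts = readName.split(":");
        if (parts.length < 3) {
            return null;
        }
        int[] fields = new int[3];
        try {
            for (int i = 0; i < 3; i++) {
                fields[i] = Integer.parseInt(parts[parts.length - 3 + i]);
            }
        } catch (NumberFormatException e) {
            return null; // location cannot be determined from this name
        }
        return fields;
    }
}
```

A name that does not follow the assumed layout yields null here, which a caller would treat as "no location available".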
-
-
Method Detail
-
setBigDuplicateSetSize
public void setBigDuplicateSetSize(int bigDuplicateSetSize)
Sets the size of a set that is big enough to log progress about. Defaults to 1000.
- Parameters:
bigDuplicateSetSize - the size of a set that is big enough to log progress about
-
setMaxDuplicateSetSize
public void setMaxDuplicateSetSize(long maxDuplicateSetSize)
Sets the size of a set that is too big to process. Defaults to 300000.
- Parameters:
maxDuplicateSetSize - the size of a set that is too big to process
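How the two set-size thresholds might interact can be sketched as below. This is an assumption drawn only from the descriptions above: a set larger than the maximum is not examined, and a set at or above the big size is processed with progress logging. The class SetSizePolicySketch, its Action enum and the exact comparison operators are hypothetical and are not taken from Picard's source.

```java
final class SetSizePolicySketch {
    /** Hypothetical outcome labels, for illustration only. */
    enum Action { PROCESS, PROCESS_WITH_PROGRESS_LOGGING, SKIP }

    private int bigDuplicateSetSize = 1000;     // documented default
    private long maxDuplicateSetSize = 300000;  // documented default

    void setBigDuplicateSetSize(int bigDuplicateSetSize) {
        this.bigDuplicateSetSize = bigDuplicateSetSize;
    }

    void setMaxDuplicateSetSize(long maxDuplicateSetSize) {
        this.maxDuplicateSetSize = maxDuplicateSetSize;
    }

    /** Assumed policy: check the "too big" limit first, then the logging limit. */
    Action decide(int duplicateSetSize) {
        if (duplicateSetSize > maxDuplicateSetSize) {
            return Action.SKIP;
        }
        if (duplicateSetSize >= bigDuplicateSetSize) {
            return Action.PROCESS_WITH_PROGRESS_LOGGING;
        }
        return Action.PROCESS;
    }
}
```

Checking the maximum first means a very large set is skipped rather than logged, which is the cheaper of the two outcomes.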
-
findOpticalDuplicates
public boolean[] findOpticalDuplicates(List<? extends PhysicalLocation> list, PhysicalLocation keeper)
Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another. Within each cluster of optical duplicates that is found, one read remains unflagged for optical duplication and the rest are flagged as optical duplicates. A read is reported as an optical duplicate by a "true" at the same index in the returned boolean[] as the read occupies in the input list of physical locations.
- Parameters:
list - a list of reads that are determined to be duplicates of one another
keeper - a single PhysicalLocation that is the one being kept as non-duplicate, and thus should never be annotated as an optical duplicate. May in some cases be null, or a PhysicalLocation not contained within the list!
- Returns:
a boolean[] of the same length as the incoming list marking which reads are optical duplicates
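The contract described for findOpticalDuplicates (a result array aligned with the input list, one unflagged read per cluster, and a keeper that is never flagged) can be illustrated with a self-contained sketch. Everything in it is an assumption made for illustration: the Loc class is a simplified stand-in for PhysicalLocation, "close" is taken to mean same tile with both x and y offsets within the pixel distance, and clusters are built with a simple quadratic union-find. It is not Picard's implementation.

```java
import java.util.Arrays;
import java.util.List;

final class OpticalFlagSketch {
    /** Simplified, hypothetical stand-in for PhysicalLocation. */
    static final class Loc {
        final int tile, x, y;
        Loc(int tile, int x, int y) { this.tile = tile; this.x = x; this.y = y; }
    }

    /** Assumed closeness test: same tile, and x and y each within the distance. */
    static boolean close(Loc a, Loc b, int distance) {
        return a.tile == b.tile
                && Math.abs(a.x - b.x) <= distance
                && Math.abs(a.y - b.y) <= distance;
    }

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Returns flags aligned with the list; true marks an optical duplicate. */
    static boolean[] flag(List<Loc> list, Loc keeper, int distance) {
        int n = list.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        // Join every pair of close reads into the same cluster.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (close(list.get(i), list.get(j), distance)) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }

        // One representative per cluster stays unflagged: the keeper when it
        // is in the cluster (compared by identity), otherwise the first member.
        int[] representative = new int[n];
        Arrays.fill(representative, -1);
        for (int i = 0; i < n; i++) {
            int root = find(parent, i);
            if (representative[root] == -1 || list.get(i) == keeper) {
                representative[root] = i;
            }
        }

        boolean[] flags = new boolean[n];
        for (int i = 0; i < n; i++) {
            flags[i] = representative[find(parent, i)] != i;
        }
        return flags;
    }
}
```

A null keeper, or one that is not in the list, simply never matches, so each cluster falls back to its first member as the unflagged read.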
-
-