Class IlluminaBasecallsConverter<CLUSTER_OUTPUT_RECORD>

  • Type Parameters:
    CLUSTER_OUTPUT_RECORD - The class to which a ClusterData is converted in preparation for writing.

    public class IlluminaBasecallsConverter<CLUSTER_OUTPUT_RECORD>
    extends Object
    Manages the conversion of Illumina basecalls into some output format. Creates multiple threads to manage reading, sorting and writing efficiently. Output is written in query name order. Optionally demultiplexes indexed reads into separate outputs by barcode.
    • Constructor Detail

      • IlluminaBasecallsConverter

        public IlluminaBasecallsConverter(File basecallsDir,
                                          int lane,
                                          ReadStructure readStructure,
                                          Map<String, ? extends picard.illumina.BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                          boolean demultiplex,
                                          int maxReadsInRamPerTile,
                                          List<File> tmpDirs,
                                          int numProcessors,
                                          boolean forceGc,
                                          Integer firstTile,
                                          Integer tileLimit,
                                          Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                                          htsjdk.samtools.util.SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                                          Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                                          BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                          boolean applyEamssFiltering,
                                          boolean includeNonPfReads,
                                          boolean ignoreUnexpectedBarcodes)
        Parameters:
        basecallsDir - Where to read basecalls from.
        lane - What lane to process.
        readStructure - How to interpret each cluster.
        barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
        demultiplex - If true, output is split by barcode, otherwise all are written to the same output stream.
        maxReadsInRamPerTile - Configures number of reads each tile will store in RAM before spilling to disk.
        tmpDirs - For SortingCollection spilling.
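        numProcessors - Controls number of threads. If > 0, that many threads are used. If <= 0, the number of threads allocated is the number of available cores minus the absolute value of numProcessors, so 0 uses every available core.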
        forceGc - Force explicit GC periodically. This is good for causing memory maps to be released.
        firstTile - (For debugging) If non-null, start processing at this tile.
        tileLimit - (For debugging) If non-null, process no more than this many tiles.
        outputRecordComparator - For sorting output records within a single tile.
        codecPrototype - For spilling output records to disk.
        outputRecordClass - Inconveniently needed to create SortingCollections.
        includeNonPfReads - If true, includes ALL reads (including those which do not have PF set).
        ignoreUnexpectedBarcodes - If true, ignores reads whose called barcode is not found in barcodeRecordWriterMap; otherwise throws an exception.
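
        The numProcessors rule above can be read as "a positive value is an explicit thread count; a non-positive value is an offset from the machine's core count". The following is a minimal sketch of that reading, not the class's actual code. The class ThreadCountSketch and the method resolveThreadCount are hypothetical names, and clamping the result to at least one thread is an assumption of this sketch.

        ```java
        final class ThreadCountSketch {
            /**
             * Hypothetical helper: maps a numProcessors-style setting to a thread count.
             * Assumption: a non-positive value is subtracted (by magnitude) from the core count.
             */
            static int resolveThreadCount(int numProcessors, int availableCores) {
                if (numProcessors > 0) {
                    return numProcessors; // explicit thread count requested by the caller
                }
                // numProcessors <= 0: 0 means "all cores", -2 means "all cores but two".
                // Clamping to 1 is an assumption of this sketch, not documented behavior.
                return Math.max(1, availableCores + numProcessors);
            }

            public static void main(String[] args) {
                int cores = Runtime.getRuntime().availableProcessors();
                for (int setting : new int[] {4, 0, -1}) {
                    System.out.println("numProcessors=" + setting
                            + " -> threads=" + resolveThreadCount(setting, cores));
                }
            }
        }
        ```

        Passing a negative value is a convenient way to leave a few cores free for other work on a shared machine.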
      • IlluminaBasecallsConverter

        public IlluminaBasecallsConverter(File basecallsDir,
                                          File barcodesDir,
                                          int lane,
                                          ReadStructure readStructure,
                                          Map<String, ? extends picard.illumina.BasecallsConverter.ConvertedClusterDataWriter<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                          boolean demultiplex,
                                          int maxReadsInRamPerTile,
                                          List<File> tmpDirs,
                                          int numProcessors,
                                          boolean forceGc,
                                          Integer firstTile,
                                          Integer tileLimit,
                                          Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                                          htsjdk.samtools.util.SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                                          Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                                          BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                          boolean applyEamssFiltering,
                                          boolean includeNonPfReads,
                                          boolean ignoreUnexpectedBarcodes)
        Parameters:
        basecallsDir - Where to read basecalls from.
        barcodesDir - Where to read barcodes from (optional; use basecallsDir if not specified).
        lane - What lane to process.
        readStructure - How to interpret each cluster.
        barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
        demultiplex - If true, output is split by barcode, otherwise all are written to the same output stream.
        maxReadsInRamPerTile - Configures number of reads each tile will store in RAM before spilling to disk.
        tmpDirs - For SortingCollection spilling.
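        numProcessors - Controls number of threads. If > 0, that many threads are used. If <= 0, the number of threads allocated is the number of available cores minus the absolute value of numProcessors, so 0 uses every available core.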
        forceGc - Force explicit GC periodically. This is good for causing memory maps to be released.
        firstTile - (For debugging) If non-null, start processing at this tile.
        tileLimit - (For debugging) If non-null, process no more than this many tiles.
        outputRecordComparator - For sorting output records within a single tile.
        codecPrototype - For spilling output records to disk.
        outputRecordClass - Inconveniently needed to create SortingCollections.
        includeNonPfReads - If true, includes ALL reads (including those which do not have PF set).
        ignoreUnexpectedBarcodes - If true, ignores reads whose called barcode is not found in barcodeRecordWriterMap; otherwise throws an exception.
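
        The interplay of barcodeRecordWriterMap, demultiplex and ignoreUnexpectedBarcodes described above can be illustrated with plain collections. This is a sketch under stated assumptions, not the converter's implementation: ListWriter is a hypothetical stand-in for a ConvertedClusterDataWriter, route is a hypothetical helper, and the exception type is chosen only for illustration. Note that the key=null requirement means the map must be a type that permits null keys, such as HashMap; Map.of rejects null keys.

        ```java
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        final class WriterLookupSketch {
            /** Hypothetical stand-in for a per-barcode record writer. */
            static final class ListWriter {
                final List<String> records = new ArrayList<>();
                void write(String record) { records.add(record); }
            }

            /** Routes one record; returns true if it was written, false if it was skipped. */
            static boolean route(Map<String, ListWriter> writers, boolean demultiplex,
                                 boolean ignoreUnexpectedBarcodes, String barcode, String record) {
                // Without demultiplexing, every record goes to the writer stored under the null key.
                String key = demultiplex ? barcode : null;
                ListWriter writer = writers.get(key);
                if (writer == null) {
                    if (ignoreUnexpectedBarcodes) {
                        return false; // silently drop reads with an unknown barcode
                    }
                    // Exception type is an assumption of this sketch.
                    throw new IllegalStateException("No writer for barcode " + key);
                }
                writer.write(record);
                return true;
            }

            public static void main(String[] args) {
                // HashMap permits a null key, which the non-demultiplexed case relies on.
                Map<String, ListWriter> single = new HashMap<>();
                single.put(null, new ListWriter());
                route(single, false, false, "ACGT", "read1");
                System.out.println(single.get(null).records);
            }
        }
        ```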
    • Method Detail

      • doTileProcessing

        public void doTileProcessing()
        Performs the conversion by creating threads to read, sort and write. setConverter() must be called before calling this method.
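
        The read, sort and write pattern mentioned here can be sketched with a fixed thread pool: each tile is sorted on a worker thread, and the results are then written out in tile order. This is only an illustration of the general pattern, not the converter's implementation. TilePipelineSketch and process are hypothetical names, strings stand in for output records, an in-memory list stands in for the writer, and the setConverter() prerequisite is outside the scope of this sketch.

        ```java
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        final class TilePipelineSketch {
            /** Sorts each tile's records on a worker thread, then appends the tiles in order. */
            static List<String> process(List<List<String>> tiles, int numThreads) throws Exception {
                ExecutorService pool = Executors.newFixedThreadPool(numThreads);
                try {
                    // "Read and sort": one task per tile, run concurrently.
                    List<Future<List<String>>> sortedTiles = new ArrayList<>();
                    for (List<String> tile : tiles) {
                        sortedTiles.add(pool.submit(() -> {
                            List<String> copy = new ArrayList<>(tile);
                            Collections.sort(copy);
                            return copy;
                        }));
                    }
                    // "Write": collect results in submission order so tile order is preserved.
                    List<String> output = new ArrayList<>();
                    for (Future<List<String>> sorted : sortedTiles) {
                        output.addAll(sorted.get());
                    }
                    return output;
                } finally {
                    pool.shutdown();
                }
            }

            public static void main(String[] args) throws Exception {
                List<List<String>> tiles = List.of(List.of("b", "a"), List.of("d", "c"));
                System.out.println(process(tiles, 2));
            }
        }
        ```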