demux: pepsirf demux module

Command line interface
Artifact API

Docstring:

Usage: qiime pepsirf demux [OPTIONS]

  Takes the following parameters and outputs counts for each reference
  sequence (i.e., probe/peptide) in each sample using PepSIRF's demux module.
  (PepSIRF's develop branch MUST be compiled before running this module.)

Inputs:
  --i-input-r1 ARTIFACT   Fastq-formatted file containing reads with DNA
    DemuxFastq            tags. If PepSIRF was NOT compiled with Zlib support,
                          this file must be uncompressed. If PepSIRF was
                          compiled with Zlib support, then this file can be
                          uncompressed or compressed using gzip. In this case,
                          the file format will be automatically determined.
                                                                    [required]
  --i-index ARTIFACT      Name of fasta-formatted file containing forward and
    DemuxIndex            (potentially) reverse index sequences. Sequence
                          names must match exactly with those supplied in the
                          'samplelist'.                             [required]
  --i-samplelist ARTIFACT A tab-delimited list of samples with a header row
    DemuxSampleList       and one sample per line. This file must contain at
                          least one index column and one sample name column.
                          Multiple index columns may be included. This file
                          can also include additional columns that will not be
                          used for the demultiplexing. Specify which columns
                          to use with the '--sname', '--sindex1', and
                          '--sindex2' flags. If '--fif' is used, then only
                          '--sname' will be used.                   [required]
  --i-input-r2 ARTIFACT   Optional index-only fastq file. If PepSIRF was NOT
    DemuxFastq            compiled with Zlib support, this file must be
                          uncompressed. If PepSIRF was compiled with Zlib
                          support, then this file can be uncompressed or
                          compressed using gzip. In this case, the file format
                          will be automatically determined. Note that if this
                          argument is not supplied, only 'index1' will be used
                          to identify samples.                      [optional]
  --i-fif ARTIFACT        The flexible index file can be provided as an
    DemuxFif              alternative to the '--index1' and '--index2'
                          options. The file must use the following format: a
                          tab-delimited file with 5 ordered columns: 1) index
                          name, which should correspond to a header name in
                          the sample sheet, 2) read name, which should be
                          either 'r1' or 'r2' (not case-sensitive) to specify
                          whether the index is in '--input-r1' or
                          '--input-r2', 3) index start location (0-based,
                          inclusive), 4) index length and 5) number of
                          mismatches to allow. '--index1', '--index2',
                          '--sname', '--sindex1', and '--sindex2' will be
                          ignored if this option is provided.       [optional]
  --i-library ARTIFACT    Fasta-formatted file containing reference DNA tags.
    DemuxLibrary          If this flag is not included, reference-independent
                          demultiplexing will be performed. In
                          reference-independent mode, each sequence in the
                          region specified by '--seq' will be considered its
                          own reference, and the observed sequences will be
                          used as the row names in the output count matrix.
                                                                    [optional]
Parameters:
  --p-seq TEXT            Positional information for the DNA tags. This
                          argument must be passed in the same format specified
                          for 'index1'.                             [required]
  --p-read-per-loop INTEGER
                          The number of fastq records read at a time. A higher
                          value will result in more memory usage by the
                          program, but will also result in fewer disk
                          accesses, increasing performance of the program.
                                                             [default: 100000]
  --p-num-threads INTEGER Number of threads to use for analyses.  [default: 2]
  --p-phred-base INTEGER  Phred base to use when parsing fastq quality
                          scores. Valid options include 33 or 64.
                                                                 [default: 33]
  --p-phred-min-score INTEGER
                          The minimum average phred-scaled quality score for
                          the DNA tag portion of a read for it to be
                          considered for matching. This means that if the
                          average phred33/64 score for a read at the expected
                          locations of the DNA tag is below this value, the
                          read will be discarded.                 [default: 0]
  --p-sindex TEXT         Used to specify the header for the index 1 and
                          optional index 2 column in the samplelist. This is
                          an alternative to using the '--fif' option.
                                                                    [optional]
  --p-translate-aggregates / --p-no-translate-aggregates
                          Include this flag to use translation-based
                          aggregation. In this mode, counts for nt sequences
                          will be combined if they translate into the same aa
                          sequence. Note: When this mode is used, the name of
                          the aggregate sequence will be the sequence that was
                          a result of the translation. Therefore, this mode is
                          most appropriate for use with reference-independent
                          demultiplexing.                     [default: False]
  --p-concatemer / --p-no-concatemer
                          Concatenated adapter/primer sequences (optional).
                          The presence of this sequence within a read
                          indicates that the expected DNA tag is not present.
                          If supplied, the number of times this concatemer is
                          recorded in the input file is reported.
                                                              [default: False]
  --p-sname TEXT          Used to specify the header for the sample name
                          column in the samplelist. By default 'SampleName' is
                          set as the column header name.
                                                       [default: 'SampleName']
  --p-index1 TEXT         Positional information for index1 (i.e., barcode 1).
                          This argument must be passed as 3 comma-separated
                          values. The first item represents the (0-based)
                          expected start position of the first index; the
                          second represents the length of the first index; and
                          the third represents the number of mismatches that
                          are tolerated for this index. An example is
                          '--index1 12,12,1'. This says that the index starts
                          at (0-based) position 12, the index is 12
                          nucleotides long, and if a perfect match is not
                          found, then up to one mismatch will be tolerated.
                                                                    [optional]
  --p-index2 TEXT         Positional information for index2, optional. This
                          argument must be passed in the same format specified
                          for '--index1'. If '--input-r2' is provided, this
                          positional information is assumed to refer to the
                          reads contained in this second, index-only fastq
                          file. If '--input-r2' is NOT provided, this
                          positional information is assumed to refer to the
                          reads contained in the '--input-r1' fastq file.
                                                            [default: '0,0,0']
  --p-outfile TEXT        The outfile that will produce a list of inputs to
                          PepSIRF.                   [default: './deconv.out']
  --p-pepsirf-binary TEXT The binary to call pepsirf on your system.
                                                          [default: 'pepsirf']
Outputs:
  --o-raw-counts-output ARTIFACT FeatureTable[RawCounts]
                                                                    [required]
  --o-diagnostic-output ARTIFACT
    DemuxDiagnostic                                                 [required]
Miscellaneous:
  --output-dir PATH       Output unspecified results to a directory
  --verbose / --quiet     Display verbose output to stdout and/or stderr
                          during execution of this action. Or silence output
                          if execution is successful (silence is golden).
  --example-data PATH     Write example data and exit.
  --citations             Show citations and exit.
  --help                  Show this message and exit.
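
The 'start,length,mismatches' format shared by '--p-seq', '--p-index1', and
'--p-index2', together with the '--p-phred-min-score' check, can be
illustrated with a small Python sketch. This is not PepSIRF's code: the helper
names (parse_position, region, matches, mean_phred) are hypothetical, and
counting mismatches position by position (Hamming distance) is an assumption
about how mismatches are tolerated.

```python
from typing import NamedTuple


class Position(NamedTuple):
    start: int       # 0-based start of the region within the read
    length: int      # number of nucleotides in the region
    mismatches: int  # mismatches tolerated when matching


def parse_position(spec: str) -> Position:
    """Parse a 'start,length,mismatches' string such as '12,12,1'."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values, got {spec!r}")
    start, length, mismatches = (int(p) for p in parts)
    if min(start, length, mismatches) < 0:
        raise ValueError(f"values must be non-negative: {spec!r}")
    return Position(start, length, mismatches)


def region(text: str, pos: Position) -> str:
    """Slice the region described by pos out of a read or quality string."""
    return text[pos.start:pos.start + pos.length]


def matches(observed: str, reference: str, max_mismatches: int) -> bool:
    """Assumed matching rule: equal length and at most max_mismatches differences."""
    if len(observed) != len(reference):
        return False
    return sum(a != b for a, b in zip(observed, reference)) <= max_mismatches


def mean_phred(quality: str, pos: Position, phred_base: int = 33) -> float:
    """Average phred score over the region, decoded with the given base."""
    scores = [ord(c) - phred_base for c in region(quality, pos)]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up read: 12 filler bases, a 12-nt barcode, then 20 more bases.
index1 = parse_position("12,12,1")
read = "A" * 12 + "ACGTACGTACGT" + "T" * 20
quality = "I" * len(read)  # 'I' encodes Q40 under phred33

barcode = region(read, index1)
print(barcode)
print(matches(barcode, "ACGTACGTACGA", index1.mismatches))
print(mean_phred(quality, index1))
```

With '12,12,1', the barcode is taken from positions 12 through 23 of the read,
and a reference barcode differing at one position is still accepted.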

Import:

from qiime2.plugins.pepsirf.methods import demux
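
A minimal sketch of calling this method through the Artifact API follows. It
assumes QIIME 2 and the pepsirf plugin are installed and that the three
required inputs have already been imported as .qza artifacts. The wrapper name
run_demux, the default output file names, and any argument values are
hypothetical; only the keyword names and the two result fields come from the
docstring on this page.

```python
def run_demux(r1_path, index_path, samplelist_path, seq, index1,
              counts_out="raw_counts.qza", diag_out="diagnostic.qza"):
    """Load the required artifacts, run demux, and save both outputs."""
    # Imported lazily so that defining this helper does not require QIIME 2.
    from qiime2 import Artifact
    from qiime2.plugins.pepsirf.methods import demux

    results = demux(
        input_r1=Artifact.load(r1_path),
        index=Artifact.load(index_path),
        samplelist=Artifact.load(samplelist_path),
        seq=seq,        # 'start,length,mismatches' for the DNA tags
        index1=index1,  # same format, for the first index
    )
    # The result fields are named after the entries under "Returns".
    results.raw_counts_output.save(counts_out)
    results.diagnostic_output.save(diag_out)
    return results
```

A call might look like run_demux("r1.qza", "index.qza", "samplelist.qza",
seq="41,40,2", index1="12,12,1"), where the file names and positional values
are placeholders to be replaced with values matching the actual library design.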

Docstring:

pepsirf demux module

Takes the following parameters and outputs counts for each reference
sequence (i.e., probe/peptide) in each sample using PepSIRF's demux module.
(PepSIRF's develop branch MUST be compiled before running this module.)

Parameters
----------
input_r1 : DemuxFastq
    Fastq-formatted file containing reads with DNA tags. If PepSIRF was
    NOT compiled with Zlib support, this file must be uncompressed. If
    PepSIRF was compiled with Zlib support, then this file can be
    uncompressed or compressed using gzip. In this case, the file format
    will be automatically determined.
index : DemuxIndex
    Name of fasta-formatted file containing forward and (potentially)
    reverse index sequences. Sequence names must match exactly with those
    supplied in the 'samplelist'.
samplelist : DemuxSampleList
    A tab-delimited list of samples with a header row and one sample per
    line. This file must contain at least one index column and one sample
    name column. Multiple index columns may be included. This file can also
    include additional columns that will not be used for the
    demultiplexing. Specify which columns to use with the '--sname',
    '--sindex1', and '--sindex2' flags. If '--fif' is used, then only
    '--sname' will be used.
seq : Str
    Positional information for the DNA tags. This argument must be passed
    in the same format specified for 'index1'.
input_r2 : DemuxFastq, optional
    Optional index-only fastq file. If PepSIRF was NOT compiled with Zlib
    support, this file must be uncompressed. If PepSIRF was compiled with
    Zlib support, then this file can be uncompressed or compressed using
    gzip. In this case, the file format will be automatically determined.
    Note that if this argument is not supplied, only 'index1' will be used
    to identify samples.
fif : DemuxFif, optional
    The flexible index file can be provided as an alternative to the
    '--index1' and '--index2' options. The file must use the following
    format: a tab-delimited file with 5 ordered columns: 1) index name,
    which should correspond to a header name in the sample sheet, 2) read
    name, which should be either 'r1' or 'r2' (not case-sensitive) to
    specify whether the index is in '--input_r1' or '--input_r2', 3) index
    start location (0-based, inclusive), 4) index length and 5) number of
    mismatches to allow. '--index1', '--index2', '--sname', '--sindex1',
    and '--sindex2' will be ignored if this option is provided.
library : DemuxLibrary, optional
    Fasta-formatted file containing reference DNA tags. If this flag is not
    included, reference-independent demultiplexing will be performed. In
    reference-independent mode, each sequence in the region specified by
    '--seq' will be considered its own reference, and the observed sequences
    will be used as the row names in the output count matrix.
read_per_loop : Int, optional
    The number of fastq records read at a time. A higher value will result in
    more memory usage by the program, but will also result in fewer disk
    accesses, increasing performance of the program.
num_threads : Int, optional
    Number of threads to use for analyses.
phred_base : Int, optional
    Phred base to use when parsing fastq quality scores. Valid options
    include 33 or 64.
phred_min_score : Int, optional
    The minimum average phred-scaled quality score for the DNA tag portion
    of a read for it to be considered for matching. This means that if the
    average phred33/64 score for a read at the expected locations of the
    DNA tag is below this value, the read will be discarded.
sindex : Str, optional
    Used to specify the header for the index 1 and optional index 2 column
    in the samplelist. This is an alternative to using the '--fif' option.
translate_aggregates : Bool, optional
    Include this flag to use translation-based aggregation. In this mode,
    counts for nt sequences will be combined if they translate into the
    same aa sequence. Note: When this mode is used, the name of the
    aggregate sequence will be the sequence that was a result of the
    translation. Therefore, this mode is most appropriate for use with
    reference-independent demultiplexing.
concatemer : Bool, optional
    Concatenated adapter/primer sequences (optional). The presence of this
    sequence within a read indicates that the expected DNA tag is not
    present. If supplied, the number of times this concatemer is recorded
    in the input file is reported.
sname : Str, optional
    Used to specify the header for the sample name column in the
    samplelist. By default 'SampleName' is set as the column header name.
index1 : Str, optional
    Positional information for index1 (i.e., barcode 1). This argument must
    be passed as 3 comma-separated values. The first item represents the
    (0-based) expected start position of the first index; the second
    represents the length of the first index; and the third represents the
    number of mismatches that are tolerated for this index. An example is
    '--index1 12,12,1'. This says that the index starts at (0-based)
    position 12, the index is 12 nucleotides long, and if a perfect match
    is not found, then up to one mismatch will be tolerated.
index2 : Str, optional
    Positional information for index2, optional. This argument must be
    passed in the same format specified for '--index1'. If '--input_r2' is
    provided, this positional information is assumed to refer to the reads
    contained in this second, index-only fastq file. If '--input_r2' is NOT
    provided, this positional information is assumed to refer to the reads
    contained in the '--input_r1' fastq file.
outfile : Str, optional
    The outfile that will produce a list of inputs to PepSIRF.
pepsirf_binary : Str, optional
    The binary to call pepsirf on your system.
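
The five-column layout described for the fif parameter can be checked with a
short sketch like the one below. It assumes the file has no header row, which
the description above does not state either way. The function name parse_fif
and the index names 'Index1' and 'Index2' are hypothetical stand-ins for
headers in a sample sheet.

```python
import csv
import io


def parse_fif(text):
    """Parse flexible-index rows: name, read, start, length, mismatches."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {fields}")
        name, read, start, length, mismatches = fields
        read = read.lower()  # the read name is not case-sensitive
        if read not in ("r1", "r2"):
            raise ValueError(f"read name must be 'r1' or 'r2': {fields[1]!r}")
        rows.append((name, read, int(start), int(length), int(mismatches)))
    return rows


# Hypothetical content: one index in the r1 reads, one in the r2 reads.
example = "Index1\tr1\t12\t12\t1\nIndex2\tR2\t0\t8\t0\n"
for row in parse_fif(example):
    print(row)
```

Each parsed row carries the same start, length, and mismatch information that
'--index1' and '--index2' would otherwise supply as comma-separated values.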

Returns
-------
raw_counts_output : FeatureTable[RawCounts]
diagnostic_output : DemuxDiagnostic
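
The effect of translate_aggregates on the count matrix rows can be sketched as
follows: counts for nucleotide sequences are summed when they translate to the
same amino acid sequence, and the translated sequence becomes the row name.
This is an illustration only. It assumes the standard genetic code, reading
frame 0, and a plain dictionary of counts for a single sample; how PepSIRF
itself handles frames or ambiguous bases is not described on this page.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def aggregate_by_translation(nt_counts):
    """Sum counts of nucleotide sequences that share a translation."""
    aa_counts = Counter()
    for nt, count in nt_counts.items():
        aa_counts[translate(nt)] += count
    return dict(aa_counts)


# Made-up counts for one sample: the first two sequences both encode 'AK'.
counts = {"GCTAAA": 5, "GCCAAG": 3, "GATAAA": 2}
print(aggregate_by_translation(counts))
```

Because the aggregated rows are named by their translation rather than by a
library entry, this mode pairs naturally with reference-independent
demultiplexing, as the parameter description notes.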