Docstring:
Usage: qiime pepsirf demux [OPTIONS]
Takes the following parameters and outputs counts for each reference
sequence (i.e., probe/peptide) in each sample using PepSIRF's demux module.
(PepSIRF's develop branch MUST be compiled beforehand to run this module.)
Inputs:
--i-input-r1 ARTIFACT DemuxFastq
Fastq-formatted file containing reads with DNA tags.
If PepSIRF was NOT compiled with Zlib support, this
file must be uncompressed. If PepSIRF was compiled
with Zlib support, then this file can be uncompressed
or compressed using gzip. In this case, the file
format will be automatically determined. [required]
--i-index ARTIFACT DemuxIndex
Name of fasta-formatted file containing forward and
(potentially) reverse index sequences. Sequence names
must match exactly with those supplied in the
'samplelist'. [required]
--i-samplelist ARTIFACT DemuxSampleList
A tab-delimited list of samples with a header row and
one sample per line. This file must contain at least
one index column and one sample name column. Multiple
index columns may be included. This file can also
include additional columns that will not be used for
the demultiplexing. Specify which columns to use with
the '--sname', '--sindex1', and '--sindex2' flags. If
'--fif' is used, then only '--sname' will be used.
[required]
--i-input-r2 ARTIFACT DemuxFastq
Optional index-only fastq file. If PepSIRF was NOT
compiled with Zlib support, this file must be
uncompressed. If PepSIRF was compiled with Zlib
support, then this file can be uncompressed or
compressed using gzip. In this case, the file format
will be automatically determined. Note that if this
argument is not supplied, only 'index1' will be used
to identify samples. [optional]
--i-fif ARTIFACT DemuxFif
The flexible index file can be provided as an
alternative to the '--index1' and '--index2' options.
The file must be tab-delimited with 5 ordered columns:
1) index name, which should correspond to a header
name in the sample sheet, 2) read name, which should
be either 'r1' or 'r2' (not case-sensitive) to specify
whether the index is in '--input-r1' or '--input-r2',
3) index start location (0-based, inclusive), 4) index
length, and 5) number of mismatches to allow.
'--index1', '--index2', '--sname', '--sindex1', and
'--sindex2' will be ignored if this option is
provided. [optional]
--i-library ARTIFACT DemuxLibrary
Fasta-formatted file containing reference DNA tags.
If this flag is not included, reference-independent
demultiplexing will be performed. In
reference-independent mode, each sequence in the
region specified by '--seq' will be considered its
own reference, and the observed sequences will be
used as the row names in the output count matrix.
[optional]
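To make the five-column layout described for '--i-fif' concrete, here is a minimal Python sketch that parses flexible index file content. The names FifEntry and parse_fif are hypothetical and are not part of PepSIRF or the QIIME 2 plugin. The sketch also assumes the file has no header row, which the help text does not state either way.

```python
from typing import List, NamedTuple


class FifEntry(NamedTuple):
    """One row of a flexible index file (hypothetical helper type)."""
    index_name: str   # should match a header name in the sample sheet
    read_name: str    # 'r1' or 'r2', normalized to lowercase
    start: int        # 0-based, inclusive
    length: int       # index length in nucleotides
    mismatches: int   # number of mismatches to allow


def parse_fif(text: str) -> List[FifEntry]:
    """Parse tab-delimited flexible index file content (assumes no header row)."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}"
            )
        name, read, start, length, mismatches = (f.strip() for f in fields)
        read = read.lower()  # the read name is not case-sensitive
        if read not in ("r1", "r2"):
            raise ValueError(f"line {line_no}: read name must be 'r1' or 'r2'")
        entries.append(
            FifEntry(name, read, int(start), int(length), int(mismatches))
        )
    return entries


# Illustrative content only; the index names are made up.
example = "Index1\tr1\t12\t12\t1\nIndex2\tR2\t0\t8\t0\n"
entries = parse_fif(example)
for entry in entries:
    print(entry)
```

Validating the column count and read name up front gives a clearer error than a failure deep inside the demultiplexing run.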
Parameters:
--p-seq TEXT Positional information for the DNA tags. This
argument must be passed in the same format specified
for 'index1'. [required]
--p-read-per-loop INTEGER
The number of fastq records read at a time. A higher
value will result in more memory usage by the
program, but will also result in fewer disk
accesses, increasing performance of the program.
[default: 100000]
--p-num-threads INTEGER Number of threads to use for analyses. [default: 2]
--p-phred-base INTEGER Phred base to use when parsing fastq quality
scores. Valid options are 33 and 64.
[default: 33]
--p-phred-min-score INTEGER
The minimum average phred-scaled quality score for
the DNA tag portion of a read for it to be
considered for matching. If the average phred33/64
score for a read at the expected locations of the
DNA tag is below this value, the read will be
discarded. [default: 0]
--p-sindex TEXT Used to specify the headers for the index 1 and
optional index 2 columns in the samplelist. This is
an alternative to using the '--fif' option.
[optional]
--p-translate-aggregates / --p-no-translate-aggregates
Include this flag to use translation-based
aggregation. In this mode, counts for nt sequences
will be combined if they translate into the same aa
sequence. Note: When this mode is used, the name of
the aggregate sequence will be the sequence that was
a result of the translation. Therefore, this mode is
most appropriate for use with reference-independent
demultiplexing. [default: False]
--p-concatemer / --p-no-concatemer
Concatenated adapter/primer sequences (optional).
The presence of this sequence within a read
indicates that the expected DNA tag is not present.
If supplied, the number of times this concatemer is
recorded in the input file is reported.
[default: False]
--p-sname TEXT Used to specify the header for the sample name
column in the samplelist. By default 'SampleName' is
set as the column header name.
[default: 'SampleName']
--p-index1 TEXT Positional information for index1 (i.e., barcode 1).
This argument must be passed as 3 comma-separated
values. The first item represents the (0-based)
expected start position of the first index; the
second represents the length of the first index; and
the third represents the number of mismatches that
are tolerated for this index. An example is
'--index1 12,12,1'. This says that the index starts
at (0-based) position 12, the index is 12
nucleotides long, and if a perfect match is not
found, then up to one mismatch will be tolerated.
[optional]
--p-index2 TEXT Positional information for index2, optional. This
argument must be passed in the same format specified
for '--index1'. If '--input-r2' is provided, this
positional information is assumed to refer to the
reads contained in this second, index-only fastq
file. If '--input-r2' is NOT provided, this
positional information is assumed to refer to the
reads contained in the '--input-r1' fastq file.
[default: '0,0,0']
--p-outfile TEXT The outfile that will produce a list of inputs to
PepSIRF. [default: './deconv.out']
--p-pepsirf-binary TEXT The binary to call pepsirf on your system.
[default: 'pepsirf']
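The 'start,length,mismatches' format shared by '--p-seq', '--p-index1', and '--p-index2' can be illustrated with a short Python sketch. The helper names (parse_positional, hamming, match_index) are hypothetical. The tie-breaking rule, which rejects a read when more than one candidate falls within the mismatch budget, is an assumption made for this example and may not reflect how PepSIRF resolves ambiguous matches.

```python
from typing import Optional, Sequence, Tuple


def parse_positional(spec: str) -> Tuple[int, int, int]:
    """Split a 'start,length,mismatches' string such as '12,12,1'."""
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values, got {spec!r}")
    start, length, mismatches = (int(p) for p in parts)
    if start < 0 or length < 0 or mismatches < 0:
        raise ValueError("values must be non-negative")
    return start, length, mismatches


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_index(read: str, spec: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate index found at the given position, or None."""
    start, length, max_mismatches = parse_positional(spec)
    window = read[start:start + length]  # start is 0-based
    if len(window) < length:
        return None  # read too short to contain the index
    if window in candidates:
        return window  # a perfect match is tried first
    close = [
        c for c in candidates
        if len(c) == length and hamming(window, c) <= max_mismatches
    ]
    # Assumption for this sketch: an ambiguous match is rejected.
    return close[0] if len(close) == 1 else None


# Made-up index sequences and reads for illustration.
known = ["ACGTACGTACGT", "TTTTTTTTTTTT"]
exact_read = "A" * 12 + "ACGTACGTACGT" + "TTTT"
fuzzy_read = "A" * 12 + "ACGTACGTACGA" + "TTTT"  # one mismatch at the end
print(match_index(exact_read, "12,12,1", known))
print(match_index(fuzzy_read, "12,12,1", known))
print(match_index(fuzzy_read, "12,12,0", known))
```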
Outputs:
--o-raw-counts-output ARTIFACT FeatureTable[RawCounts]
[required]
--o-diagnostic-output ARTIFACT
DemuxDiagnostic [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
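The options above can be assembled into a command line programmatically. The following Python sketch builds an argument list for 'qiime pepsirf demux' from a dictionary. The function name build_demux_command and all file names and values are hypothetical placeholders; only the flag names are taken from the help text. Boolean handling assumes the paired '--p-NAME / --p-no-NAME' form shown for the parameters.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, bool]


def build_demux_command(options: Dict[str, Value]) -> List[str]:
    """Build an argv list for 'qiime pepsirf demux' (illustrative helper)."""
    argv = ["qiime", "pepsirf", "demux"]
    for flag, value in options.items():
        if isinstance(value, bool):
            # Boolean parameters are paired flags, for example
            # --p-concatemer / --p-no-concatemer.
            if not flag.startswith("--p-"):
                raise ValueError(f"unexpected boolean flag: {flag}")
            name = flag[len("--p-"):]
            argv.append(flag if value else f"--p-no-{name}")
        else:
            argv.extend([flag, str(value)])
    return argv


# All artifact names and positional values below are placeholders.
argv = build_demux_command({
    "--i-input-r1": "reads_r1.qza",
    "--i-index": "index.qza",
    "--i-samplelist": "samplelist.qza",
    "--p-seq": "41,40,2",
    "--p-index1": "12,12,1",
    "--p-num-threads": 4,
    "--p-translate-aggregates": False,
    "--o-raw-counts-output": "raw_counts.qza",
    "--o-diagnostic-output": "diagnostic.qza",
})
print(shlex.join(argv))
```

Passing the list to subprocess.run avoids shell quoting problems; shlex.join is used here only to display the command.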
Import:
from qiime2.plugins.pepsirf.methods import demux
Docstring:
pepsirf demux module
Takes the following parameters and outputs counts for each reference
sequence (i.e., probe/peptide) in each sample using PepSIRF's demux module.
(PepSIRF's develop branch MUST be compiled beforehand to run this module.)
Parameters
----------
input_r1 : DemuxFastq
Fastq-formatted file containing reads with DNA tags. If PepSIRF was
NOT compiled with Zlib support, this file must be uncompressed. If
PepSIRF was compiled with Zlib support, then this file can be
uncompressed or compressed using gzip. In this case, the file format
will be automatically determined.
index : DemuxIndex
Name of fasta-formatted file containing forward and (potentially)
reverse index sequences. Sequence names must match exactly with those
supplied in the 'samplelist'.
samplelist : DemuxSampleList
A tab-delimited list of samples with a header row and one sample per
line. This file must contain at least one index column and one sample
name column. Multiple index columns may be included. This file can also
include additional columns that will not be used for the
demultiplexing. Specify which columns to use with the '--sname',
'--sindex1', and '--sindex2' flags. If '--fif' is used, then only
'--sname' will be used.
seq : Str
Positional information for the DNA tags. This argument must be passed
in the same format specified for 'index1'.
input_r2 : DemuxFastq, optional
Optional index-only fastq file. If PepSIRF was NOT compiled with Zlib
support, this file must be uncompressed. If PepSIRF was compiled with
Zlib support, then this file can be uncompressed or compressed using
gzip. In this case, the file format will be automatically determined.
Note that if this argument is not supplied, only 'index1' will be used
to identify samples.
fif : DemuxFif, optional
The flexible index file can be provided as an alternative to the
'--index1' and '--index2' options. The file must be tab-delimited with
5 ordered columns: 1) index name, which should correspond to a header
name in the sample sheet, 2) read name, which should be either 'r1' or
'r2' (not case-sensitive) to specify whether the index is in
'--input_r1' or '--input_r2', 3) index start location (0-based,
inclusive), 4) index length, and 5) number of mismatches to allow.
'--index1', '--index2', '--sname', '--sindex1', and '--sindex2' will be
ignored if this option is provided.
library : DemuxLibrary, optional
Fasta-formatted file containing reference DNA tags. If this flag is not
included, reference-independent demultiplexing will be performed. In
reference-independent mode, each sequence in the region specified by
'--seq' will be considered its own reference, and the observed sequences
will be used as the row names in the output count matrix.
read_per_loop : Int, optional
The number of fastq records read at a time. A higher value will result in
more memory usage by the program, but will also result in fewer disk
accesses, increasing performance of the program.
num_threads : Int, optional
Number of threads to use for analyses.
phred_base : Int, optional
Phred base to use when parsing fastq quality scores. Valid options
are 33 and 64.
phred_min_score : Int, optional
The minimum average phred-scaled quality score for the DNA tag portion
of a read for it to be considered for matching. If the average
phred33/64 score for a read at the expected locations of the DNA tag
is below this value, the read will be discarded.
sindex : Str, optional
Used to specify the headers for the index 1 and optional index 2 columns
in the samplelist. This is an alternative to using the '--fif' option.
translate_aggregates : Bool, optional
Include this flag to use translation-based aggregation. In this mode,
counts for nt sequences will be combined if they translate into the
same aa sequence. Note: When this mode is used, the name of the
aggregate sequence will be the sequence that was a result of the
translation. Therefore, this mode is most appropriate for use with
reference-independent demultiplexing.
concatemer : Bool, optional
Concatenated adapter/primer sequences (optional). The presence of this
sequence within a read indicates that the expected DNA tag is not
present. If supplied, the number of times this concatemer is recorded
in the input file is reported.
sname : Str, optional
Used to specify the header for the sample name column in the
samplelist. By default 'SampleName' is set as the column header name.
index1 : Str, optional
Positional information for index1 (i.e., barcode 1). This argument must
be passed as 3 comma-separated values. The first item represents the
(0-based) expected start position of the first index; the second
represents the length of the first index; and the third represents the
number of mismatches that are tolerated for this index. An example is
'--index1 12,12,1'. This says that the index starts at (0-based)
position 12, the index is 12 nucleotides long, and if a perfect match
is not found, then up to one mismatch will be tolerated.
index2 : Str, optional
Positional information for index2, optional. This argument must be
passed in the same format specified for '--index1'. If '--input_r2' is
provided, this positional information is assumed to refer to the reads
contained in this second, index-only fastq file. If '--input_r2' is NOT
provided, this positional information is assumed to refer to the reads
contained in the '--input_r1' fastq file.
outfile : Str, optional
The outfile that will produce a list of inputs to PepSIRF.
pepsirf_binary : Str, optional
The binary to call pepsirf on your system.
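The interaction between phred_base and phred_min_score can be sketched in a few lines of Python. The function names mean_phred and passes_quality are hypothetical, and the sketch assumes the quality string comes straight from a fastq record and that the DNA tag position is given as a 0-based start and a length, as for 'seq'. It illustrates the averaging rule described above and is not PepSIRF's implementation.

```python
def mean_phred(quality: str, start: int, length: int, phred_base: int = 33) -> float:
    """Average phred score over quality[start:start + length]."""
    window = quality[start:start + length]
    if not window:
        raise ValueError("the requested region lies outside the read")
    # Each quality character encodes (score + phred_base) as its ASCII code.
    return sum(ord(ch) - phred_base for ch in window) / len(window)


def passes_quality(
    quality: str,
    start: int,
    length: int,
    phred_min_score: int = 0,
    phred_base: int = 33,
) -> bool:
    """True when the DNA tag region meets the minimum average score."""
    return mean_phred(quality, start, length, phred_base) >= phred_min_score


# 'I' encodes a score of 40 and '#' a score of 2 under phred33.
quality = "IIII####IIII"
print(mean_phred(quality, 0, 4))
print(mean_phred(quality, 4, 4))
print(passes_quality(quality, 4, 4, phred_min_score=20))
```

With the default phred_min_score of 0, every read passes this check, so the filter only takes effect when a positive threshold is supplied.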
Returns
-------
raw_counts_output : FeatureTable[RawCounts]
diagnostic_output : DemuxDiagnostic
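When translate_aggregates is enabled, the row names of the count output become translated sequences rather than nucleotide sequences. The Python sketch below illustrates the idea of combining counts for nt sequences that translate into the same aa sequence. It assumes the standard genetic code, a reading frame starting at the first base, and 'X' for codons it cannot translate; the names translate and aggregate_by_translation are hypothetical, and PepSIRF's own handling of frames, stop codons, and ambiguous bases may differ.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt: str) -> str:
    """Translate a nucleotide sequence in frame 0; trailing bases are dropped."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X")  # 'X' marks an unrecognized codon
        for i in range(0, usable, 3)
    )


def aggregate_by_translation(nt_counts: Dict[str, int]) -> Dict[str, int]:
    """Sum counts of nucleotide sequences that share a translation."""
    aa_counts: Counter = Counter()
    for nt, count in nt_counts.items():
        aa_counts[translate(nt)] += count
    return dict(aa_counts)


# Made-up counts for one sample: the first two sequences are synonymous.
nt_counts = {"GCTGAA": 3, "GCCGAG": 2, "TGGAAA": 5}
aa_counts = aggregate_by_translation(nt_counts)
print(aa_counts)
```

Because the aggregated row names are the translated sequences, they no longer match the names in a supplied library, which is consistent with the note that this mode is most appropriate for reference-independent demultiplexing.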