Changelog
Unreleased
1.7.0 | 2024-10-03
Docker: added new feature (Issue #254). Added the ability to run PepSIRF as a Docker image and added a page for instructions.
CMakelists: bug fix (Issue #197). Resolved CMake not locating OpenMP on MacOS. Tutorial for fix added to installation page.
Subjoin: added new feature (Issue #236). Added functionality to the “-i” option in Subjoin to accept a regex pattern instead of the name of a file containing sample/peptide names. The sample/peptide names from the score matrix file will be filtered by whether they contain the regex pattern.
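As an illustration of the regex filtering described for the Subjoin “-i” option, the Python sketch below keeps only the names that contain a match for a pattern. It is not PepSIRF's implementation: the function name and the sample names are hypothetical, and the regex dialect PepSIRF uses may differ from Python's.

```python
import re


def filter_names(names, pattern):
    """Keep the names that contain a match for the regex pattern."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the name, which gives "contains" semantics.
    return [name for name in names if regex.search(name)]


# Hypothetical sample names, for illustration only.
samples = ["Sblk_1", "Sblk_2", "PatientA_rep1", "PatientA_rep2"]
print(filter_names(samples, r"^Sblk_"))  # ['Sblk_1', 'Sblk_2']
```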
Demux: added new feature (Issue #234). Added “–unmapped-reads-output” option to Demux, which writes all reads that have not been mapped to a sample/peptide to the specified filename.
Deconv: added new feature (Issue #233). Changed Deconv “-t” option to accept a tab-delimited file with a column for the TaxIDs and a column for the score threshold to use for each TaxID. The original functionality still holds: if a number is included with the option, each TaxID will use that score threshold.
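The two ways of supplying the Deconv “-t” value can be sketched in Python as follows. This is only an illustration under stated assumptions, not Deconv's code: it assumes a two-column file without a header row (TaxID, then threshold), and the function name is hypothetical.

```python
import csv


def load_thresholds(option_value, tax_ids):
    """Return a {tax_id: threshold} mapping from a number or a file path."""
    try:
        uniform = float(option_value)
    except ValueError:
        # Not a number: treat the value as the path of a tab-delimited file.
        # Assumed layout: TaxID <tab> threshold, one row per TaxID, no header.
        with open(option_value, newline="") as handle:
            rows = csv.reader(handle, delimiter="\t")
            return {row[0]: float(row[1]) for row in rows if row}
    # A single number applies the same threshold to every TaxID.
    return {tax_id: uniform for tax_id in tax_ids}
```

For example, `load_thresholds("4", ["10", "20"])` gives both TaxIDs the threshold 4.0, while passing a file path reads one threshold per TaxID.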
Demux: added new feature (Issue #227). Demux outputs additional information about the total number of samples, the number of samples containing a given number of replicates, and the number of samples starting with “Sblk_”. The replicate information will be written to the file provided with the option “–replicate_info”.
Subjoin: added new feature (Issue #223). Added “–exclude” option to subjoin that changes the output data file to contain all of the input samples/peptides except the ones specified by the user.
Demux: added new feature (Issue #221). Demux automatically truncates sequences in the library which are longer than the length provided through the “–seq” option. If a sequence is found to be shorter than the specified length, an error is thrown.
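The truncation rule for library sequences can be sketched in Python as shown below. The function name is hypothetical, and the sketch assumes that truncation keeps the leading characters of the sequence, which this entry does not specify.

```python
def fit_library_sequence(name, sequence, seq_length):
    """Truncate a library sequence to seq_length, or fail if it is too short."""
    if len(sequence) < seq_length:
        raise ValueError(
            f"Sequence '{name}' has length {len(sequence)}, "
            f"shorter than the required {seq_length}."
        )
    # Assumption: extra characters are dropped from the end of the sequence.
    return sequence[:seq_length]


print(fit_library_sequence("pep_1", "ACGTAC", 4))  # ACGT
```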
Deconv: added new feature (Issue #218). Added “–custom_id_name_map_info” option to Deconv which accepts a filename, the key column header, and the value column header in the file to use to link TaxIDs to taxon names. This option should be used instead of “–id_name_map” if the user wishes to define a tab-delimited ID name map.
Link: added new feature (Issue #210). Fixes crash in Link when a species does not have an associated ID. A single warning is logged which informs the user some species have not been considered and where to find a list of those species which should be reviewed.
Test: added new feature (Issue #152). Automated tests have been added to cover all recently added features and fixed issues in PepSIRF.
Enrich: added new feature (Issue #131). Provides more information in Enrich’s failed enrichment output. Sample replicates which do not meet either threshold are identified in the output and are marked as either not meeting the minimum or maximum threshold.
Demux: added new feature (Issue #56). Alters behavior of Demux when run in reference independent mode. In ref-independent mode, index toggling is turned off; therefore, if an exact match at the given index is not found, the read is discarded.
Logger: added new feature (Issue #2). Adds a system to handle logging PepSIRF’s progress when running. A default file name is automatically generated with the module name, current time and date. An option ‘–logfile’ allows the user to provide a custom name for the log file.
Deconv: added new feature (Issue #36). Standardizes the order tied species are listed in Deconv output. If species names are provided, then the tied species are sorted alphabetically by their names; otherwise, they are sorted by their species ID.
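The ordering rule for tied species can be sketched in Python as shown below. The function name and the IDs and names in the example are hypothetical. The sketch also assumes that IDs are integers compared numerically and that every tied ID has a name when a name map is given; neither point is stated in this entry.

```python
def order_tied_species(species_ids, id_to_name=None):
    """Order tied species by name when names are available, else by ID."""
    if id_to_name:
        # Names provided: sort alphabetically by species name.
        return sorted(species_ids, key=lambda species_id: id_to_name[species_id])
    # No names: fall back to sorting by the species ID itself.
    return sorted(species_ids)


# Hypothetical IDs and names, for illustration only.
names = {1: "Gamma virus", 2: "Alpha virus", 3: "Beta virus"}
print(order_tied_species([3, 1, 2], names))  # [2, 3, 1]
print(order_tied_species([3, 1, 2]))         # [1, 2, 3]
```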
1.6.0 | 2023-04-04
Version 1.6.0 adds several new features.
New Features:
Demux: added new feature (Issue #169). Added an option for FASTQ-level outputs to be generated by demux. This is done with the flag “-q” followed by a directory path where files will be generated.
Enrich: added new feature (Issue #178). In the case of a sample not having enriched peptides, enrich will now add a space to the empty file. This allows for better compatibility with deconv through Qiime2.
Enrich: added new feature (Issue #137). Added an option for enrich to drop replicates with low raw read counts. This is done with the flag “-l” or “–low_raw_reads”. If this functionality is invoked, dropped replicates will not be considered in the enrichment process, and the dropped replicates will be reported in the enrichment failure reasons file under “Removed Replicates”: each line will contain the replicates removed from a sample.
Enrich: added new feature (Issue #131). Enrich now reports which replicates caused a raw read count threshold failure and identifies if a replicate failed the maximum or minimum threshold.
Deconv: added new feature (Issue #161). Added a flag to deconv that allows the user to specify what string is expected at the end of each file containing enriched peptides (set to “_enriched.txt” by default). If a file does not end in the string that was specified, deconv skips over that file.
Info: added new feature (Issue #149). Added feature to info that generates a matrix of average counts given replicates. Two new flags must be included in order to use this feature: –rep_names and –get_avgs. –rep_names requires an input file with the names of the replicates that the user wants to generate a matrix of average counts for. –get_avgs requires an output file name where the matrix will be stored.
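The suffix check described for deconv amounts to a simple filename filter, sketched here in Python. The function name and the file names are hypothetical; only the default suffix “_enriched.txt” comes from this entry.

```python
def select_enriched_files(filenames, suffix="_enriched.txt"):
    """Keep only the files whose names end in the expected suffix."""
    return [name for name in filenames if name.endswith(suffix)]


# Hypothetical directory listing, for illustration only.
listing = ["A_enriched.txt", "B_enriched.txt", "notes.txt"]
print(select_enriched_files(listing))  # ['A_enriched.txt', 'B_enriched.txt']
```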
1.5.1 | 2022-09-10
Version 1.5.1 fixes a bug and adds a feature.
New Features:
Enrich: added new feature (Issue #154). Altered behavior of enrich to produce blank sample file output for samples that failed enrichment.
Bug Fixes:
Demux: bug fix (Issue #168). Fixed bug introduced in release 1.5, where amino acid level output was overwritten with peptide level output. This no longer occurs.
1.5.0 | 2022-06-02
Version 1.5.0 adds multiple features and removes OMP support for Clang compilation.
New Features:
Demux: added new feature (Issue #35). If samplenames or index name sets have duplicates in samplelist file, then those duplicates will be output to the terminal.
Demux: added new feature (Issue #57). Demux now has an additional option for providing a tab-delimited file with 5 ordered columns: 1) index name, which should correspond to a header name in the sample sheet, 2) read name, which should be either “r1” or “r2” to specify whether the index is in “–input_r1” or “–input_r2”, 3) index start location (0-based, inclusive), 4) index length and 5) number of mismatches to allow. Note: the last three columns correspond to the info currently provided on the command line with “–f_index” and “–r_index” (or “–index1” and “–index2”, with recent changes). With this feature, the demux module can now analyze an arbitrary number of indexes to be found in r1 or r2 input sequences.
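To make the five-column layout concrete, the Python sketch below parses such rows and uses the start and length columns to slice an index out of a read. The class and function names and the example rows are hypothetical, and the sketch assumes the file has no header row, which this entry does not state.

```python
import csv
from dataclasses import dataclass


@dataclass
class IndexSpec:
    name: str        # header name in the sample sheet
    read: str        # "r1" or "r2"
    start: int       # 0-based, inclusive
    length: int
    mismatches: int  # number of mismatches to allow


def parse_index_specs(lines):
    """Parse tab-delimited rows with the five ordered columns."""
    specs = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, read, start, length, mismatches = row
        if read not in ("r1", "r2"):
            raise ValueError(f"Read name must be 'r1' or 'r2', got '{read}'.")
        specs.append(IndexSpec(name, read, int(start), int(length), int(mismatches)))
    return specs


def extract_index(spec, sequence):
    """Slice the index region out of a read sequence."""
    return sequence[spec.start : spec.start + spec.length]


# Hypothetical rows, for illustration only.
specs = parse_index_specs(["Index1\tr1\t12\t8\t1", "Index2\tr2\t0\t8\t0"])
print(extract_index(specs[1], "ACGTACGTTTTT"))  # ACGTACGT
```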
Demux: added new feature (Issue #57). Demux output diagnostics may now provide more index matches for flexibility with demux changes in #57.
Demux: added new feature (Issue #138). Demux now automatically removes reference duplicates when running in a reference dependent mode.
Zscore: added new feature (Issue #105). A check is added that verifies the bins provided to the Z score module. It is no longer possible to run the Z score module with the wrong set of bins.
CMakelists: recognized issue with clang (Issue #162). Removed threading support on MacOS.
Bug Fixes:
Demux: added bug fix (Issue #156). Solved memory race condition in demux created during development of this release.
Demux: added bug fix (Issue #163). Solved memory race condition in demux that created incorrect counts.
1.4.0 | 2021-07-09
Version 1.4.0 adds multiple features and one bug fix for s_enrich, p_enrich, and link. CMakelists has been updated and a new module ‘enrich’ has been introduced.
New Features:
Module added: enrich (Issue #114). The p_enrich module was altered to allow for flexibility in the number of replicates for each sample and renamed ‘enrich’. This new module can now provide the functionality of both s_enrich and p_enrich, and therefore, these two modules will no longer be available. Additionally, this module is able to handle >2 replicates.
Enrich: new optional output file (Issue #103). An optional flag (-f, –enrichment_failure_reason) is now available. If used, a .tsv file will be generated to document each sample for which an enriched peptide file was NOT generated, as well as the reason why.
CMakelists: Big Sur support (Issue #117). ‘-Xpreprocessor’ has been added to the command setting CMake C++ flags in order to support compilation on Mac OS Big Sur.
Bug Fixes:
Link: Issue #116. A vague and system-dependent error occurred when –protein_file sequence names were not found in the –meta file. Modifications have been made to properly handle this situation and provide a clear and consistent error message.
1.3.7 | 2021-06-28
Version 1.3.7 adds one feature and one bug fix to norm.
Bug Fixes:
Norm: Issue #125. When using a separate negative control matrix (–negative_control) for diff, ratio or diffratio normalizations, previous versions assumed that the order of the peptides (rows) was identical to the order in the primary data matrix (-p, –peptide_scores). This has been changed to properly account for any order in both rows and columns.
Norm: Issue #104. The norm module help message for option (–peptide_score, -p) has been updated.
1.3.6 | 2021-06-09
Version 1.3.6 adds several features and fixes several issues in demux, zscore, and subjoin.
New Features:
Demux: new warning (Issue #96). The module now includes a warning for the user when index names from the (–samplelist, -s) file are not included in the index fasta file (–index, -i).
Zscore: new error handling (Issue #97). The (–bins, -b) file is now verified to be of the correct type. If the file is not of the correct format, the user will now receive an easily understandable error message.
Bug Fixes:
Subjoin: Renaming bug (Issue #99). There was a bug related to the subjoin module renaming feature, which, under certain circumstances, resulted in the wrong samples being included in the output matrix. A fix has been made to prevent this feature from causing errors in subjoin.
Demux: Incorrect diagnostics calculations (Issue #100). The diagnostic features added in v1.3.2 miscalculated the number of index matches due to the module using every barcode contained in (-i, –index) for evaluation. Barcodes in the index file are now only used in the analysis if they are present in the (-s, –samplelist) file.
1.3.5 | 2021-03-02
Norm: Incorrect handling of input and output data (PR #93). This version changes how the negative control arguments are checked so that one of these flags (–negative_names) or (–negative_id) is only required when using the diff, ratio, or diff-ratio approaches.
Norm: Added missing transposing and updated approach using negative control matrix (PR #93). In v1.3.4, the optional negative control matrix (–negative_control) was not transposed before being accessed in calculations. By not including this, segfaults or incorrect averages were likely to occur.
P_enrich: Raw score threshold causing segfault (PR #94). In v1.3.4, the p_enrich module did not properly initialize the raw score threshold. The raw score thresholds no longer default to a container of 2 values both set to ‘0.0’. The raw score threshold will only contain the values provided through the (–raw_score_constraint) option for p_enrich and the (–threshhold_file, -t) option.
1.3.4 | 2020-12-28
New Features:
Demux: new sample list format (Issue #89). The demux module’s (–samplelist, -s) file must now include a header row and may optionally include more than the required 3 columns used for demultiplexing. In v1.3.4, the previous, header-less (–samplelist, -s) file format has been deprecated. The columns needed for demultiplexing are now identified by header names and therefore, the order of these columns is no longer important. Header names can be specified using 3 new arguments: (–sname), (–sindex1) and (–sindex2).
Norm: new control-based approaches (Issue #82). Three new normalization approaches were added (diff, ratio, and diff-ratio), all of which utilize column-sum normalized negative control values. To use these methods, the user must provide one of these two new arguments: (-s, –negative_id) or (-n, –negative_names). If both are provided, then the (-n, –negative_names) will be used. A new optional argument (–negative_control) was added to allow the user to specify an independent data matrix containing the negative controls.
P_enrich: increased threshold flexibility (Issue #84). Modified the way peptide-level thresholds are provided. Dedicated command line flags have been deprecated (–zscores, –zscore_constraint, –norm_scores, –norm_score_constraint). Data files and threshold values are now all specified using a single tab-delimited file (–threshold_file), which has the following format per row: a data matrix filename and either a single threshold or a comma-delimited pair of thresholds. This file should contain one row per data matrix, and any number of data matrices can be included.
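The row format of the threshold file can be illustrated with a short Python parser. This is a sketch rather than p_enrich's own parsing code: the function name and the file names in the example are hypothetical, and thresholds are assumed to be plain decimal numbers.

```python
def parse_threshold_rows(lines):
    """Parse rows of: data matrix filename <tab> threshold[,threshold]."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        matrix_file, thresholds = line.split("\t")
        # Either a single threshold or a comma-delimited pair.
        values = tuple(float(value) for value in thresholds.split(","))
        if len(values) not in (1, 2):
            raise ValueError(f"Expected 1 or 2 thresholds for '{matrix_file}'.")
        parsed.append((matrix_file, values))
    return parsed


# Hypothetical rows, for illustration only.
rows = parse_threshold_rows(["zscores.tsv\t10,5", "norm_scores.tsv\t2"])
print(rows)  # [('zscores.tsv', (10.0, 5.0)), ('norm_scores.tsv', (2.0,))]
```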
1.3.3 | 2020-10-16
New Features:
Subjoin: Adding second option for providing input data (PR #86). Two separate options have replaced the single filter scores option as input approaches. (–multi_file, -m) is for providing multiple input data matrices and name lists in a single tab-delimited file, while (–input, -i) is for providing a single data matrix and name list pair directly on the command line.
Bug Fixes:
Subjoin: Remove “Sequence Names” from input matrix datasets (Issue #83). Subjoin no longer reads the first column of the first row when validating the sample names in the header row.
1.3.2 | 2020-09-20
Version 1.3.2 added features to demux and zscore as well as fixed one bug with compilation of a package for PepSIRF.
New Features:
Demux: Additional diagnostic information (Issue #34). The demux module now features additional output to aid in output diagnostics. The percentage of index 1 matches, index 2 matches, and the variable region matches are output at the end of the run to the terminal.
Demux: Additional option for diagnostic output (Issue #34). The (–diagnostic_info) option accepts a filename to output diagnostic information to. The output contains sample names in the first column, and the following columns contain the index1, index2 and variable region matches.
Zscore: Additional feature for calculating z score (Issue #78). A highest density interval approach is now a feature of the z_score module.
Bug Fixes:
PepSIRF: Fixing compatibility issue between ZLIB and Mac OS (Issue #72). ZLIB is now functional on both Mac and Linux systems as of this update.
1.3.1 | 2020-07-09
Version 1.3.1 fixes an issue with s_enrich and disabled a nonfunctioning utility for Mac operating systems used in PepSIRF.
Bug Fixes:
S_enrich: Fixing data element mismatches during analysis (Issue #73). The sample names in the output of s_enrich were being switched around. The sample names are no longer mismatched with their data.
PepSIRF: Disabling ZLIB feature for Mac users (Issue #72). When attempting to compile PepSIRF with this ZLIB on Mac systems, a compilation error would occur. This feature will be temporarily disabled for Mac users until a fix is found.
1.3.0 | 2020-06-22
Version 1.3.0 adds features to deconv, demux, info, link, p_enrich, and subjoin. The help info was updated for multiple modules. It also includes bug fixes to bin, s_enrich, and the PepSIRF testing executable.
New Features:
Deconv: Standardized format of “–linked” file for input (Issue #64). The Deconv module now includes a required option (-l, –linked). This option takes the name of a file formatted like the output of the Link module.
Deconv: Simplified scoring method selection (Issue #61). The Deconv module now uses a single scoring strategy option. The strategies will no longer be separate options; now the new option (–scoring_strategy) accepts one of three string inputs: ‘summation’, ‘integer’, or ‘fraction’. By default, ‘summation’ is used.
Demux: Providing error messages (Issue #30). When parsing samples or fasta files in demux, there will now be a runtime error thrown when the file fails to be opened properly.
Info: Increasing significant digits (Issue #51). The output from the info module is now in fixed point notation with set precision at 2 decimal places.
Link: Switch source of taxonomic info to metadata file (Issue #50). The link module now uses a single metadata file to obtain taxonomic information for use in the module. The file contains one column for the protein sequence names and one column for the metadata to be used in generating the linkage map. Multiple columns of metadata can exist in the map, but only one ID column can be used.
P_enrich: Differentiate two “-s” flags (Issue #55). The single hyphen flag (-s) was previously shared by the outfile suffix option (–outfile_suffix) and the option (–samples). The two have been disambiguated: the flags are now (–samples, -s) and (–outfile_suffix, -x).
Subjoin: Allow subjoin without a namelist (Issue #26). The subjoin module no longer requires a name list of sample names to be included with the score matrix for the (–filter_scores) option.
PepSIRF: Update to help info (Issue #53). The help info for the bin, deconv, demux, info, link, norm, p_enrich, s_enrich, subjoin, and z_score modules provided by (-h) have all been refactored to improve readability and accuracy.
PepSIRF: “pepsirf_test” executable does not compile (Issue #47). Unit tests in the testing build have been updated to reflect the changes made in the various modules for this update.
Bug Fixes:
Bin: Last bin is smaller than specified “–bin_size” (Issue #58). When there is not a large number of peptides with zero count, the last bin can contain fewer peptides than the number specified with the (–bin_size) option. The bin size increases by one until the smallest bin reaches the minimum bin size.
S_enrich: Minimum z score not properly set (Issue #54). The s_enrich module in previous versions was sharing the same minimum threshold provided for zscore for both the norm and zscore thresholds. This has been fixed so the thresholds are correctly used based on what is provided by the min score options.