Library-Design

Peptide design and oligonucleotide encoding

Set Cover

Overview

The Set Cover scripts run a set cover algorithm to design peptides from a fasta file of protein sequences, or from a directory of such files.

Two versions of this script are available: a Python version and a C version. The C version is recommended for large datasets.
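To make the idea of a set cover design concrete, here is a minimal sketch of the standard greedy set cover heuristic applied to peptides. It is not the code in setCover.py. It assumes that `-x` is the length of the k-mers to be covered and `-y` is the peptide length (the latter matches the 30 amino acid peptides in the example under Use); the function names `kmers` and `greedy_set_cover` are hypothetical, and the sketch expects ungapped sequences.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_set_cover(sequences, k=9, pep_len=30):
    """Greedily pick peptides until every k-mer in the targets is covered.

    Assumption: k plays the role of -x and pep_len the role of -y.
    """
    # Universe to cover: every k-mer found in any target sequence.
    uncovered = set()
    for seq in sequences:
        uncovered |= kmers(seq, k)

    # Candidate peptides: every pep_len window of every target
    # (the whole sequence when it is shorter than pep_len).
    candidates = {}
    for seq in sequences:
        for i in range(max(1, len(seq) - pep_len + 1)):
            window = seq[i:i + pep_len]
            candidates[window] = kmers(window, k)

    chosen = []
    while uncovered:
        # Take the candidate covering the most still-uncovered k-mers.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # nothing left that any candidate can cover
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For example, `greedy_set_cover(["ABCDEFG"], k=3, pep_len=5)` selects two overlapping windows that together contain all five 3-mers of the input. The greedy choice does not guarantee the smallest possible peptide set, only a reasonably small one.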

Inputs

The only required input is a fasta-formatted file containing a set of target protein sequences.

Outputs

There is one required output, a fasta-formatted file containing the peptide sequences.

There is one optional output, a tab-delimited summary file, which shows the number of peptides designed for each input file (one line per input file).
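As a quick sanity check on the outputs, the number of peptides in each fasta file can be counted independently and compared against the summary file. The sketch below only counts header lines; it makes no assumption about the column layout of the summary file, which is not described here. The helper names are hypothetical.

```python
import io


def count_fasta_records(handle):
    """Count records in a fasta stream by counting '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))


def count_records_in_files(paths):
    """Map each fasta file path to the number of records it contains."""
    counts = {}
    for path in paths:
        with open(path) as handle:
            counts[path] = count_fasta_records(handle)
    return counts


# Small in-memory example with two records.
example = io.StringIO(">pep_1\nMKTAYIAK\n>pep_2\nQRQISFVK\n")
n_records = count_fasta_records(example)
```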

Installation

Use

In this example, the input directory contains ten files of aligned clusters created from a downsampled set of Poxviridae proteins (see the full tutorial for how these files were created).

The output should consist of ten fasta files of 30-amino-acid peptides covering the input sequences, along with a summary file showing the number of peptides designed for each cluster file. The ten peptide-containing fasta files can then be concatenated into a single fasta file.

Command (Python version):

setCover.py \
-u poxviridae_id_70_SC_x9_y30_sumStats.tsv \
-x 9 \
-y 30 \
clusters/POX* 
- This real-world example took <2 min to complete on a MacBook Pro laptop (Apple M1, macOS v11.6.4).

Command (C version, using 2 threads):

Coming Soon!
- This real-world example took ~# min to complete on a MacBook Pro laptop (Apple M1, macOS v11.6.4).

The resulting peptide-containing fasta files then need to be concatenated into a single file containing all of the peptides.

Example Command (Linux): cat *.fasta > poxviridae_id70_all_SC-x9y30.fasta
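On systems without `cat`, the same concatenation can be done in Python. This is a minimal sketch using only the standard library; the function name `concatenate_fasta` is hypothetical and not part of this project. It skips the output file if it happens to match the input pattern, so the command can be re-run in the same directory.

```python
import glob
import os
import shutil


def concatenate_fasta(pattern, out_path):
    """Concatenate all files matching pattern into out_path, in sorted order."""
    out_abs = os.path.abspath(out_path)
    # Resolve the inputs first and leave out the output file itself.
    inputs = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs
    )
    with open(out_path, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return inputs
```

For the example above, this would be called as `concatenate_fasta("*.fasta", "poxviridae_id70_all_SC-x9y30.fasta")`. Like `cat`, it copies bytes unchanged, so each input file should end with a newline.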