PepSeqPred

Residue-level epitope prediction with reproducible evidence.

PepSeqPred predicts epitope masks for protein sequences and supports full developer workflows for preprocessing, training, evaluation, and HPC execution.

Python 3.12+ · ESM2 embeddings · DDP-ready training · Seeded evaluation snapshots

Install

pip install pepseqpred

Quickstart APIs

Pretrained API

from pepseqpred import load_pretrained_predictor

predictor = load_pretrained_predictor(
    model_id="default",
    device="auto"
)
result = predictor.predict_sequence(
    "ACDEFGHIKLMNPQRSTVWY",
    header="example_protein"
)
print(result.binary_mask)

Artifact-path API

from pepseqpred import load_predictor

predictor = load_predictor(
    model_artifact="path/to/ensemble_manifest.json",
    device="auto"
)
result = predictor.predict_sequence(
    "ACDEFGHIKLMNPQRSTVWY"
)
print(result.binary_mask)

Why PepSeqPred

PepSeqPred is built for reproducible residue-level prediction under strong class imbalance. Training and validation are documented as protocol-first; numeric scorecards are based on seeded evaluation snapshots.

Training

  • Ensemble k-fold and seeded runs with deterministic split and training seeds.
  • ID-family-aware splitting to reduce leakage risk across related proteins.
  • DistributedDataParallel support for multi-GPU HPC workflows.
  • Run artifacts include checkpoints, manifests, and run-level CSV/JSON outputs.
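The ID-family-aware splitting above can be sketched in a few lines. This is a minimal illustration of the idea, assuming each sequence carries a family label; `family_aware_folds` is a hypothetical helper, not PepSeqPred's actual split code.

```python
def family_aware_folds(families, n_splits=3):
    """Assign whole ID families to folds so related proteins never
    straddle a train/validation boundary (illustrative sketch only)."""
    # Deterministic: families are sorted before assignment, so the same
    # input always yields the same split (mirrors the seeded-split goal).
    fold_of_family = {fam: i % n_splits
                      for i, fam in enumerate(sorted(set(families)))}
    return [fold_of_family[f] for f in families]

families = ["fam1", "fam1", "fam2", "fam2", "fam3", "fam3"]
folds = family_aware_folds(families)
# Every member of a family lands in the same fold, so no family
# appears on both sides of a train/validation split.
assert folds[0] == folds[1] and folds[2] == folds[3]
```

Grouping by family rather than by individual sequence is what reduces leakage: near-duplicate proteins from the same family cannot end up split between train and validation.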

Validation

  • Checkpoint selection records threshold, PR-AUC, F1, MCC, AUC, and AUC10.
  • Threshold policy maximizes recall subject to minimum precision constraints.
  • Validation metrics are captured per run with explicit seed provenance.
  • The project website currently shows protocol-only validation details to avoid mixing scorecards across model generations.
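The threshold policy can be sketched as a scan over candidate thresholds: keep only those meeting the precision floor, then take the one with the highest recall. This is an illustrative reimplementation of the stated policy, not the package's selection code, and the `min_precision` value is a placeholder.

```python
def pick_threshold(scores, labels, min_precision=0.5):
    """Maximize recall subject to precision >= min_precision
    (sketch of the stated policy; the constraint value is illustrative)."""
    best_t, best_recall = None, -1.0
    n_pos = sum(labels)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n_pos if n_pos else 0.0
        if precision >= min_precision and recall > best_recall:
            best_t, best_recall = t, recall
    return best_t, best_recall

# Toy per-residue scores and labels:
t, r = pick_threshold([0.9, 0.8, 0.6, 0.4, 0.3, 0.1],
                      [1,   1,   0,   1,   0,   0], min_precision=0.6)
```

Under strong class imbalance this policy favors catching rare positive residues while the precision floor keeps the predicted mask from flooding with false positives.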

Evaluation

  • Seeded external evaluation compares flagship models across sets 1-10.
  • Class prevalence is very low, so PR metrics are emphasized over accuracy.
  • Paired set statistics include bootstrap confidence intervals and sign tests.
  • Frozen benchmark snapshot is published with source-analysis files.
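The paired set statistics can be sketched with a percentile bootstrap over per-set deltas and a two-sided sign test. The deltas below are made-up numbers for illustration, and both helpers are sketches under that assumption, not the published analysis code.

```python
import random
from math import comb

def paired_bootstrap_ci(deltas, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired delta (sketch)."""
    rng = random.Random(seed)  # seeded, matching the project's reproducibility goals
    means = sorted(
        sum(rng.choice(deltas) for _ in deltas) / len(deltas)
        for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

def sign_test_p(deltas):
    """Two-sided sign test on paired deltas, ties dropped (sketch)."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    tail = min(k, n - k)
    p_one = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)

# Hypothetical per-set PR-AUC deltas (flagship2 - flagship1), ten sets:
deltas = [0.02, 0.01, 0.03, -0.01, 0.02, 0.04, 0.01, 0.02, 0.00, 0.03]
lo, hi = paired_bootstrap_ci(deltas)
p = sign_test_p(deltas)
```

Pairing by evaluation set keeps the comparison apples-to-apples: both flagship models are scored on identical sequences before the delta is taken.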

Flagship2 vs Flagship1 Results

Headline metrics below are means across seeded evaluation sets and paired deltas (flagship2 - flagship1). PR metrics are primary because the positive class is rare.

Positive-class prevalence in this snapshot: . Accuracy can look high under imbalance; PR-AUC and PR lift are more informative for epitope recovery quality.
Metric                      Flagship1 mean  Flagship2 mean  Delta (F2 - F1)  95% bootstrap CI  Sign-test p
Best-fold PR-AUC (primary)                                                   to
Best-fold PR lift                                                            to
Best-fold ROC AUC                                                            to
Overall PR-AUC                                                               to
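The "PR lift" rows can be read as PR-AUC relative to the random-ranker baseline, which equals the positive-class prevalence. That reading is an assumption here, and `pr_lift` is a hypothetical helper; check the published snapshot's source-analysis files for the exact formula.

```python
def pr_lift(pr_auc, prevalence):
    """PR-AUC divided by the random baseline (the positive-class
    prevalence). Assumed definition of "PR lift"; see the published
    snapshot's source-analysis files for the exact formula used."""
    return pr_auc / prevalence

# Illustration with made-up numbers: at 1% prevalence, a PR-AUC of 0.20
# corresponds to roughly a 20x lift over a random ranker.
lift = pr_lift(0.20, 0.01)
```

This is why accuracy is de-emphasized: at very low prevalence a trivial all-negative predictor scores high accuracy, while PR lift directly measures how much better than chance the epitope ranking is.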

Reproducibility

PepSeqPred is fully open-source. All pipeline code, scripts, and documentation are available on GitHub.