Flagship2 vs Flagship1 Results
The headline metrics below are means across seeded evaluation sets, with paired deltas computed as Flagship2 minus Flagship1. Precision-recall (PR) metrics are primary because the positive class is rare.
Positive-class prevalence in this snapshot: 0.1069%. At this level of imbalance, accuracy is uninformative: a model that predicts every example as negative would score about 99.89%. PR-AUC and PR lift are more informative for epitope recovery quality.
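
The lift figures can be sanity-checked against the prevalence. The sketch below assumes PR lift is PR-AUC divided by positive-class prevalence, which is the PR-AUC expected from a random ranker. The report does not state its definition, and `pr_lift` is a hypothetical helper name, not part of the evaluation code.

```python
def pr_lift(pr_auc: float, prevalence: float) -> float:
    """Ratio of PR-AUC to the random-ranker baseline (assumed definition)."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return pr_auc / prevalence


# Prevalence is reported as a percentage (0.1069%), so convert to a fraction.
PREVALENCE = 0.1069 / 100.0

# Mean best-fold PR-AUC values taken from the table.
lift_f1 = pr_lift(0.004416, PREVALENCE)
lift_f2 = pr_lift(0.007278, PREVALENCE)

print(f"Flagship1 lift: {lift_f1:.3f}")
print(f"Flagship2 lift: {lift_f2:.3f}")
```

Applied to the mean best-fold PR-AUC values, this gives roughly 4.13 and 6.81, within rounding of the reported 4.129 and 6.806. A small gap of this size would be expected if the report averages per-seed lifts rather than dividing the averaged PR-AUC.
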
| Metric | Flagship1 mean | Flagship2 mean | Delta (F2 - F1) | 95% bootstrap CI for delta | Sign-test p |
|---|---|---|---|---|---|
| Best-fold PR-AUC (primary) * | 0.004416 | 0.007278 | +0.002862 | +0.001045 to +0.004890 | 2.15e-2 |
| Best-fold PR lift | 4.129 | 6.806 | +2.677 | +0.991 to +4.620 | 2.15e-2 |
| Best-fold ROC AUC | 0.637707 | 0.639068 | +0.001361 | -0.001267 to +0.004166 | 3.44e-1 |
| Overall PR-AUC | 0.003338 | 0.004251 | +0.000914 | +0.000443 to +0.001422 | 1.95e-3 |
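
For readers who want to reproduce the last two columns, here is a minimal sketch of a percentile bootstrap interval for the mean paired delta and an exact two-sided sign test. It rests on assumptions the report does not confirm: the resampling unit is the per-seed delta, the interval is a percentile interval, and the sign test is two-sided with ties dropped. The function names and the demo values are hypothetical.

```python
import math
import random


def sign_test_two_sided(deltas):
    """Exact two-sided sign test on paired deltas; zero deltas are dropped."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def bootstrap_ci_mean(deltas, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired deltas."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        sum(rng.choices(deltas, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Made-up per-seed deltas for illustration only; NOT values from the report.
demo_deltas = [0.004, 0.001, 0.003, -0.001, 0.005,
               0.002, 0.006, 0.002, 0.003, 0.004]

print("sign-test p:", sign_test_two_sided(demo_deltas))
print("95% CI:", bootstrap_ci_mean(demo_deltas))
```

Under these assumptions, 9 positive deltas out of 10 give p = 22/1024 ≈ 2.15e-2, 10 out of 10 give 2/1024 ≈ 1.95e-3, and 7 out of 10 give 352/1024 ≈ 3.44e-1. These match the p-values in the table, which would be consistent with ten paired, untied comparisons per metric, although the report does not state the number of seeds.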




Snapshot date: 2026-04-10. Generated: 2026-04-10T16:31:29.939052+00:00.
