Flagship2 vs Flagship1 Results

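Headline metrics below are means across seeded evaluation sets, reported for each model alongside the mean paired delta (Flagship2 - Flagship1) and its 95% bootstrap confidence interval. Precision-recall (PR) metrics are primary because the positive class is rare.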
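The paired comparison can be sketched as follows. This is a minimal illustration that assumes each model's scores are aligned one-to-one by evaluation set and that the interval is a percentile bootstrap over sets. The source does not say which bootstrap variant or how many resamples were used, and the function names are hypothetical.

```python
import random
import statistics


def paired_deltas(f1_scores, f2_scores):
    """Per-set differences (Flagship2 - Flagship1), paired by evaluation set."""
    if len(f1_scores) != len(f2_scores):
        raise ValueError("scores must be paired one-to-one by evaluation set")
    return [b - a for a, b in zip(f1_scores, f2_scores)]


def bootstrap_ci(deltas, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired delta.

    Assumption: evaluation sets are resampled with replacement; the source
    does not specify the exact bootstrap procedure.
    """
    rng = random.Random(seed)
    n = len(deltas)
    # Mean delta of each resample, sorted so percentiles can be read off by index.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-set scores for illustration only (not this snapshot's data).
f1 = [0.0041, 0.0046, 0.0043, 0.0047]
f2 = [0.0069, 0.0075, 0.0071, 0.0076]
d = paired_deltas(f1, f2)
print(statistics.fmean(d), bootstrap_ci(d))
```

Pairing by set matters because it removes between-set variance that both models share, which a comparison of two independent means would leave in.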

Positive-class prevalence in this snapshot is 0.1069%, so accuracy is misleading here: a classifier that labels every example negative would already score about 99.89% accuracy. PR-AUC and PR lift are more informative measures of epitope-recovery quality.
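To make the imbalance point concrete, the sketch below computes the accuracy of an all-negative baseline and a PR lift. It assumes lift is defined as PR-AUC divided by prevalence (roughly the expected PR-AUC of a random ranker); the source does not state the formula, so treat that definition and the helper names as assumptions.

```python
def all_negative_accuracy(prevalence):
    """Accuracy of a trivial classifier that labels every example negative."""
    return 1.0 - prevalence


def pr_lift(pr_auc, prevalence):
    """PR-AUC relative to a random ranker.

    Assumption: lift = pr_auc / prevalence; not confirmed by the source.
    """
    return pr_auc / prevalence


prevalence = 0.1069 / 100  # 0.1069% from this snapshot, as a fraction
print(f"all-negative accuracy: {all_negative_accuracy(prevalence):.4%}")
print(f"Flagship1 best-fold lift: {pr_lift(0.004416, prevalence):.2f}")
print(f"Flagship2 best-fold lift: {pr_lift(0.007278, prevalence):.2f}")
```

This prints about 99.8931%, 4.13, and 6.81. The lift values sit close to the reported 4.129 and 6.806; a small gap would be expected if lift is computed per evaluation set and then averaged, or if prevalence varies slightly across sets.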

| Metric | Flagship1 mean | Flagship2 mean | Delta (F2 - F1) | 95% bootstrap CI of delta | Sign-test p |
|---|---|---|---|---|---|
| Best-fold PR-AUC (primary) * | 0.004416 | 0.007278 | +0.002862 | +0.001045 to +0.004890 | 2.15e-2 |
| Best-fold PR lift | 4.129 | 6.806 | +2.677 | +0.991 to +4.620 | 2.15e-2 |
| Best-fold ROC AUC | 0.637707 | 0.639068 | +0.001361 | -0.001267 to +0.004166 | 3.44e-1 |
| Overall PR-AUC | 0.003338 | 0.004251 | +0.000914 | +0.000443 to +0.001422 | 1.95e-3 |
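The sign-test column can be reproduced with an exact binomial test on the signs of the paired deltas. This is a minimal sketch assuming a two-sided test with zero deltas dropped; `sign_test_p` is a hypothetical helper. The number of evaluation sets is not stated in this report; taking n = 10 as an assumption, 10, 9, and 7 positive deltas give exactly the 1.95e-3, 2.15e-2, and 3.44e-1 shown in the table.

```python
from math import comb


def sign_test_p(n_positive, n_nonzero):
    """Exact two-sided sign test p-value under H0: P(delta > 0) = 0.5.

    Assumption: ties (zero deltas) are removed before calling, so
    n_nonzero counts only sets with a nonzero delta.
    """
    # Smaller of the two sign counts defines the extreme tail.
    k = min(n_positive, n_nonzero - n_positive)
    tail = sum(comb(n_nonzero, i) for i in range(k + 1)) / 2 ** n_nonzero
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


for k in (10, 9, 7):
    print(k, f"{sign_test_p(k, 10):.2e}")
```

The sign test ignores effect size, which is why it complements the bootstrap interval: the interval describes how large the delta is, while the sign test asks only how consistently Flagship2 wins across sets.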

Snapshot date: 2026-04-10. Generated: 2026-04-10T16:31:29.939052+00:00.