PUBLIC K6 CALIBRATION
Official evaluation gate
The manually dispatched Official-like Benchmark workflow and scripts/ci-official-benchmark.sh run against zanfranceschi/rinha-de-backend-2026 at ref main by default (workflow input official_ref, script variable OFFICIAL_REF). Use older refs only for explicitly labeled historical reproductions.
The workflow accepts three inputs: official_ref, webapi_image, and benchmark_repetitions. The local script mirrors them as OFFICIAL_REF, WEBAPI_IMAGE, and BENCHMARK_REPETITIONS. It also uses RESULTS_DIR, which defaults to benchmark-results, and BENCHMARK_K6_MODE, which is fixed to native.
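A minimal Python sketch of how the script-side settings could be resolved, assuming they arrive as environment variables. The main and benchmark-results defaults and the fixed native k6 mode come from the text above; the default of one repetition, the `resolve_config` helper, and the `is_historical` flag are hypothetical illustrations, not the script's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults stated in the documentation.
DEFAULT_OFFICIAL_REF = "main"
DEFAULT_RESULTS_DIR = "benchmark-results"
K6_MODE = "native"  # fixed; never read from the environment in this sketch
# Assumption: a single repetition when BENCHMARK_REPETITIONS is unset.
ASSUMED_DEFAULT_REPETITIONS = 1


@dataclass(frozen=True)
class BenchmarkConfig:
    official_ref: str
    webapi_image: Optional[str]  # None means no prebuilt image is pinned
    repetitions: int
    results_dir: str
    k6_mode: str

    @property
    def is_historical(self) -> bool:
        # Any ref other than main is a historical reproduction and
        # must be explicitly labeled as such.
        return self.official_ref != DEFAULT_OFFICIAL_REF


def resolve_config(env: Optional[Mapping[str, str]] = None) -> BenchmarkConfig:
    """Resolve benchmark settings from env-style variables (hypothetical helper)."""
    env = os.environ if env is None else env
    raw_reps = env.get("BENCHMARK_REPETITIONS", "").strip()
    repetitions = int(raw_reps) if raw_reps else ASSUMED_DEFAULT_REPETITIONS
    if repetitions < 1:
        raise ValueError("BENCHMARK_REPETITIONS must be at least 1")
    return BenchmarkConfig(
        official_ref=env.get("OFFICIAL_REF") or DEFAULT_OFFICIAL_REF,
        webapi_image=env.get("WEBAPI_IMAGE") or None,
        repetitions=repetitions,
        results_dir=env.get("RESULTS_DIR") or DEFAULT_RESULTS_DIR,
        k6_mode=K6_MODE,
    )


if __name__ == "__main__":
    cfg = resolve_config({"BENCHMARK_REPETITIONS": "3"})
    print(cfg.official_ref, cfg.repetitions, cfg.results_dir, cfg.is_historical)
    # prints: main 3 benchmark-results False
```

Treating empty strings like unset values mirrors how an unfilled workflow input would typically be forwarded, but that forwarding behavior is itself an assumption.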
Artifacts retained for audit: the selected results.json; the per-run results-repetition-*.json files; repetition-summary.json, used for multi-run selection; the k6-output and k6-report files; docker-compose.log; and docker-state snapshots taken before and after the repetitions.
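A hypothetical audit helper that checks a results directory for the retained artifacts before they are archived. The glob patterns for the k6 output/report files and docker-state snapshots, and the expectation of one results-repetition-*.json per run, are assumptions about naming; adjust them to what scripts/ci-official-benchmark.sh actually writes.

```python
from pathlib import Path
from typing import List

# Artifacts expected in every run. The k6-output*, k6-report* and
# docker-state* name patterns are assumptions about the real file names.
ALWAYS_EXPECTED = [
    "results.json",
    "docker-compose.log",
    "k6-output*",
    "k6-report*",
    "docker-state*",
]


def missing_artifacts(results_dir: str, repetitions: int) -> List[str]:
    """List artifact patterns that are absent or under-represented (hypothetical audit)."""
    root = Path(results_dir)
    problems = [pattern for pattern in ALWAYS_EXPECTED if not any(root.glob(pattern))]
    # Assumption: one results-repetition-*.json file per repetition.
    per_run = list(root.glob("results-repetition-*.json"))
    if len(per_run) < repetitions:
        problems.append(
            f"results-repetition-*.json: found {len(per_run)}, expected {repetitions}"
        )
    # repetition-summary.json records the selection in multi-run mode.
    if repetitions > 1 and not (root / "repetition-summary.json").is_file():
        problems.append("repetition-summary.json")
    return problems
```

An empty list means every expected artifact is present; anything else could be used to fail the job before the artifacts are uploaded.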
This is a rejection and calibration gate, not a claim of an official score. The public evaluator serves only as a black-box harness: its payloads, expected labels, and any lookup tables derived from them must not enter shipped runtime sources.
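One way to back the no-leakage rule with an automated check is a verbatim fingerprint scan: collect distinctive lines from the evaluator's payload and label files and flag any shipped runtime source that contains one. This is a hypothetical sketch, not part of the documented gate; the 40-character threshold and the helper names are assumptions, and it only catches verbatim copies, so lookup tables derived from the data still need human review.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

# Assumption: lines shorter than this are too generic (braces, bare keys) to flag.
MIN_FINGERPRINT_LEN = 40


def collect_fingerprints(evaluator_files: Iterable[Path]) -> Set[str]:
    """Collect distinctive lines from evaluator payload and label files."""
    prints: Set[str] = set()
    for path in evaluator_files:
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            line = line.strip()
            if len(line) >= MIN_FINGERPRINT_LEN:
                prints.add(line)
    return prints


def find_leaks(runtime_files: Iterable[Path], prints: Set[str]) -> List[Tuple[str, str]]:
    """Return (runtime file, fingerprint) pairs where evaluator text appears verbatim."""
    leaks: List[Tuple[str, str]] = []
    for path in runtime_files:
        text = path.read_text(encoding="utf-8", errors="ignore")
        for fp in sorted(prints):  # sorted for deterministic reports
            if fp in text:
                leaks.append((str(path), fp))
    return leaks
```

In CI, a job could pass the evaluator checkout's data files and the shipped source files (for example, a hypothetical `src` tree walked with `Path("src").rglob("*")` and filtered to regular files), then fail whenever `find_leaks` returns a non-empty list.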
CHECKS AND SOURCE CUES
- workflow_dispatch input official_ref defaults to main
- workflow_dispatch input webapi_image may pin a prebuilt API image
- workflow_dispatch input benchmark_repetitions controls repeated k6 runs
- scripts/ci-official-benchmark.sh emits results.json and repetition artifacts
- docs/official-evaluation.md records scoring thresholds and rule caveats