PUBLIC K6 CALIBRATION

Official evaluation gate

Both the manual Official-like Benchmark workflow and scripts/ci-official-benchmark.sh use zanfranceschi/rinha-de-backend-2026 at main by default. The ref is set through the workflow input official_ref or the environment variable OFFICIAL_REF. Older refs are only for explicitly labeled historical reproduction.

The workflow accepts official_ref, webapi_image, and benchmark_repetitions. The local script mirrors those as OFFICIAL_REF, WEBAPI_IMAGE, and BENCHMARK_REPETITIONS, with RESULTS_DIR defaulting to benchmark-results and BENCHMARK_K6_MODE fixed to native.
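The mapping above can be sketched as a few lines of POSIX shell. This is an illustration, not the contents of scripts/ci-official-benchmark.sh. The defaults for OFFICIAL_REF and RESULTS_DIR and the fixed BENCHMARK_K6_MODE come from this note; the empty WEBAPI_IMAGE default, the BENCHMARK_REPETITIONS default of 1, and the validation step are assumptions made for the example.

```shell
# Sketch only: resolve the documented knobs the way a local script might.
OFFICIAL_REF="${OFFICIAL_REF:-main}"                 # documented default
RESULTS_DIR="${RESULTS_DIR:-benchmark-results}"      # documented default
BENCHMARK_K6_MODE="native"                           # documented as fixed, so never read from the environment
WEBAPI_IMAGE="${WEBAPI_IMAGE:-}"                     # assumption: empty means no pinned image
BENCHMARK_REPETITIONS="${BENCHMARK_REPETITIONS:-1}"  # assumption: the real default is not stated here

# Assumed validation: refuse empty or non-numeric repetition counts.
case "$BENCHMARK_REPETITIONS" in
  ''|*[!0-9]*)
    echo "BENCHMARK_REPETITIONS must be a whole number" >&2
    exit 2
    ;;
esac

# Assumed validation: at least one repetition is required.
if [ "$BENCHMARK_REPETITIONS" -lt 1 ]; then
  echo "BENCHMARK_REPETITIONS must be at least 1" >&2
  exit 2
fi

echo "ref=$OFFICIAL_REF repetitions=$BENCHMARK_REPETITIONS mode=$BENCHMARK_K6_MODE results=$RESULTS_DIR"
```

Because BENCHMARK_K6_MODE is assigned unconditionally, a value exported by the caller is overwritten, which is one way to keep the mode fixed.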

Artifacts are retained for audit: the selected results.json, the per-run results-repetition-*.json files, repetition-summary.json (used for multi-run selection), the k6-output and k6-report files, docker-compose.log, and docker-state snapshots taken before and after repetitions.
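A quick audit of a results directory could look like the sketch below. The helper name check_artifacts is hypothetical, and the flat layout (every file directly inside the results directory) is an assumption. Only files whose exact names appear in this note are checked; the k6-output, k6-report, and docker-state names are not spelled out here, so the sketch leaves them alone.

```shell
# Hypothetical helper: return 0 if the named artifacts are present in a
# results directory, non-zero otherwise. Assumes a flat directory layout.
check_artifacts() {
  dir="$1"
  missing=0

  # Files that this note names exactly.
  for name in results.json docker-compose.log; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name" >&2
      missing=$((missing + 1))
    fi
  done

  # Count per-repetition results; an unmatched glob stays literal, so test -e.
  count=0
  for file in "$dir"/results-repetition-*.json; do
    if [ -e "$file" ]; then
      count=$((count + 1))
    fi
  done
  if [ "$count" -eq 0 ]; then
    echo "missing: results-repetition-*.json" >&2
    missing=$((missing + 1))
  fi

  # The summary is described for multi-run selection, so require it only
  # when more than one repetition result exists (an assumption).
  if [ "$count" -gt 1 ] && [ ! -f "$dir/repetition-summary.json" ]; then
    echo "missing: repetition-summary.json" >&2
    missing=$((missing + 1))
  fi

  [ "$missing" -eq 0 ]
}
```

For example, check_artifacts "${RESULTS_DIR:-benchmark-results}" prints one line per missing file to standard error and returns non-zero if anything is absent.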

This is a rejection and calibration gate, not an official score claim. The public evaluator is used only as a black-box harness; payloads, expected labels, and derived lookup tables must not enter shipped runtime sources.

CHECKS AND SOURCE CUES

workflow_dispatch input official_ref defaults to main
workflow_dispatch input webapi_image may pin a prebuilt API image
workflow_dispatch input benchmark_repetitions controls repeated k6 runs
scripts/ci-official-benchmark.sh emits results.json and repetition artifacts
docs/official-evaluation.md records scoring thresholds and rule caveats