rinha4.dotnet

SOURCE-BACKED WIKI

TRACE THE SUBMISSION WITHOUT LOSING THE EVIDENCE LANES.

This docs surface keeps architecture, rules, performance notes, and CI provenance in one terminal-readable lane. Official upstream data, candidate CI evidence, and calibrated CI projections remain intentionally separated.

rinha4-back-end-dotnet

.NET 10 NativeAOT implementation for Rinha de Backend 2026.

The current build is optimized for latency first:

  • raw socket HTTP/1 server
  • Unix Domain Sockets behind the standalone rinha4-lb-yolo-mode proxy
  • manual JSON request parsing
  • prebuilt HTTP responses
  • default clean IVF scorer with bounded bbox repair
  • archived official-like k6 results after each main build
  • optional manual CI experiments for scorer modes, bucket/hybrid paths, CPU splits, and fd-pass diagnostics

The project target is explicit: lead the .NET entries, keep score 6000, and keep 0 failures.

Current signal

Latest CI benchmark history lives at /reports/.

The home page reads the latest official Rinha issue result from docs/public/official/latest.json and the latest CI candidate result from docs/public/reports/latest-candidate.json.

CI results are useful for regression tracking. They are not official Rinha hardware results. Candidate CI runs keep the canonical docker-compose.yml standalone-yolo layout; manual stress runs can override cpusets or CPU quotas when diagnosing official-preview mismatch.

Active lane

Transport is currently stable in CI. The active lane is the clean IVF scorer built from the allowed reference dataset and loaded at startup with SCORER_MODE=ivf. It scans the nearest IVF clusters first and uses bounded bounding-box repair to protect the clean 6000 correctness gate. Bucket, hybrid, and exact modes remain available for explicit experiments and diagnostics; they are not the default candidate path.

Repository map

Path | Purpose
src/WebApi | NativeAOT fraud-score server
src/DataConverter | Converts official reference data into bucket, IVF, and exact diagnostic binaries
data | Allowed challenge datasets and normalization files copied into the API image
tests | Focused validation tests
scripts | Benchmark and report archive automation
docs/public/reports | Versioned benchmark JSON history

Challenge

Rinha de Backend 2026 scores implementations by latency, correctness, and request survival under the official k6 workload.

Required endpoints:

Method | Path | Role
GET | /ready | readiness probe
POST | /fraud-score | fraud decision

Default topology:

  • reverse proxy on 9999
  • two API instances
  • total container budget: 1.00 CPU / 350 MB
  • no privileged container
  • runnable through Docker Compose

Ranking pressure:

  • lower p99 improves p99 score
  • 0% failures preserves detection score
  • HTTP errors destroy score quickly

This repository currently keeps transport errors at 0 in CI-like runs. The current candidate IVF path replays the public payload with 0 false positives and 0 false negatives in archived CI benchmarks. Remaining ranking work is p99 reduction without giving back correctness.

Architecture

k6 / judge
    |
    v
rinha4-lb-yolo-mode :9999
    |
    +-- fdpass:/sockets/api1.sock.ctrl -> WebApi NativeAOT
    |
    +-- fdpass:/sockets/api2.sock.ctrl -> WebApi NativeAOT

Request path

  1. The standalone yolo load balancer accepts TCP on port 9999 in fdpass mode.
  2. It selects an API instance and passes the accepted client fd over a Unix control socket. The compose file pins webapi1 to cpuset 0,1, webapi2 to 2,3, and lb to 0,2. CPU quotas still total 1.00; the overlapping cpuset keeps the proxy close to both API lanes.
  3. RawHttpServer accepts the fd-pass control message and serves the client fd directly when FD_RAW=1.
  4. HttpWire parses method, path, headers, and Content-Length.
  5. FraudRequestParser reads only required JSON fields.
  6. FraudScorer builds a normalized 14-dimensional vector.
  7. FraudScorer runs the default IVF path, then bounded repair when configured. Bucket/hybrid fast paths are available only when SCORER_MODE=bucket or SCORER_MODE=hybrid is selected for experiments.
  8. HttpResponses writes a prebuilt HTTP/JSON response.
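
Step 4 of the request path can be illustrated with a minimal sketch. This is not the HttpWire source; the function name, return shape, and error handling are assumptions made for illustration. It only shows how a raw-socket server can pull the method, path, and Content-Length out of a request head without a framework.

```python
def parse_request_head(buf: bytes):
    """Return (method, path, content_length, body_offset), or None if the head is incomplete."""
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        return None  # the caller should read more bytes and retry
    lines = buf[:end].split(b"\r\n")
    method, path, _version = lines[0].split(b" ", 2)
    content_length = 0
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    # The body starts right after the blank line that ends the head.
    return method.decode("ascii"), path.decode("ascii"), content_length, end + 4


raw = b"POST /fraud-score HTTP/1.1\r\nHost: lb\r\nContent-Length: 2\r\n\r\n{}"
method, path, length, offset = parse_request_head(raw)
body = raw[offset:offset + length]
```

Returning the body offset lets the caller slice the JSON payload out of the same pooled buffer instead of copying it.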

Data pipeline

src/DataConverter converts data/references.json.gz into runtime binary data during image build.

The current clean runtime uses references.ivf.bin for IVF search and repair. The image also carries:

  • references.bucket.bin for bucket or hybrid experiment modes
  • references.bin for the explicit exact diagnostic mode used by tests and manual benchmark experiments

Only the IVF file is required by the default submission scorer; missing or invalid default scorer data fails startup instead of silently degrading correctness.

The references.ivf.bin file stores:

  • IVF2 magic for the current int64-distance IVF layout
  • trained int16 centroids in dimension-major layout
  • per-cluster int16 bounding boxes in dimension-major layout
  • packed int16 vector blocks
  • labels and original ids for deterministic top-five tie-breaking
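
The dimension-major layout mentioned above can be pictured with a small sketch. The helper names are hypothetical and the real file format (header fields, offsets, block sizes) is not reproduced here; the only point is the indexing rule, where all values of one dimension sit next to each other.

```python
from array import array

DIMS = 14  # vector width used by the scorer


def pack_dimension_major(rows):
    """Pack row vectors so that out[d * n + i] holds dimension d of row i."""
    n = len(rows)
    out = array("h", [0] * (DIMS * n))  # "h" is a signed 16-bit integer
    for i, row in enumerate(rows):
        for d in range(DIMS):
            out[d * n + i] = row[d]
    return out


def read_row(flat, n, i):
    """Rebuild row i from the dimension-major buffer."""
    return [flat[d * n + i] for d in range(DIMS)]


rows = [[i * 100 + d for d in range(DIMS)] for i in range(3)]
flat = pack_dimension_major(rows)
```

With this layout, comparing a query against every centroid in one dimension reads a contiguous run of int16 values, which is generally friendlier to caches and vector instructions than a row-major walk.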

Classifier

Default submission runtime uses SCORER_MODE=ivf against the current official main dataset. It scans the two nearest IVF centroid clusters, then uses bounding-box repair to preserve exact top-five decisions while keeping the clean 6000 correctness gate.

Current defaults target IVF nprobe=2, bounding-box repair, rounded int16 squared L2 ranking, and tuned distance thresholds for safe 0/5 approvals and 5/5 denials. If the required IVF file is missing or invalid, startup fails instead of silently serving with a weaker classifier. The bucket/hybrid path remains available for explicit latency experiments, not as the clean default.
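
The search strategy described above can be sketched in a few lines. This is an illustrative model, not the IvfIndex implementation: the data shapes and function names are assumptions. It scans the nprobe nearest clusters first, then visits another cluster only if the closest possible point inside its bounding box could still enter the current top five.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bbox_lower_bound(q, lo, hi):
    """Smallest possible squared distance from q to any point inside the box [lo, hi]."""
    total = 0
    for x, l, h in zip(q, lo, hi):
        if x < l:
            total += (l - x) ** 2
        elif x > h:
            total += (x - h) ** 2
    return total


def scan(cluster, q, top, k):
    for vec, label, ref_id in cluster:
        top.append((sq_l2(q, vec), ref_id, label))
    top.sort()  # distance first, then original id as the deterministic tie-break
    del top[k:]


def ivf_top_k(q, centroids, clusters, boxes, nprobe=2, k=5):
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(q, centroids[c]))
    top = []
    for c in order[:nprobe]:  # fast first pass
        scan(clusters[c], q, top, k)
    for c in order[nprobe:]:  # repair pass: skip clusters that cannot improve the result
        lo, hi = boxes[c]
        if len(top) < k or bbox_lower_bound(q, lo, hi) <= top[-1][0]:
            scan(clusters[c], q, top, k)
    return top
```

Because the lower bound never overestimates the true distance, skipping a cluster whose bound exceeds the current fifth-best distance cannot change the top five, so the result matches a brute-force scan under these assumptions. The fraud count is then the sum of the labels in the returned list.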

Runtime implementation is split into focused files:

  • BucketIndex.cs: bucket lookup, profile/reference fast paths, risky fallback, and exact scan helpers
  • BucketSearchOptions.cs: BUCKET_* runtime controls
  • IvfIndex.cs: binary loading, validation, immutable arrays, and search dispatch
  • IvfIndex.Int64.cs: IVF2 candidate path for IVF_SCALE=10000
  • FraudScorer.cs: normalization, mode selection, and bucket/IVF orchestration

Startup readiness

Each API process recreates its fd-pass control socket on startup in the shared sockets tmpfs volume. The standalone LB consumes /sockets/api1.sock.ctrl and /sockets/api2.sock.ctrl, then stays out of the fraud payload path after handing off the accepted client fd.

Rules

Allowed in this repo:

  • preprocess references.json.gz
  • preprocess mcc_risk.json
  • preprocess normalization.json
  • use any classifier built from allowed reference data
  • build ANN or IVF indexes from references.json.gz
  • run the public official k6 script in CI
  • compare against the public ranking preview

Not allowed:

  • using official test payloads as reference data
  • hardcoding expected answers from preview runs
  • building correction tables from misclassified test payloads
  • letting the reverse proxy inspect fraud payloads or answer /fraud-score

The CI benchmark only mounts official test data into the k6 container. API containers do not receive test payload files.

The IVF scorer follows the same boundary: it trains and packs only references.json.gz, reads no benchmark payload files, and fails startup when the index is unavailable.

Getting Started

Restore and build the app:

dotnet restore src/WebApi/WebApi.csproj
dotnet build src/WebApi/WebApi.csproj -c Release --no-restore

Run the local stack:

docker compose up --build
curl -i http://localhost:9999/ready

The default compose stack runs SCORER_MODE=ivf with IVF_FAST_NPROBE=2 and bounding-box repair for current-main clean 6000 scoring. It allocates 0.425 CPU / 165 MB to each WebApi container and 0.15 CPU / 20 MB to the standalone proxy, with fd-pass control sockets and FD_RAW=1 enabled by default.

Tune IVF image-build parameters with IVF_CLUSTERS, IVF_TRAIN_SAMPLE, IVF_ITERATIONS, and IVF_SCALE when testing alternatives. Runtime IVF repair controls are IVF_FAST_NPROBE, IVF_FULL_NPROBE, IVF_BOUNDARY_FULL, IVF_BBOX_REPAIR, IVF_REPAIR_MIN_FRAUDS, IVF_REPAIR_MAX_FRAUDS, IVF_ZERO_FAST_APPROVE_WORST_DISTANCE, and IVF_FIVE_FAST_DENY_WORST_DISTANCE.
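
To make IVF_SCALE concrete, here is a small sketch of rounded int16 quantization and integer squared L2 distance. It assumes normalized inputs that fit the int16 range after scaling, and it assumes the runtime distance thresholds are expressed in the same scaled squared units; neither assumption is confirmed by this page.

```python
IVF_SCALE = 10000  # default build argument listed in the tuning tables


def quantize(vec, scale=IVF_SCALE):
    """Round normalized floats to integers that must fit a signed 16-bit value."""
    out = []
    for x in vec:
        q = round(x * scale)
        if not -32768 <= q <= 32767:
            raise ValueError(f"{x} does not fit int16 at scale {scale}")
        out.append(q)
    return out


def sq_l2_int(a, b):
    """Squared L2 distance on quantized vectors, kept as an exact integer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Under the stated assumption, a threshold of 5000000 in scaled units
# corresponds to a squared distance of 0.05 in normalized units.
normalized_equivalent = 5_000_000 / IVF_SCALE ** 2
```

Two int16 values can differ by up to 65535, so a 14-dimensional squared distance can reach roughly 6.0e10, which does not fit in 32 bits. That is one plausible reason for a 64-bit distance accumulator.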

Tune bucket runtime behavior with BUCKET_EARLY_CANDIDATES, BUCKET_MIN_CANDIDATES, BUCKET_MAX_CANDIDATES, BUCKET_PROFILE_FASTPATH, BUCKET_REFERENCE_FASTPATH*, BUCKET_PROFILE_*_MIN_COUNT, BUCKET_EXACT_FALLBACK, and BUCKET_AVX_CUTOFF_DIMS.

Generate runtime data without Docker:

dotnet run --project src/DataConverter/DataConverter.csproj -c Release -- data/

Run one API directly over local TCP :8080 without the load balancer:

DATA_DIR=data \
IVF_PATH=data/references.ivf.bin \
BUCKET_PATH=data/references.bucket.bin \
SCORER_MODE=ivf \
  dotnet run --project src/WebApi/WebApi.csproj -c Release --no-restore

curl -i http://localhost:8080/ready

The public contest shape remains the compose stack on :9999; direct TCP is for parser/scorer smoke tests and local debugging.

Run focused tests:

dotnet restore tests/VectorizationTests/VectorizationTests.csproj
dotnet run --project tests/VectorizationTests/VectorizationTests.csproj -c Release --no-restore

Run the official-like benchmark locally:

bash scripts/ci-official-benchmark.sh

Full compose and benchmark validation require Docker daemon access. If local Docker is unavailable, use the GitHub Actions benchmark workflow.

Run the docs locally. CI builds the Astro 6 site with Node 24 and Bun, so use Node 24 locally when reproducing Pages failures:

cd docs
bun install
bun run dev

For production parity, run bun run build before pushing documentation changes.

Performance

Hot path choices:

  • NativeAOT publish
  • raw socket HTTP/1
  • one task per client connection
  • pooled read buffers
  • fd-pass handoff from the standalone yolo load balancer to raw API fds
  • manual request parsing
  • no model binding
  • prebuilt response bytes
  • clean IVF first pass with bounded bbox repair; bucket/hybrid paths are experiment modes
  • no fraud-payload parsing in the proxy layer
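
The "prebuilt response bytes" idea can be sketched as follows. A top-five vote has only six possible fraud counts, so every response can be serialized once at startup. The JSON field names and the deny_from cutoff below are placeholders chosen for illustration; they are not taken from this repository or from the challenge specification.

```python
import json


def build_responses(deny_from=3):
    """Serialize one complete HTTP response per possible fraud count (0..5)."""
    table = []
    for frauds in range(6):
        # Hypothetical body shape; the real contract may differ.
        body = json.dumps({"approved": frauds < deny_from, "fraud_score": frauds / 5}).encode()
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: application/json\r\n"
            f"Content-Length: {len(body)}\r\n\r\n"
        ).encode("ascii")
        table.append(head + body)
    return table


RESPONSES = build_responses()
# On the hot path the server only indexes the table: RESPONSES[fraud_count]
```

Indexing a prebuilt table removes JSON serialization and header formatting from the per-request cost.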

Current bottleneck

Transport is fast enough for the current target. Recent yolo-LB CI runs have shown 0 HTTP errors; p99 work is now mostly inside IVF repair/vector scan cost, first-pass fast-decision coverage, and CPU split between the API containers and the standalone proxy. Bucket fast-path coverage remains an experiment-mode topic, not the default candidate path.

The reports page is the source of truth for the newest archived candidate and calibrated runs because every main build can append fresh benchmark artifacts. One recent clean pre-audit candidate image, ci-ead329a626dec7a605145fd278655dfd0fa63a51, produced p99 0.34ms, score 6000, 0 false positives, 0 false negatives, and 0 HTTP errors in the automatic benchmark lane; its paired official-calibrated prediction run reported p99 0.35ms, score 6000, and 0 HTTP errors. These are archived CI results, not official Rinha hardware results.

Accuracy experiments

Earlier non-candidate classifier paths were removed from the default production lane. The current default is clean IVF on the official main dataset:

  • build bucket, IVF, and exact diagnostic data from references.json.gz
  • load references.ivf.bin at startup for default IVF mode
  • keep bucket/hybrid available for explicit latency experiments
  • scan the two nearest IVF clusters first with IVF_FAST_NPROBE=2
  • use scalar bbox repair with early exit to scan only clusters whose bounding box can still beat the current top-five bound
  • skip repair for first-cluster 0/5 approval and 5/5 denial candidates below tuned distance bounds
  • rank fallback candidates with rounded int16 squared L2 distance
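
The fast-decision rule in the list above can be expressed as a small predicate. The threshold values are the compose defaults listed in the tuning tables; the function name and the inclusive comparison are assumptions, and the real scorer may apply additional conditions.

```python
ZERO_FAST_APPROVE_WORST_DISTANCE = 5_000_000  # IVF_ZERO_FAST_APPROVE_WORST_DISTANCE
FIVE_FAST_DENY_WORST_DISTANCE = 2_000_000     # IVF_FIVE_FAST_DENY_WORST_DISTANCE


def can_skip_repair(fraud_count, worst_distance):
    """True when the first-pass top five are unanimous and the farthest one is within the bound."""
    if fraud_count == 0:
        return worst_distance <= ZERO_FAST_APPROVE_WORST_DISTANCE
    if fraud_count == 5:
        return worst_distance <= FIVE_FAST_DENY_WORST_DISTANCE
    return False  # mixed votes always go through repair in this sketch
```

The intuition is that a unanimous vote whose farthest neighbor is still very close is unlikely to be overturned by vectors in clusters that were not scanned, so the repair pass can be skipped for those requests.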

This path is implemented, unit-tested on focused parser/vector/index cases, and under CI benchmarking as the clean default. Exact, bucket, and hybrid modes remain available for diagnostics and manual benchmark experiments; they are not the canonical current-main clean lane.

Rejected A/B experiments: AVX2 bbox repair raised p99 to 5.37ms; a cluster-major bbox copy raised p99 to 6.89ms; 4096 clusters raised p99 to 16.69ms; 1024 clusters raised p99 to 19.78ms. Other experiments were removed because they either missed labels or lost to the current standalone-yolo path.

Reverse proxy

The retained load balancer path is the standalone rinha4-lb-yolo-mode image in LB_MODE=fdpass. The LB accepts the external TCP client on port 9999, passes the accepted socket fd to an API container over a Unix control socket, and lets the API serve the client directly. The API default also sets FD_RAW=1, which keeps the passed fd on a low-level recv/send path instead of wrapping every handoff in a managed Socket; set FD_RAW=0 to fall back to the safer managed Socket path for diagnostics.
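
The fd-pass handoff can be demonstrated with the standard library on Linux (Python 3.9 or newer). This sketch is not the LB or RawHttpServer code: the one-byte control payload and the function names are assumptions, and Python always wraps the received fd in a socket object, which is closer to the FD_RAW=0 style than to the raw recv/send path.

```python
import socket


def hand_off(control: socket.socket, client: socket.socket) -> None:
    """LB side: send the accepted client's fd over the Unix control socket, then drop our copy."""
    socket.send_fds(control, [b"F"], [client.fileno()])
    client.close()  # the receiver now owns the connection


def receive_client(control: socket.socket) -> socket.socket:
    """API side: receive one fd and wrap it so the client can be served directly."""
    _msg, fds, _flags, _addr = socket.recv_fds(control, 1, 1)
    return socket.socket(fileno=fds[0])


# Stand-ins: a Unix socketpair as the control channel and another pair as the "client" connection.
lb_ctrl, api_ctrl = socket.socketpair()
remote_client, accepted = socket.socketpair()

hand_off(lb_ctrl, accepted)
served = receive_client(api_ctrl)
remote_client.sendall(b"ping")
request = served.recv(4)
served.sendall(b"pong")
reply = remote_client.recv(4)
```

After the handoff, request and response bytes flow between the client and the API process only, which is why the proxy never sees the fraud payload.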

The benchmark workflow runs the canonical root docker-compose.yml used by the submission. The compose file allocates 0.425 CPU / 165 MB to each API container and 0.15 CPU / 20 MB to the LB while keeping the total at 1.00 CPU / 350 MB.

Tuning Knobs

This page is source-backed by docker-compose.yml, src/WebApi, src/DataConverter, src/WebApi/Dockerfile, .github/workflows/build.yml, .github/workflows/benchmark.yml, and scripts/ci-official-benchmark.sh.

Default candidate posture: keep SCORER_MODE=ivf, FD_RAW=1, IVF_FAST_NPROBE=2, and bbox repair enabled unless a run is explicitly labeled as an experiment.

Default submission stack

Surface | Default | Source of truth
public port | 9999 on lb | docker-compose.yml
API instances | webapi1, webapi2 | docker-compose.yml
API limits | 0.425 CPU / 165 MB each | docker-compose.yml
LB limits | 0.15 CPU / 20 MB | docker-compose.yml
API cpusets | 0,1 and 2,3 | docker-compose.yml
LB cpuset | 0,2 | docker-compose.yml
scorer mode | SCORER_MODE=ivf | docker-compose.yml, FraudScorer.cs
handoff mode | LB_MODE=fdpass, FD_RAW=1 | docker-compose.yml, RawHttpServer.cs

The load balancer passes accepted client file descriptors over Unix control sockets. It does not inspect fraud payloads or proxy fraud-response bytes after handoff.

Runtime scorer controls

Variable | Default | Applies to | Purpose
SCORER_MODE | ivf | runtime | Chooses ivf, bucket, hybrid, or exact. IVF is the clean candidate default.
DATA_DIR | /data | runtime | Directory for generated binary data and JSON resources.
IVF_PATH | /data/references.ivf.bin | runtime | IVF index loaded by default mode. Startup fails if the selected scorer data is invalid.
BUCKET_PATH | /data/references.bucket.bin | runtime | Bucket index for bucket/hybrid experiments.
EXACT_PATH | /data/references.bin | runtime | Exact diagnostic index for tests/manual experiments.
SUBMITTED_FAST_PATH | 1 | hybrid runtime | Enables the submitted hybrid fast path when SCORER_MODE=hybrid; set 0 for diagnostics.
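
The fail-fast startup behavior can be sketched as below. The variable names and default paths come from the table above, but the resolution logic is an assumption: in particular, mapping hybrid mode to BUCKET_PATH alone is a simplification, and the real server also validates file contents rather than only checking that the file exists.

```python
import os

# (environment variable, default path) per scorer mode; the hybrid mapping is an assumption.
SCORER_DATA = {
    "ivf": ("IVF_PATH", "/data/references.ivf.bin"),
    "bucket": ("BUCKET_PATH", "/data/references.bucket.bin"),
    "hybrid": ("BUCKET_PATH", "/data/references.bucket.bin"),
    "exact": ("EXACT_PATH", "/data/references.bin"),
}


def resolve_scorer(env):
    """Return (mode, data_path) or exit instead of serving with a weaker classifier."""
    mode = env.get("SCORER_MODE", "ivf")
    if mode not in SCORER_DATA:
        raise SystemExit(f"unknown SCORER_MODE: {mode}")
    var, default = SCORER_DATA[mode]
    path = env.get(var, default)
    if not os.path.isfile(path):
        raise SystemExit(f"{var}={path} is missing; refusing to start")
    return mode, path
```

Called as resolve_scorer(os.environ) at process start, this makes a missing index a visible startup failure rather than a silent correctness regression.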

IVF search controls

Variable | Default | Purpose
IVF_FAST_NPROBE | 2 | Number of nearest centroid clusters scanned in the first pass.
IVF_FULL_NPROBE | 1 | Secondary/boundary pass probe count when enabled.
IVF_BOUNDARY_FULL | false | Enables a broader boundary second pass.
IVF_BBOX_REPAIR | true | Enables bounding-box repair to preserve exact top-five decisions.
IVF_REPAIR_MIN_FRAUDS | 0 | Inclusive fraud-count lower bound for repair.
IVF_REPAIR_MAX_FRAUDS | 5 | Inclusive fraud-count upper bound for repair.
IVF_ZERO_FAST_APPROVE_WORST_DISTANCE | 5000000 in compose | Distance threshold that lets clean 0/5 approvals skip repair.
IVF_FIVE_FAST_DENY_WORST_DISTANCE | 2000000 in compose | Distance threshold that lets clean 5/5 denials skip repair.
IVF_PEDRO_DECISION_PATH | 0/false | Enables Pedro-style borderline expansion for first-pass counts.
IVF_BORDERLINE_MIN_FRAUDS | 2 | Lower fraud-count bound for borderline expansion.
IVF_BORDERLINE_MAX_FRAUDS | 3 | Upper fraud-count bound for borderline expansion.
IVF_BORDERLINE_NPROBE | 32 | Probe count for borderline expansion.
IVF_BORDERLINE_RERANK | 128 | Rerank candidate count for borderline expansion.
IVF_BBOX_ORDERED | false | Experimental ordered bbox repair path.
IVF_BBOX_ORDERED_MAX_PROBES | 0 | Optional cap for ordered bbox probes; 0 means uncapped by this setting.
IVF_FLOAT_AVX2 | unset/false | Experimental float AVX2 path guard in IvfIndex.cs.

Bucket and hybrid controls

Variable | Default | Purpose
BUCKET_EARLY_CANDIDATES | 9800 | Early candidate target for bucket scoring.
BUCKET_MIN_CANDIDATES | 16150 | Minimum bucket candidate count.
BUCKET_MAX_CANDIDATES | 24200 | Maximum bucket candidate count.
BUCKET_PROFILE_FASTPATH | true | Enables profile-level bucket fast paths.
BUCKET_PROFILE_MIN_COUNT | 15 | Legacy/default profile count used as fallback by bucket options.
BUCKET_PROFILE_LEGIT_MIN_COUNT | 5 | Legit profile fast-path minimum count.
BUCKET_PROFILE_FRAUD_MIN_COUNT | 15 | Fraud profile fast-path minimum count.
BUCKET_REFERENCE_FASTPATH | true | Enables reference fast paths.
BUCKET_REFERENCE_FASTPATH_LEGIT | false | Enables first reference legit fast path.
BUCKET_REFERENCE_FASTPATH_FRAUD | true | Enables first reference fraud fast path.
BUCKET_REFERENCE_FASTPATH2_LEGIT | true | Enables second reference legit fast path.
BUCKET_REFERENCE_FASTPATH2_FRAUD | true | Enables second reference fraud fast path.
BUCKET_EXACT_FALLBACK | risky | Controls exact fallback behavior: off/false, uncertain/exact, or risky.
BUCKET_AVX_CUTOFF_DIMS | 6 | Dimension cutoff for AVX-assisted bucket comparisons.
BUCKET_RISKY_FALLBACK and BUCKET_RISKY_* | source defaults | Risky fallback thresholds for amount/installments/ratio/distance/merchant averages.

Bucket and hybrid modes are useful for latency experiments. They are not the default clean candidate lane while SCORER_MODE=ivf remains the compose default.

Transport and process controls

Variable | Default | Purpose
BIND_ADDR | fd:/sockets/api*.sock.ctrl in compose; unset locally | fd: means fd-pass control socket; unset means local TCP :8080.
FD_RAW | 1 | Keeps passed fds on low-level recv/send; 0 wraps in managed Socket.
FD_TCP_NODELAY | 0 | Applies TCP_NODELAY to passed TCP fds when enabled.
FD_TCP_QUICKACK | 0 | Applies TCP_QUICKACK where available.
FD_TCP_TUNE | unset/false | Enables both TCP tuning flags in RawHttpServer.
FD_SET_BLOCKING | unset/false | Forces passed fds to blocking mode for diagnostics.
FD_BUSY_POLL_US | 0 | Optional Linux socket busy-poll value in microseconds.
ACCEPT_LOOPS | 1 | Number of accept/control loops.
KEEP_ALIVE_MAX | 0 | Optional keep-alive request cap; 0 means no cap in the server setting.
THREADPOOL_PREFER_LOCAL | 0 | Enables thread-pool local preference path.
MIN_WORKER_THREADS | 128 | Minimum .NET worker threads set at startup.
MAX_WORKER_THREADS | unset | Optional maximum worker threads override.
MAX_IO_THREADS | unset | Optional maximum IO completion threads override.
MLOCKALL | unset/false | Attempts Linux mlockall when enabled.

The compose file also sets .NET runtime switches such as DOTNET_PROCESSOR_COUNT=1, invariant globalization, socket inline completions, and socket thread count. Treat those as part of the submission baseline unless a run is explicitly a runtime experiment.

Image-build and data-conversion controls

Variable / build arg | Default | Purpose
DOTNET_SDK_IMAGE | mcr.microsoft.com/dotnet/sdk:10.0-alpine | SDK image used by the Dockerfile; workflows can select .NET 10 or 11 preview images.
DOTNET_ASPNET_IMAGE | mcr.microsoft.com/dotnet/aspnet:10.0-alpine | Runtime base image.
RINHA_TFM | net10.0 | Target framework passed to WebApi.csproj; workflow dispatch can use net11.0 preview.
AOT | true in compose/workflow | Enables NativeAOT publish.
EXTRA_OPTIMIZE | true in compose/workflow | Enables trimming/runtime feature removals and optimization switches.
BUILD_CONFIGURATION | Release | Build configuration.
EXACT_MAX_REFS | 100000 | Caps exact diagnostic references; non-positive means all rows in converter logic.
IVF_CLUSTERS | 512 in compose/workflow | IVF k-means cluster count for generated index.
IVF_TRAIN_SAMPLE | 65536 | Training sample size.
IVF_ITERATIONS | 6 | K-means iterations.
IVF_SCALE | 10000 | Quantization scale for rounded int16 IVF vectors.
BUCKET_SCALE | falls back to IVF_SCALE | Bucket quantization scale.
BUCKET_ONLY | false | Converter shortcut to write only bucket data.

Benchmark and CI controls

Variable / input | Default | Purpose
official_ref / OFFICIAL_REF | main | Official Rinha repo ref for the public evaluation suite.
webapi_image / WEBAPI_IMAGE | empty or compose default | Prebuilt API image. Empty means build from checkout in the manual workflow.
k6_image / K6_IMAGE | grafana/k6:latest | k6 Docker image when Docker k6 mode is used.
BENCHMARK_K6_MODE | native in workflows, docker script default | Selects native k6 or Docker k6 execution.
BENCHMARK_REPETITIONS | 1 | Number of k6 repetitions; report archive uses median p99 when greater than one.
report_kind / BENCHMARK_REPORT_KIND | experiment manual default | Archive lane: candidate, official-calibrated, or experiment.
BENCHMARK_API_CPUSET / benchmark_api_cpuset | unset | Overrides API cpusets for calibration/diagnostics.
BENCHMARK_PROXY_CPUSET / benchmark_proxy_cpuset | unset | Overrides proxy cpuset.
BENCHMARK_STACK_CPUSET | unset | Applies one cpuset to the whole stack when specific overrides are absent.
BENCHMARK_API_CPUS / benchmark_api_cpus | unset | Overrides per-API CPU quota.
BENCHMARK_PROXY_CPUS / benchmark_proxy_cpus | unset | Overrides proxy CPU quota.
BENCHMARK_API_MEMORY | unset | Script-level API memory override.
BENCHMARK_PROXY_MEMORY | unset | Script-level proxy memory override.
BENCHMARK_K6_CPUSET | unset | Optional k6 client cpuset.
BENCHMARK_PULL_IMAGE | derived from webapi_image | Pull a supplied prebuilt image before benchmark.
BENCHMARK_NO_BUILD | derived from webapi_image | Skip local image build when using a supplied image.

The benchmark script validates the resolved compose before startup: required services, port 9999, no privileged mode, no host networking, declared CPU/memory limits, and aggregate limits at or below 1 CPU / 350 MB.
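
The checks listed above can be modeled with a short sketch. It does not reproduce scripts/ci-official-benchmark.sh: the simplified service map (keys cpus, memory_mb, ports, privileged, network_mode) is a hypothetical shape, not the real resolved Compose schema. Decimal is used so that 0.425 + 0.425 + 0.15 sums to exactly 1.00.

```python
from decimal import Decimal

REQUIRED_SERVICES = {"lb", "webapi1", "webapi2"}
MAX_CPUS = Decimal("1.00")
MAX_MEMORY_MB = 350


def validate_stack(services):
    """Return a list of rule violations; an empty list means the stack passes."""
    errors = []
    missing = REQUIRED_SERVICES - set(services)
    if missing:
        errors.append(f"missing services: {sorted(missing)}")
    if not any(9999 in spec.get("ports", ()) for spec in services.values()):
        errors.append("no service publishes port 9999")
    for name, spec in services.items():
        if spec.get("privileged"):
            errors.append(f"{name}: privileged mode is not allowed")
        if spec.get("network_mode") == "host":
            errors.append(f"{name}: host networking is not allowed")
        if "cpus" not in spec or "memory_mb" not in spec:
            errors.append(f"{name}: CPU and memory limits must be declared")
    total_cpus = sum(Decimal(spec.get("cpus", "0")) for spec in services.values())
    total_memory = sum(spec.get("memory_mb", 0) for spec in services.values())
    if total_cpus > MAX_CPUS:
        errors.append(f"CPU total {total_cpus} exceeds {MAX_CPUS}")
    if total_memory > MAX_MEMORY_MB:
        errors.append(f"memory total {total_memory} MB exceeds {MAX_MEMORY_MB} MB")
    return errors


DEFAULT_STACK = {
    "lb": {"cpus": "0.15", "memory_mb": 20, "ports": [9999]},
    "webapi1": {"cpus": "0.425", "memory_mb": 165},
    "webapi2": {"cpus": "0.425", "memory_mb": 165},
}
```

The default split sits exactly on both limits, so any manual override that raises one service's quota has to lower another's to stay valid.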

CI/CD Pipeline

Main build flow:

  1. Build amd64 Docker image.
  2. Push immutable ci-${GITHUB_SHA} tag to GHCR.
  3. Start Docker Compose with that exact image.
  4. Clone official Rinha 2026 repo.
  5. Run public test/test.js through k6.
  6. Upload raw benchmark artifacts.
  7. Archive summarized JSON into docs/public/reports.
  8. GitHub Pages deploys the docs site.

The automatic main-branch benchmark runs against the immutable image tag built in the same workflow, not a locally rebuilt image. The canonical submission/runtime shape is root docker-compose.yml: webapi1 on cpuset 0,1, webapi2 on 2,3, and standalone lb on 0,2, while Docker resource limits remain active. The current default stack uses SCORER_MODE=ivf with IVF_FAST_NPROBE=2, fd-pass API handoff with FD_RAW=1, MIN_WORKER_THREADS=128, API limits of 0.425 CPU / 165 MB each, and LB limits of 0.15 CPU / 20 MB.

The build workflow also archives an official-calibrated run after the normal candidate run. That lane can override service CPU quotas to screen alternative splits. It is a prediction/screening signal only; the candidate/submission compose remains the source for official testing.

Manual Official-like Benchmark runs can also archive experiment reports. Use report_kind=experiment for non-default scorer or config tests. The manual workflow currently exposes the scorer choices ivf, bucket, hybrid, and exact; IVF is the default current-main clean candidate path. It also exposes IVF build/repair knobs, the Pedro-style borderline IVF expansion toggle, the bucket AVX cutoff, optional CPU/cpuset overrides, fd_raw, the accept-loop count, and the repetition count for median-p99 screening.

The manual workflow no longer exposes a separate compose override for fd-pass: the root docker-compose.yml is the fd-pass topology. Use the fd_raw input to compare the managed Socket fallback (0) against the raw-fd default (1) on the same topology.

Manual contention knobs:

  • benchmark_api_cpuset and benchmark_proxy_cpuset: optionally override Docker cpusets for API or proxy containers.
  • benchmark_api_cpus and benchmark_proxy_cpus: override service CPU quotas for calibrated or split-screening runs, for example 0.40 and 0.20.
  • benchmark_repetitions: run k6 multiple times and archive the median-p99 result, with raw repetition files uploaded as artifacts.
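
Median-p99 selection across repetitions can be sketched as follows. The field name p99_ms and the choice of the lower median for an even number of runs are assumptions; the archive script may resolve even counts differently.

```python
def pick_median_run(runs):
    """Return the repetition whose p99 is the (lower) median, so the archived record is a real run."""
    ordered = sorted(runs, key=lambda run: run["p99_ms"])
    return ordered[(len(ordered) - 1) // 2]


repetitions = [
    {"name": "rep-1", "p99_ms": 0.41},
    {"name": "rep-2", "p99_ms": 0.34},
    {"name": "rep-3", "p99_ms": 0.37},
]
chosen = pick_median_run(repetitions)
```

Archiving an actual repetition, rather than an averaged number, keeps the stored p99 tied to one set of raw k6 files and dampens the effect of a single noisy run.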

Report files

File | Purpose
latest.json | latest benchmark result
latest-candidate.json | latest default submission-stack result
latest-calibrated.json | latest official-calibrated prediction run
latest-experiment.json | latest non-default experiment result
index.json | sorted benchmark history
rinha-benchmark-*.json | immutable benchmark records
rinha-benchmark-*.html | k6 HTML reports when generated

Uploaded workflow artifacts also include docker-state-*.txt with Docker limits, cpuset, memory, and cgroup counters captured before and after k6. Use those files to confirm which cpuset mode the run used.

The report archive commit is docs-only. The build workflow ignores docs/**, so report commits do not trigger a new benchmark loop.

When benchmark reports change, the build workflow triggers the Pages workflow so /reports/ refreshes without manual action. The manual benchmark workflow does the same refresh after archiving a report.