rinha4-back-end-dotnet

.NET 10 NativeAOT implementation for Rinha de Backend 2026.

The current build is optimized for latency first:

  • raw socket HTTP/1 server
  • Unix Domain Sockets behind the standalone rinha4-lb-yolo-mode proxy
  • manual JSON request parsing
  • prebuilt HTTP responses
  • rounded int16 IVF2 fraud classifier
  • archived official-like k6 results after each main build
  • optional one-core CI contention probe for mismatch diagnosis

The project target is explicit: lead the .NET entries, hold score 6000, and keep failures at 0.

Current signal

Latest CI benchmark history lives at /reports/.

The home page reads the latest official Rinha issue result from docs/public/official/latest.json and the latest CI candidate result from docs/public/reports/latest-candidate.json.

CI results are useful for regression tracking. They are not official Rinha hardware results. Candidate CI runs keep the canonical docker-compose.yml standalone-yolo layout; manual stress runs can pin all service containers to one host CPU when diagnosing official-preview mismatch.

Active lane

Transport is currently stable in CI. The active lane is the IVF approximate-nearest-neighbor index built from the allowed reference dataset and loaded at startup.

Repository map

  Path                 Purpose
  src/WebApi           NativeAOT fraud-score server
  src/DataConverter    Converts official reference data into references.ivf.bin
  data                 Allowed challenge datasets and normalization files copied into the API image
  tests                Focused validation tests
  scripts              Benchmark and report archive automation
  docs/public/reports  Versioned benchmark JSON history

Challenge

Rinha de Backend 2026 scores implementations by latency, correctness, and request survival under the official k6 workload.

Required endpoints:

  Method  Path          Role
  GET     /ready        readiness probe
  POST    /fraud-score  fraud decision

Default topology:

  • reverse proxy on 9999
  • two API instances
  • total container budget: 1.00 CPU / 350 MB
  • no privileged container
  • runnable through Docker Compose

Ranking pressure:

  • lower p99 improves p99 score
  • 0% failures preserves detection score
  • HTTP errors destroy score quickly

This repository currently keeps transport errors at 0 in CI-like runs. The candidate IVF2 path also replays the public payload with 0 false positives and 0 false negatives locally. Remaining ranking work is p99 reduction without giving back correctness.

Architecture

k6 / judge
    |
    v
rinha4-lb-yolo-mode :9999
    |
    +-- unix:/sockets/api1.sock -> WebApi NativeAOT
    |
    +-- unix:/sockets/api2.sock -> WebApi NativeAOT

Request path

  1. The standalone yolo load balancer accepts TCP on port 9999 in proxy mode.
  2. It forwards bytes to the API instances over Unix Domain Sockets. The compose file pins webapi1 to cpuset 0, webapi2 to cpuset 1, and the lb to cpusets 2,3; CPU quotas still total 1.00, and the cpuset split reduces scheduler contention on the official host.
  3. RawHttpServer accepts the socket connection.
  4. HttpWire parses method, path, headers, and Content-Length.
  5. FraudRequestParser reads only required JSON fields.
  6. FraudScorer builds a normalized 14-dimensional vector.
  7. FraudScorer maps vector to the IVF classifier.
  8. HttpResponses writes a prebuilt HTTP/JSON response.
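
The parse in steps 4-5 can be sketched as below. The helper name and return shape are illustrative, not the repo's actual HttpWire/FraudRequestParser API; a production parser would also handle case-insensitive header names and partial reads.

```csharp
using System;
using System.Text;

// Illustrative sketch of the manual HTTP/1 head parse (steps 4-5 above).
// Names and return shape are hypothetical; the repo's HttpWire differs.
static (string Method, string Path, int ContentLength, int BodyStart)
    ParseRequestHead(ReadOnlySpan<byte> buf)
{
    int headEnd = buf.IndexOf("\r\n\r\n"u8);        // end of header block
    if (headEnd < 0) throw new FormatException("incomplete request head");

    ReadOnlySpan<byte> head = buf[..headEnd];
    int lineEnd = head.IndexOf("\r\n"u8);
    ReadOnlySpan<byte> requestLine = lineEnd < 0 ? head : head[..lineEnd];

    // Request line: METHOD SP PATH SP VERSION.
    int sp1 = requestLine.IndexOf((byte)' ');
    int sp2 = requestLine.LastIndexOf((byte)' ');
    string method = Encoding.ASCII.GetString(requestLine[..sp1]);
    string path = Encoding.ASCII.GetString(requestLine[(sp1 + 1)..sp2]);

    // Linear scan for Content-Length; 0 when absent (e.g. GET /ready).
    int contentLength = 0;
    ReadOnlySpan<byte> rest = lineEnd < 0
        ? ReadOnlySpan<byte>.Empty
        : head[(lineEnd + 2)..];
    while (!rest.IsEmpty)
    {
        int eol = rest.IndexOf("\r\n"u8);
        ReadOnlySpan<byte> line = eol < 0 ? rest : rest[..eol];
        if (line.StartsWith("Content-Length:"u8))
            contentLength = int.Parse(
                Encoding.ASCII.GetString(line["Content-Length:"u8.Length..]).Trim());
        rest = eol < 0 ? ReadOnlySpan<byte>.Empty : rest[(eol + 2)..];
    }
    return (method, path, contentLength, headEnd + 4);
}

var (m, p, clen, bodyStart) = ParseRequestHead(
    "POST /fraud-score HTTP/1.1\r\nContent-Length: 2\r\n\r\n{}"u8);
Console.WriteLine($"{m} {p} Content-Length={clen} body@{bodyStart}");
```

On the hot path the same kind of scan runs in place over the pooled read buffers, so nothing is allocated for headers the server does not care about.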

Data pipeline

src/DataConverter converts data/references.json.gz into data/references.ivf.bin during image build.

The references.ivf.bin file stores:

  • IVF2 magic for the candidate default
  • trained int16 centroids in dimension-major layout
  • per-cluster int16 bounding boxes in dimension-major layout
  • packed int16 vector blocks
  • labels and original ids for deterministic top-five tie-breaking
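
Dimension-major layout means the values of all clusters for dimension 0 are stored contiguously, then dimension 1, and so on, so a per-dimension pass over every centroid is one linear read. A hypothetical packing sketch (names are not the repo's DataConverter API):

```csharp
using System;

// Hypothetical dimension-major packing: packed[dim * k + c] holds
// centroid c's value for dimension dim, so scanning all k centroids
// along one dimension touches k contiguous int16 values.
static short[] PackDimensionMajor(short[][] centroids, int dims)
{
    int k = centroids.Length;
    var packed = new short[dims * k];
    for (int c = 0; c < k; c++)
        for (int dim = 0; dim < dims; dim++)
            packed[dim * k + c] = centroids[c][dim];
    return packed;
}

// Two 3-dimensional centroids {1,2,3} and {4,5,6} pack as 1,4,2,5,3,6.
Console.WriteLine(string.Join(",", PackDimensionMajor(
    new[] { new short[] { 1, 2, 3 }, new short[] { 4, 5, 6 } }, 3)));
```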

Classifier

IVF is the default, and currently the only, runtime mode.

Startup loads the IVF index and runs nearest-cluster search. Current settings target nprobe=1, one-pass full bbox repair, and rounded int16 squared L2 ranking. IVF2 uses int64 accumulation for accuracy. If the IVF file is missing or invalid, startup fails.
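
The int64 accumulation is not optional at this scale: two int16 values can differ by up to 65535, and that difference squared (~4.3e9) already overflows int32. A minimal sketch of the rounded squared-L2 ranking (the function name is illustrative):

```csharp
using System;

// Rounded int16 squared-L2 with 64-bit accumulation. A single squared
// difference of two int16 values can reach ~4.3e9 (> int.MaxValue),
// so each term is widened to long before squaring and summing.
static long SquaredL2(ReadOnlySpan<short> a, ReadOnlySpan<short> b)
{
    long acc = 0;
    for (int i = 0; i < a.Length; i++)
    {
        long d = (long)a[i] - b[i];   // widen before squaring
        acc += d * d;
    }
    return acc;
}

// 14 dimensions at IVF_SCALE=10000: a worst-case pair already needs long.
var x = new short[14];
var y = new short[14];
for (int i = 0; i < 14; i++) { x[i] = 10000; y[i] = -10000; }
Console.WriteLine(SquaredL2(x, y));   // 14 * 20000^2 = 5,600,000,000
```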

Runtime implementation is split into focused partial files:

  • IvfIndex.cs: binary loading, validation, immutable arrays, and search dispatch
  • IvfIndex.Int64.cs: IVF2 candidate path for IVF_SCALE=10000

Startup readiness

Each API process recreates its Unix socket file on startup in the shared sockets tmpfs volume. The standalone LB consumes /sockets/api1.sock and /sockets/api2.sock and keeps the proxy layer byte-oriented.
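
Socket-file recreation can be sketched with System.Net.Sockets; the /sockets paths come from the compose layout above, and the helper name and backlog value here are illustrative.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Sketch: remove any stale socket file left by a previous process, then
// bind and listen on the Unix Domain Socket in the shared tmpfs volume.
static Socket BindUnixSocket(string path)
{
    File.Delete(path);   // no-op when the file does not exist
    var listener = new Socket(
        AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
    listener.Bind(new UnixDomainSocketEndPoint(path));
    listener.Listen(backlog: 512);   // backlog is illustrative
    return listener;
}

// e.g. api1 would bind "/sockets/api1.sock" at startup.
```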

Rules

Allowed in this repo:

  • preprocess references.json.gz
  • preprocess mcc_risk.json
  • preprocess normalization.json
  • use any classifier built from allowed reference data
  • build ANN or IVF indexes from references.json.gz
  • run the public official k6 script in CI
  • compare against the public ranking preview

Not allowed:

  • using official test payloads as reference data
  • hardcoding expected answers from preview runs
  • building correction tables from misclassified test payloads
  • letting the reverse proxy inspect fraud payloads or answer /fraud-score

The CI benchmark only mounts official test data into the k6 container. API containers do not receive test payload files.

The IVF scorer follows the same boundary: it trains and packs only references.json.gz, reads no benchmark payload files, and fails startup when the index is unavailable.

Getting Started

Build the app:

dotnet build src/WebApi/WebApi.csproj --no-restore

Run local stack:

docker compose up --build
curl -i http://localhost:9999/ready

Tune IVF image-build parameters with IVF_CLUSTERS, IVF_TRAIN_SAMPLE, IVF_ITERATIONS, and IVF_SCALE when testing alternatives. Runtime repair controls are IVF_FAST_NPROBE, IVF_FULL_NPROBE, IVF_BBOX_REPAIR, IVF_REPAIR_MIN_FRAUDS, and IVF_REPAIR_MAX_FRAUDS.

Generate IVF data without Docker:

dotnet run --project src/DataConverter/DataConverter.csproj -- data/

Run focused tests:

dotnet run --project tests/VectorizationTests/VectorizationTests.csproj --no-restore

Run official-like benchmark locally:

bash scripts/ci-official-benchmark.sh

Run docs locally:

cd docs
bun install
bun run dev

Performance

Hot path choices:

  • NativeAOT publish
  • raw socket HTTP/1
  • one task per client connection
  • pooled read buffers
  • Unix Domain Sockets behind the standalone yolo proxy
  • manual request parsing
  • no model binding
  • prebuilt response bytes
  • rounded int16 IVF nearest-neighbor ranking
  • no fraud-payload parsing in the proxy layer

Current bottleneck

Transport is fast enough for the current target. Recent yolo-LB CI runs have shown 0 HTTP errors; p99 work is now inside IVF repair, vector scan cost, and CPU split between the API containers and the standalone proxy.

The latest validated main build before this cleanup used image ci-ecdcc3f1b0059842489ae32102763ac957cc2a36 and produced p99 0.40ms, score 6000, 0 false positives, 0 false negatives, and 0 HTTP errors in the automatic benchmark lane. A same-matrix comparison with that image was also correct but narrowly trailed Danilo in that run (0.39ms vs 0.37ms).

Accuracy experiments

Earlier non-candidate classifier paths were removed from production. Rounded IVF2 is the only runtime classifier now.

The current production lane is IVF approximate nearest-neighbor search:

  • build centroids and compact vector blocks from references.json.gz
  • load references.ivf.bin at startup
  • scan the nearest cluster first with IVF_FAST_NPROBE=1
  • use scalar bbox repair with early exit to scan only clusters whose bounding box can still beat the current top-five bound
  • skip repair for first-cluster 0/5 approval and 5/5 denial candidates below tuned distance bounds
  • rank candidates with rounded int16 squared L2 distance
  • use one-pass full bbox repair for the accuracy candidate
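
The bounding-box test is a lower bound: the squared distance from the query to the nearest point of a cluster's axis-aligned box can never exceed the distance to any vector inside the cluster, so any cluster whose bound is already at or above the current fifth-best distance can be skipped without scanning. A hedged sketch with illustrative names:

```csharp
using System;

// Lower bound on squared L2 distance from query q to any point inside
// the axis-aligned box [min, max]; per dimension, only the distance to
// the nearer box face contributes (0 when q is inside on that axis).
static long BoxLowerBound(
    ReadOnlySpan<short> q, ReadOnlySpan<short> min, ReadOnlySpan<short> max)
{
    long acc = 0;
    for (int i = 0; i < q.Length; i++)
    {
        long d = 0;
        if (q[i] < min[i]) d = min[i] - q[i];
        else if (q[i] > max[i]) d = q[i] - max[i];
        acc += d * d;
    }
    return acc;
}

// Repair-loop shape: skip a cluster when its bound cannot beat the
// current worst kept distance (the fifth-best so far).
//   if (BoxLowerBound(query, boxMin[c], boxMax[c]) >= fifthBest) continue;
```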

This path is implemented, unit-tested on a synthetic boundary case, and under CI benchmarking as the submission default.

Rejected A/Bs:

  • AVX2 bbox repair raised p99 to 5.37ms
  • a cluster-major bbox copy raised p99 to 6.89ms
  • 4096 clusters raised p99 to 16.69ms
  • 1024 clusters raised p99 to 19.78ms
  • removed experiments either missed labels or lost to the current standalone-yolo path

Reverse proxy

The retained load balancer path is the standalone rinha4-lb-yolo-mode image in LB_MODE=proxy. It keeps the proxy byte-oriented on port 9999 and forwards to the API containers over Unix Domain Sockets.

The benchmark workflow runs the canonical root docker-compose.yml used by the submission. The compose file allocates 0.42 CPU / 160 MB to each API container and 0.16 CPU / 30 MB to the proxy while keeping the total at 1.00 CPU / 350 MB.
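
A minimal compose sketch of that split, assuming the service names used elsewhere in this README (unrelated fields omitted):

```yaml
services:
  webapi1:
    cpuset: "0"
    cpus: 0.42
    mem_limit: 160M
  webapi2:
    cpuset: "1"
    cpus: 0.42
    mem_limit: 160M
  lb:
    cpuset: "2,3"
    cpus: 0.16
    mem_limit: 30M
```

The totals line up with the challenge budget: 0.42 + 0.42 + 0.16 = 1.00 CPU and 160 + 160 + 30 = 350 MB.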

CI/CD Pipeline

Main build flow:

  1. Build amd64 Docker image.
  2. Push immutable ci-${GITHUB_SHA} tag to GHCR.
  3. Start Docker Compose with that exact image.
  4. Clone official Rinha 2026 repo.
  5. Run public test/test.js through k6.
  6. Upload raw benchmark artifacts.
  7. Archive summarized JSON into docs/public/reports.
  8. GitHub Pages deploys the docs site.

The automatic main-branch benchmark runs against the immutable image tag built in the same workflow, not a locally rebuilt image. The canonical submission/runtime shape is the root docker-compose.yml: webapi1 on cpuset 0, webapi2 on cpuset 1, and the standalone lb on cpusets 2,3, with Docker resource limits still active. Manual runs can add a one-core overlay when diagnosing official mismatch, but that stress mode is stricter than the candidate tracking run.

The build workflow also archives an official-calibrated run after the normal candidate run. That lane can override service CPU quotas to screen splits such as api=0.40 and proxy=0.20. It is a prediction/screening signal only; the candidate/submission compose remains the source for official testing.

Manual Official-like Benchmark runs can archive experiment reports too. For IVF, dispatch with report_kind=experiment, IVF_FAST_NPROBE=1, IVF_FULL_NPROBE=1, bbox repair on, IVF_BOUNDARY_FULL=false, repair fraud range 0..5, and the IVF_SCALE value under test.

Manual contention knobs:

  • benchmark_stack_cpuset=0: pin the standalone LB and WebApi containers to one host CPU.
  • benchmark_k6_cpuset=0: also pin k6 to that CPU. Use only when diagnosing host contention; it is intentionally harsher than normal candidate tracking.
  • benchmark_api_cpus and benchmark_proxy_cpus: override service CPU quotas for calibrated or split-screening runs, for example 0.40 and 0.20.
  • benchmark_repetitions: run k6 multiple times and archive the median-p99 result, with raw repetition files uploaded as artifacts.

Report files

  File                    Purpose
  latest.json             latest benchmark result
  latest-candidate.json   latest default submission-stack result
  latest-calibrated.json  latest official-calibrated prediction run
  latest-experiment.json  latest non-default experiment result
  index.json              sorted benchmark history
  rinha-benchmark-*.json  immutable benchmark records
  rinha-benchmark-*.html  k6 HTML reports when generated

Uploaded workflow artifacts also include docker-state-*.txt with Docker limits, cpuset, memory, and cgroup counters captured before and after k6. Use those files to confirm which cpuset mode the run used.

The report archive commit is docs-only. The build workflow ignores docs/**, so report commits do not trigger a new benchmark loop.

When benchmark reports change, the build workflow triggers the Pages workflow so /reports/ refreshes without manual action. The manual benchmark workflow does the same refresh after archiving a report.