Cargo as your accountability layer

Cargo is the most important tool you have as an orchestrator. It is the only thing in the loop that does not negotiate. Agents argue; Cargo does not. If the agent claims a change works, Cargo confirms it or refutes it.

This page gives you the gates, in order of cost and value, plus the exact configs to enforce them.

The five gates, in order

| Gate    | Command                                       | Catches                           | Cost    |
|---------|-----------------------------------------------|-----------------------------------|---------|
| Format  | cargo fmt --check                             | Style drift                       | Instant |
| Compile | cargo check --all-targets                     | Type errors, unused imports       | Seconds |
| Lint    | cargo clippy --all-targets -- -D warnings     | Idiomatic mistakes, perf footguns | Seconds |
| Test    | cargo test --workspace (or cargo nextest run) | Behavior regressions              | Minutes |
| Audit   | cargo audit and cargo deny check              | CVEs, license issues, banned deps | Seconds |

Run all five on every agent diff. The agent's job is to make all five pass before you read the code.
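
If you would rather drive the gates from Rust than from a shell script, the ordering can be sketched as below. This is a minimal sketch, not a tool this page ships: `Gate`, `GATES`, `first_failure`, and `run_with_cargo` are hypothetical names, and it assumes cargo-deny is installed and stands in for the whole audit gate.

```rust
use std::process::Command;

/// One gate: a label plus the arguments passed to `cargo`.
/// (Hypothetical type, introduced for this sketch.)
struct Gate {
    name: &'static str,
    args: &'static [&'static str],
}

/// The five gates from the table, in the same order.
/// Assumption: `cargo deny check` covers the audit gate on its own.
const GATES: &[Gate] = &[
    Gate { name: "format", args: &["fmt", "--check"] },
    Gate { name: "compile", args: &["check", "--all-targets"] },
    Gate { name: "lint", args: &["clippy", "--all-targets", "--", "-D", "warnings"] },
    Gate { name: "test", args: &["test", "--workspace"] },
    Gate { name: "audit", args: &["deny", "check"] },
];

/// Runs the gates in order and returns the name of the first one that fails.
/// Later gates are not run once one has failed. `run` is injected so the
/// ordering logic can be exercised without spawning cargo.
fn first_failure(gates: &[Gate], mut run: impl FnMut(&Gate) -> bool) -> Option<&'static str> {
    for gate in gates {
        if !run(gate) {
            return Some(gate.name);
        }
    }
    None
}

/// Real runner: spawns `cargo <args>` and reports whether it exited with success.
/// A cargo that cannot be spawned at all counts as a failed gate.
fn run_with_cargo(gate: &Gate) -> bool {
    Command::new("cargo")
        .args(gate.args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Called from a `main`, `first_failure(GATES, run_with_cargo)` returns `None` when everything passes; on `Some(name)` you would print the gate name and exit non-zero. Stopping at the first failure keeps the cheap gates cheap: a formatting slip never costs you a full test run.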

What each one really catches

cargo fmt --check

Style consistency. Catches the rare agent that decides to invent its own indentation. Free to run. Always on.

cargo check

The compiler type-checks and borrow-checks the code but skips code generation, so no artifact is produced. You get essentially the same diagnostics as a full build, usually several times faster. This is your default "did the agent's code at least typecheck" smoke test.

cargo clippy -- -D warnings

The crucial gate. Clippy catches the dumb idioms agents fall into: unnecessary clones, redundant closures, manual match instead of ?, too many single-character variable names, suspicious comparisons, hidden allocations. Some of these lints live in the pedantic and nursery groups, which are off by default; the crate-root attributes later on this page turn them on. -D warnings upgrades every warning to a hard error, so agents cannot ship code that "compiles but is sloppy."
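
To make that concrete, here is a small before-and-after pair. It is a sketch: the function names are made up, and the lint names in the comments (`question_mark`, `map_clone`, `redundant_closure`) are the ones I would expect to fire, so check them against your Clippy version.

```rust
/// Hypothetical helper used by both versions.
fn double(n: u32) -> u32 {
    n * 2
}

/// Before: compiles cleanly under `cargo check`, but it is the kind of code
/// the lint gate is there to bounce.
fn first_doubled_before(values: &[u32]) -> Option<u32> {
    let first = values.first();
    // Early return on `None` by hand; I would expect `clippy::question_mark` here.
    if first.is_none() {
        return None;
    }
    first
        .map(|v| v.clone()) // cloning through a closure; expect `clippy::map_clone`
        .map(|v| double(v)) // closure that only forwards; expect `clippy::redundant_closure`
}

/// After: same behavior, written the way the lints suggest.
fn first_doubled_after(values: &[u32]) -> Option<u32> {
    values.first().copied().map(double)
}
```

Both functions return the same values for every input. The point of the gate is that only the second one gets through, so you never have to ask for the cleanup in review.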

cargo test or cargo nextest run

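Runs the test suite. cargo nextest is a near drop-in replacement that runs each test in its own process, in parallel, and gives much better output. Worth installing. It does not run doctests, so keep cargo test --doc in the gate if your crates have them.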

For projects with slow link times, also gate cargo test --no-run separately so you know the build broke before any test starts.

cargo audit and cargo deny

cargo audit checks dependencies against the RustSec advisory DB. cargo deny is more powerful: it enforces license policy, bans specific crates, flags duplicate versions of the same crate, and surfaces unmaintained deps.

Agents will add dependencies. You want to know which ones.

The pre-commit hook

Put this at .git/hooks/pre-commit and chmod +x it, or wire it via pre-commit.com if you use that.

#!/usr/bin/env bash
set -euo pipefail
 
echo "==> fmt check"
cargo fmt --check
 
echo "==> clippy"
cargo clippy --all-targets --all-features -- -D warnings
 
echo "==> test build"
cargo test --no-run --workspace
 
echo "==> test run"
cargo test --workspace --quiet

The hook makes the gates impossible to skip without --no-verify. Combined with npm-style discipline, this is the single highest-leverage thing you can install for agent-written code.

The CI config

GitHub Actions, the version I actually use for personal Rust work:

.github/workflows/ci.yml
name: ci
 
on:
  pull_request:
  push:
    branches: [main]
 
env:
  CARGO_TERM_COLOR: always
  RUSTFLAGS: "-Dwarnings"
 
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt, clippy
      - uses: Swatinem/rust-cache@v2
      - run: cargo fmt --check
      - run: cargo clippy --all-targets --all-features -- -D warnings
      - run: cargo check --all-targets --all-features
 
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@nextest
      - run: cargo nextest run --workspace
 
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: EmbarkStudios/cargo-deny-action@v1

A starter clippy.toml

Sail uses something close to this. It picks up the most common agent-written sloppiness without being annoying.

clippy.toml
# Reject single-letter names except in tiny scopes.
single-char-binding-names-threshold = 4
 
# Reject huge enum variants (each variant pays the size of the largest).
enum-variant-size-threshold = 256
 
# Set a real cognitive complexity ceiling.
cognitive-complexity-threshold = 25
 
# Reject too-many-arguments above this.
too-many-arguments-threshold = 7
 
# Reject too-many-lines for a single fn.
too-many-lines-threshold = 120
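
The enum threshold is the least obvious of these settings, so here is what it guards against. A minimal sketch with made-up types: `Inline` carries a 1024-byte array directly in one variant, which I would expect `clippy::large_enum_variant` to flag at the threshold above, and `Boxed` is the usual fix.

```rust
use std::mem::size_of;

/// Every `Inline` value is at least as large as its biggest variant,
/// including the `Ping` values that carry no data at all.
enum Inline {
    Ping,
    Payload([u8; 1024]),
}

/// Boxing the large variant moves the bytes to the heap, so the enum
/// itself shrinks to roughly pointer size.
enum Boxed {
    Ping,
    Payload(Box<[u8; 1024]>),
}

/// Returns (size of `Inline`, size of `Boxed`) in bytes for the current target.
fn enum_sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}
```

The exact numbers depend on the target, but the first is at least 1024 and the second is a pointer or two. That difference is paid on every `Vec` element, channel message, and function argument of the type.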

And in your crate root (lib.rs or main.rs):

src/lib.rs
#![warn(
    clippy::pedantic,
    clippy::nursery,
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::todo,
    clippy::unimplemented,
    clippy::dbg_macro,
)]
#![allow(clippy::module_name_repetitions)]  // common pedantic false positive

The unwrap_used, expect_used, and panic lints alone catch most of the failure modes from the previous chapter.
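
A small illustration of what `unwrap_used` forces. The function names are hypothetical; the first version is what the lint rejects, the second is what is left once the agent has to satisfy it.

```rust
use std::num::ParseIntError;

/// Rejected by `clippy::unwrap_used`: a malformed port panics the whole process.
fn parse_port_panicking(raw: &str) -> u16 {
    raw.trim().parse().unwrap()
}

/// Passes the lint: the error is returned, and the caller decides what a
/// bad port means (retry, default, abort with a message).
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse()
}
```

With the lint on, the failure path has to show up in the signature, which is where a reviewer can see it.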

A starter deny.toml

deny.toml
[bans]
multiple-versions = "warn"
wildcards = "deny"
deny = [
    { name = "openssl" },        # prefer rustls
    { name = "git2" },           # heavy native dep, usually unneeded
]
 
[licenses]
allow = [
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC",
    "Unicode-DFS-2016", "Zlib", "CC0-1.0", "MPL-2.0",
]
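# Note: `copyleft` and `default` come from cargo-deny's older config schema.
# Newer releases reject any license not listed in `allow` by default, so drop
# these two keys if `cargo deny check` reports them as deprecated or unknown.
copyleft = "deny"
default = "deny"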
 
[advisories]
db-path = "~/.cargo/advisory-db"
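# Note: `vulnerability` and the lint-level form of `unmaintained` are also from
# the older schema. Check the cargo-deny docs for the version you run.
vulnerability = "deny"
unmaintained = "warn"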
yanked = "deny"

The prompt that bakes this in

Add this to your project's AGENTS.md or system prompt so the agent works inside the gates from the start:

What the gates do not catch

Worth being honest about. The gates catch:

  • Type errors, syntax errors, borrow violations.
  • Idiom violations Clippy knows about.
  • Behavior regressions covered by tests.
  • Format drift, banned deps, CVEs.

The gates do not catch:

  • Wrong design. A working unsafe-by-design API still passes the gates.
  • Missing tests. If the agent did not add a test for the new failure mode, no gate fires.
  • Performance regressions. cargo bench is a separate gate, not a default one.
  • Architectural drift. Layering, module boundaries, abstraction quality. Human eyes only.
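
The "missing tests" case is the easiest one to show in code. A sketch with hypothetical functions: as far as I know, nothing in the gates above objects to the first one, because the only input that breaks it is one no test supplies.

```rust
/// Formats cleanly, type-checks, and passes any test that uses a non-zero
/// `whole`. It panics with a division by zero when `whole == 0`, and
/// `part * 100` can overflow for large `part`. No gate exercises either case.
fn percent(part: u32, whole: u32) -> u32 {
    part * 100 / whole
}

/// What a reviewer might ask for instead: both failure modes are visible
/// in the return type, so callers have to handle them.
fn percent_checked(part: u32, whole: u32) -> Option<u32> {
    part.checked_mul(100)?.checked_div(whole)
}
```

The gates would accept either version. Only a test for `whole == 0`, or a human reading the diff, tells them apart.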

That residual is what the 5-minute PR review is for. The gates are the floor. Your review is the ceiling.