Cargo as your accountability layer
Cargo is the most important tool you have as an orchestrator. It is the only thing in the loop that does not negotiate. Agents argue; Cargo does not. If the agent claims a change works, Cargo proves it or refutes it.
This page gives you the gates, in order of cost and value, plus the exact configs to enforce them.
The five gates, in order
| Gate | Command | Catches | Cost |
|---|---|---|---|
| Format | cargo fmt --check | Style drift | Instant |
| Compile | cargo check --all-targets | Type errors, unused imports | Seconds |
| Lint | cargo clippy --all-targets -- -D warnings | Idiomatic mistakes, perf footguns | Seconds |
| Test | cargo test --workspace (or cargo nextest run) | Behavior regressions | Minutes |
| Audit | cargo audit and cargo deny check | CVEs, license issues, banned deps | Seconds |
Run all five on every agent diff. The agent's job is to make all five pass before you read the code.
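To make the "in order, stop at the first failure" idea concrete, here is a minimal sketch of a gate runner. The names Gate, GATES, first_failure, and run_with_cargo are hypothetical, and the runner is passed in as a closure so the ordering logic can be exercised without invoking Cargo. The command lines are copied from the table, with the audit row split into its two commands.

```rust
use std::process::Command;

/// One gate: a label plus the program and arguments to run.
pub struct Gate {
    pub name: &'static str,
    pub program: &'static str,
    pub args: &'static [&'static str],
}

/// The gates from the table, cheapest first.
pub const GATES: &[Gate] = &[
    Gate { name: "format", program: "cargo", args: &["fmt", "--check"] },
    Gate { name: "compile", program: "cargo", args: &["check", "--all-targets"] },
    Gate { name: "lint", program: "cargo", args: &["clippy", "--all-targets", "--", "-D", "warnings"] },
    Gate { name: "test", program: "cargo", args: &["test", "--workspace"] },
    Gate { name: "audit", program: "cargo", args: &["audit"] },
    Gate { name: "deny", program: "cargo", args: &["deny", "check"] },
];

/// Runs the gates in order and returns the name of the first one that fails,
/// or `None` if every gate passes. Later gates are not run after a failure.
pub fn first_failure(gates: &[Gate], mut run: impl FnMut(&Gate) -> bool) -> Option<&'static str> {
    for gate in gates {
        if !run(gate) {
            return Some(gate.name);
        }
    }
    None
}

/// Runs one gate as a child process. A spawn error (for example, a missing
/// Cargo subcommand binary) is treated the same as a failing gate.
pub fn run_with_cargo(gate: &Gate) -> bool {
    Command::new(gate.program)
        .args(gate.args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Calling first_failure(GATES, run_with_cargo) from a small binary gives the same behavior as a shell script with set -e: the cheap gates fail fast and the expensive ones only run on code that already passed them.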
What each one really catches
cargo fmt --check
Style consistency. Catches the rare agent that decides to invent its own indentation. Free to run. Always on.
cargo check
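The compiler type-checks and borrow-checks every target but skips code generation and linking, so no artifact is produced. You get nearly all the diagnostics of a full build in a fraction of the time; only errors that surface during codegen or linking are missed. This is your default "did the agent's code at least typecheck" smoke test.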
cargo clippy -- -D warnings
The crucial gate. Clippy catches the dumb idioms agents fall into: unnecessary clones, redundant closures, manual match instead of ?, suspicious comparisons, needless allocations and, with the pedantic group enabled, too many single-character variable names in one scope. -D warnings upgrades every warning to a hard error, so agents cannot ship code that "compiles but is sloppy."
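As an illustration of "compiles but is sloppy", the sketch below puts an agent-style function next to its idiomatic rewrite. The functions parse_port, port_sum_verbose, and port_sum are hypothetical; the verbose version is the kind of code Clippy's question_mark lint is intended to flag, though the exact diagnostics depend on your Clippy version.

```rust
/// Hypothetical helper shared by both versions below.
fn parse_port(text: &str) -> Option<u16> {
    text.trim().parse().ok()
}

/// Agent-style version: correct, but the manual `is_none` / `return None` /
/// `unwrap` sequence is what the `question_mark` lint is meant to catch.
fn port_sum_verbose(a: &str, b: &str) -> Option<u32> {
    let first = parse_port(a);
    if first.is_none() {
        return None;
    }
    let second = parse_port(b);
    if second.is_none() {
        return None;
    }
    Some(u32::from(first.unwrap()) + u32::from(second.unwrap()))
}

/// Idiomatic version: `?` propagates the `None` case with no unwraps.
fn port_sum(a: &str, b: &str) -> Option<u32> {
    Some(u32::from(parse_port(a)?) + u32::from(parse_port(b)?))
}
```

Both functions return the same results. The point of the gate is that only the second one should survive -D warnings, so the agent has to do the cleanup before you ever read the diff.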
cargo test or cargo nextest run
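Runs the test suite. cargo nextest is a near drop-in replacement that runs each test in its own process, in parallel, and gives much better output. Worth installing. It does not run doctests, so keep cargo test --doc alongside it if you rely on them.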
For Rust projects with slow link times, also gate cargo test --no-run separately so you know if the build broke even before tests start.
cargo audit and cargo deny
cargo audit checks dependencies against the RustSec advisory DB. cargo deny is more powerful: it enforces license policy, bans specific crates, prevents duplicate versions, and surfaces unmaintained deps.
Agents will add dependencies. You want to know which ones.
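One way to see which dependencies a diff introduced is to compare Cargo.lock before and after the change. The sketch below is a hypothetical helper, not a Cargo feature: locked_packages, quoted_value, and added_packages are made-up names, and the parsing is a simplified line scan that assumes each [[package]] entry lists name before version, rather than a real TOML parser.

```rust
use std::collections::BTreeSet;

/// Extracts the quoted value from a line like `key = "value"`.
/// Returns `None` for other keys and for unquoted values.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start().strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}

/// Collects "name version" strings from the text of a Cargo.lock file.
pub fn locked_packages(lock_text: &str) -> BTreeSet<String> {
    let mut packages = BTreeSet::new();
    let mut current_name: Option<String> = None;
    for line in lock_text.lines() {
        let line = line.trim();
        if let Some(name) = quoted_value(line, "name") {
            current_name = Some(name.to_owned());
        } else if let Some(version) = quoted_value(line, "version") {
            // Pair the version with the most recent name, then reset.
            if let Some(name) = current_name.take() {
                packages.insert(format!("{name} {version}"));
            }
        }
    }
    packages
}

/// Returns the packages present in `after` but not in `before`, sorted.
pub fn added_packages(before: &str, after: &str) -> Vec<String> {
    let old = locked_packages(before);
    locked_packages(after)
        .into_iter()
        .filter(|package| !old.contains(package))
        .collect()
}
```

Feeding it the lockfile from the base branch and the one from the agent's branch gives a short list to read before cargo deny renders its verdict on it.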
The pre-commit hook
Put this at .git/hooks/pre-commit and chmod +x it, or wire it via pre-commit.com if you use that.
#!/usr/bin/env bash
set -euo pipefail
echo "==> fmt check"
cargo fmt --check
echo "==> clippy"
cargo clippy --all-targets --all-features -- -D warnings
echo "==> test build"
cargo test --no-run --workspace
echo "==> test run"
cargo test --workspace --quiet
The hook makes the gates impossible to skip without --no-verify. Combined with npm-style discipline, this is the single highest-leverage thing you can install for agent-written code.
The CI config
GitHub Actions, the version I actually use for personal Rust work:
name: ci
on:
  pull_request:
  push:
    branches: [main]
env:
  CARGO_TERM_COLOR: always
  RUSTFLAGS: "-Dwarnings"
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt, clippy
      - uses: Swatinem/rust-cache@v2
      - run: cargo fmt --check
      - run: cargo clippy --all-targets --all-features -- -D warnings
      - run: cargo check --all-targets --all-features
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@nextest
      - run: cargo nextest run --workspace
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: EmbarkStudios/cargo-deny-action@v1
A starter clippy.toml
Sail uses something close to this. It picks up the most common agent-written sloppiness without being annoying.
# Max single-character bindings allowed in one scope (many_single_char_names).
single-char-binding-names-threshold = 4
# Reject huge enum variants (each variant pays the size of the largest).
enum-variant-size-threshold = 256
# Set a real cognitive complexity ceiling.
cognitive-complexity-threshold = 25
# Reject too-many-arguments above this.
too-many-arguments-threshold = 7
# Reject too-many-lines for a single fn.
too-many-lines-threshold = 120
And in your crate root (lib.rs or main.rs):
#![warn(
    clippy::pedantic,
    clippy::nursery,
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::todo,
    clippy::unimplemented,
    clippy::dbg_macro,
)]
#![allow(clippy::module_name_repetitions)] // common pedantic false positive
The unwrap_used, expect_used, and panic lints alone catch most of the failure modes from the previous chapter.
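To show what those three lints push an agent toward, here is a small sketch of the same lookup written without unwrap, expect, or panic. PortError and port_from are hypothetical names chosen for the example; the rejected one-liner appears only as a comment.

```rust
/// Why a port value could not be produced.
#[derive(Debug, PartialEq, Eq)]
pub enum PortError {
    /// No value was supplied at all.
    Missing,
    /// A value was supplied but is not a valid u16; carries the raw text.
    Invalid(String),
}

// The version the lints reject would look like:
//     let port: u16 = raw.unwrap().parse().unwrap();
// It compiles, and it panics on the first missing or malformed value.

/// Fallible version: every failure becomes a value the caller must handle.
pub fn port_from(raw: Option<&str>) -> Result<u16, PortError> {
    let text = raw.ok_or(PortError::Missing)?;
    text.trim()
        .parse::<u16>()
        .map_err(|_| PortError::Invalid(text.to_owned()))
}
```

With -D warnings in place, the commented-out form fails the lint gate, so the agent has to surface the error path in the function signature, where you can review it.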
A starter deny.toml
[bans]
multiple-versions = "warn"
wildcards = "deny"
deny = [
{ name = "openssl" }, # prefer rustls
{ name = "git2" }, # heavy native dep, usually unneeded
]
[licenses]
allow = [
"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC",
"Unicode-DFS-2016", "Zlib", "CC0-1.0", "MPL-2.0",
]
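# Note: copyleft and default here, and vulnerability and unmaintained = "warn"
# under [advisories], follow cargo-deny's older config schema. Recent releases
# deprecate or reject these keys, so check them against the version you run.
copyleft = "deny"
default = "deny"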
[advisories]
db-path = "~/.cargo/advisory-db"
vulnerability = "deny"
unmaintained = "warn"
yanked = "deny"
The prompt that bakes this in
Add this to your project's AGENTS.md or system prompt so the agent works inside the gates from the start:
What the gates do not catch
Worth being honest about. The gates catch:
- Type errors, syntax errors, borrow violations.
- Idiom violations Clippy knows about.
- Behavior regressions covered by tests.
- Format drift, banned deps, CVEs.
The gates do not catch:
- Wrong design. A working unsafe-by-design API still passes the gates.
- Missing tests. If the agent did not add a test for the new failure mode, no gate fires.
- Performance regressions. cargo bench is a separate gate, not a default one.
- Architectural drift. Layering, module boundaries, abstraction quality. Human eyes only.
That residual is what the 5-minute PR review is for. The gates are the floor. Your review is the ceiling.
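As an illustration of the "missing tests" gap, consider the sketch below. parse_percent is a hypothetical function an agent might add. With only the happy-path test, every gate passes; the range check is never exercised unless a reviewer asks for the second test.

```rust
/// Parses text like "42%" into a ratio between 0.0 and 1.0.
pub fn parse_percent(text: &str) -> Result<f64, String> {
    let digits = text
        .trim()
        .strip_suffix('%')
        .ok_or_else(|| format!("missing '%' in {text:?}"))?;
    let value: f64 = digits
        .trim()
        .parse()
        .map_err(|_| format!("not a number: {digits:?}"))?;
    // NaN also fails this range check, so it is rejected here too.
    if !(0.0..=100.0).contains(&value) {
        return Err(format!("out of range: {value}"));
    }
    Ok(value / 100.0)
}

#[cfg(test)]
mod tests {
    use super::parse_percent;

    // The test an agent usually writes: the happy path.
    #[test]
    fn parses_valid_percent() {
        assert_eq!(parse_percent("50%"), Ok(0.5));
    }

    // The test no gate will demand: the new failure mode.
    #[test]
    fn rejects_out_of_range() {
        assert!(parse_percent("150%").is_err());
    }
}
```

Deleting the range check leaves the first test green, which is exactly why this belongs to review rather than to the gates.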