Rust in agent infrastructure
If you build with AI agents, you build infrastructure around them: tool servers, vector stores, sandboxed code execution, prompt logging, evals, observability. Most of this plumbing is latency-sensitive, runs as long-lived servers, and benefits from the same properties that drove query engines to Rust. The ecosystem is younger than the query side, but the canonical crates are emerging.
What you actually build around agents
| Layer | What it does |
|---|---|
| Tool servers (MCP) | Expose tools to agents over a standard protocol (Model Context Protocol). |
| Vector storage | Embedding indexes for retrieval. Qdrant, Lance, Milvus, or alternatives. |
| Sandbox execution | Run agent-written code without trusting it. Wasmtime, Firecracker, gVisor. |
| Prompt / trace logging | Capture every prompt, response, and tool call. OpenTelemetry, custom storage. |
| Eval harnesses | Replay traces against new models. Track regressions over time. |
| Orchestration | Wire agents together. Workflow, DAG, conversation routing. |
Each of these has Rust crates worth knowing.
Tool servers and MCP
The Model Context Protocol is becoming the standard wire format for "expose this tool to an agent." There are SDKs in many languages; in Rust, these are the crates to know:
| Crate | Notes |
|---|---|
| `rmcp` | Official Rust SDK. Server + client. JSON-RPC over stdio or HTTP. |
| `tower` | The generic service abstraction MCP servers compose with. |
| `axum` | The HTTP layer for HTTP-transport MCP servers. |
A Rust MCP server is typically: an async `tower::Service` that owns the tools' state, an `axum` HTTP listener, and a `serde_json`-typed request/response surface. Latency, type safety, and the ability to embed the server in a larger binary all favor Rust here.
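The typed request/response surface can be sketched without any of the crates above. This is a minimal std-only illustration, not the `rmcp` API: `ToolRequest`, `ToolResponse`, `ToolError`, and `ToolServer` are hypothetical names, and the transport (stdio or HTTP) and JSON decoding are deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical typed request surface; a real server would decode these from JSON-RPC.
#[derive(Debug, Clone, PartialEq)]
enum ToolRequest {
    Echo { text: String },
    KvGet { key: String },
    KvPut { key: String, value: String },
}

#[derive(Debug, Clone, PartialEq)]
enum ToolResponse {
    Text(String),
    Ok,
}

#[derive(Debug, Clone, PartialEq)]
enum ToolError {
    NotFound(String),
}

/// Owns the tools' state, the way the service described in the text would.
#[derive(Default)]
struct ToolServer {
    kv: HashMap<String, String>,
}

impl ToolServer {
    /// Dispatch one request. Every tool and every failure is a named variant,
    /// so adding a tool forces the match to be updated.
    fn call(&mut self, req: ToolRequest) -> Result<ToolResponse, ToolError> {
        match req {
            ToolRequest::Echo { text } => Ok(ToolResponse::Text(text)),
            ToolRequest::KvGet { key } => self
                .kv
                .get(&key)
                .cloned()
                .map(ToolResponse::Text)
                .ok_or(ToolError::NotFound(key)),
            ToolRequest::KvPut { key, value } => {
                self.kv.insert(key, value);
                Ok(ToolResponse::Ok)
            }
        }
    }
}

fn main() {
    let mut server = ToolServer::default();
    let put = ToolRequest::KvPut { key: "model".into(), value: "small".into() };
    println!("{:?}", server.call(put));
    println!("{:?}", server.call(ToolRequest::KvGet { key: "model".into() }));
    println!("{:?}", server.call(ToolRequest::KvGet { key: "missing".into() }));
}
```

In a real server the same `call` body would sit behind whatever handler trait the SDK expects; the point of the sketch is only that the state lives in one owner and the surface is an enum rather than untyped JSON.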
Vector stores
| Crate / project | Notes |
|---|---|
| `qdrant-client` | Client for the Qdrant vector DB (which is itself Rust). |
| `lancedb` | Embedded vector + retrieval DB. Lance file format, Arrow-native. |
| `hnsw_rs` | Pure-Rust HNSW index for in-process embeddings. |
Lance is interesting because it sits in the Arrow ecosystem (so it composes with DataFusion / Sail / Polars). For small embedded use, hnsw_rs or rolling your own with ndarray is fine.
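To give a sense of how little "rolling your own" takes at small scale, here is a sketch of an exhaustive cosine-similarity search. It assumes plain `Vec<f32>` embeddings instead of `ndarray`, and `cosine` and `top_k` are hypothetical helper names. It scans every vector, so it only suits indexes small enough that a linear pass is acceptable.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exhaustive scan: score every stored vector and return the k best
/// as (index, score) pairs, highest score first.
fn top_k(index: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let index = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    for (i, score) in top_k(&index, &[1.0, 0.1], 2) {
        println!("doc {i}: {score:.3}");
    }
}
```

Once the scan becomes the bottleneck, an approximate index such as HNSW is the usual next step.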
Sandboxed code execution
When the agent writes code you cannot trust to run on your machine directly:
| Crate / runtime | Notes |
|---|---|
| `wasmtime` | Embed a Wasm runtime in your Rust binary. Safest option for "run untrusted code." |
| `wasmer` | Alternative Wasm runtime. |
| Firecracker microVMs | Heavy. Better isolation. AWS Lambda uses this internally. |
| gVisor / containers | OS-level isolation. Containers share the host kernel; gVisor adds a user-space kernel that intercepts syscalls. |
For agent-orchestration systems, Wasm is usually the right default. The compile-Rust-to-Wasm story is mature, the runtime is fast, and the isolation is strong without paying for a full VM.
Observability
A few things that matter specifically for agent systems:
| Crate | Notes |
|---|---|
| `tracing` | Async-aware structured logging. The default for Tokio-based code. |
| `opentelemetry` | Export to OTLP backends (Honeycomb, Jaeger, Tempo) via `opentelemetry-otlp`. Lives in the opentelemetry-rust repository. |
| `tracing-opentelemetry` | Bridge `tracing` spans to OTLP. |
For agent traces specifically, the pattern is: one tracing span per prompt, child spans per tool call, attributes for model, tokens, latency. This composes naturally with tracing and ships to any OTLP-compatible backend.
Sail uses this exact pattern in sail-telemetry. Borrow the structure.
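The span layout (one span per prompt, child spans per tool call, attributes for model, tokens, and latency) can be modeled with plain structs to make the hierarchy explicit. This is a std-only sketch, not the `tracing` API and not Sail's code: `PromptSpan`, `ToolCallSpan`, and their field names are hypothetical. With `tracing` the same shape would be nested spans carrying these values as fields.

```rust
use std::time::Duration;

/// Child span: one per tool call made while answering a prompt.
#[derive(Debug)]
struct ToolCallSpan {
    tool: String,
    latency: Duration,
    ok: bool,
}

/// Parent span: one per prompt, carrying the attributes named in the text.
#[derive(Debug)]
struct PromptSpan {
    model: String,
    input_tokens: u32,
    output_tokens: u32,
    latency: Duration,
    tool_calls: Vec<ToolCallSpan>,
}

impl PromptSpan {
    fn total_tokens(&self) -> u32 {
        self.input_tokens + self.output_tokens
    }

    /// Time spent inside tools, summed over the child spans.
    fn tool_latency(&self) -> Duration {
        self.tool_calls.iter().map(|c| c.latency).sum()
    }

    /// Remaining latency once tool time is subtracted (never negative).
    fn model_latency(&self) -> Duration {
        self.latency.saturating_sub(self.tool_latency())
    }

    /// Names of the tool calls that reported failure.
    fn failed_tools(&self) -> Vec<&str> {
        self.tool_calls
            .iter()
            .filter(|c| !c.ok)
            .map(|c| c.tool.as_str())
            .collect()
    }
}

fn main() {
    let span = PromptSpan {
        model: "example-model".to_string(),
        input_tokens: 1200,
        output_tokens: 300,
        latency: Duration::from_millis(2400),
        tool_calls: vec![
            ToolCallSpan { tool: "search".into(), latency: Duration::from_millis(800), ok: true },
            ToolCallSpan { tool: "fetch".into(), latency: Duration::from_millis(500), ok: false },
        ],
    };
    println!(
        "model={} tokens={} tool_time={:?} model_time={:?} failed={:?}",
        span.model,
        span.total_tokens(),
        span.tool_latency(),
        span.model_latency(),
        span.failed_tools()
    );
}
```

Keeping tool latency on the child spans is what lets a backend separate "the model was slow" from "a tool was slow" for the same prompt.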
Orchestration
This is the layer that is most up for grabs. "Workflow over LLM calls" is being reinvented constantly. Today the production-ready Rust options are:
- Roll your own (it is genuinely not that much code, and the semantics are project-specific).
- Use a generic actor / state-machine library (e.g. `stateright`) and build conversation routing on top.
- Embed a workflow engine like Temporal via its Rust SDK if you need durability.
If you are designing this from scratch, the durable-state, typed-message-passing, retry-on-failure parts of the design are exactly what Rust traits and enums are good for. See Sail's sail-server::actor module for a small, real example.
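As a rough illustration of the typed-message and retry parts, here is a sketch of an orchestration step written as a pure function over enums. It is not Sail's actor module: `Event`, `State`, `step`, `run`, and `MAX_ATTEMPTS` are hypothetical names, and the retry limit of 3 is an arbitrary assumption.

```rust
/// Typed messages the orchestrator reacts to.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ModelReplied(String),
    ToolSucceeded(String),
    ToolFailed { reason: String },
}

/// Everything the orchestrator needs to resume is in this value,
/// so persisting it between steps is what would make the workflow durable.
#[derive(Debug, Clone, PartialEq)]
enum State {
    AwaitingModel,
    RunningTool { attempt: u32 },
    Done(String),
    Failed(String),
}

/// Assumed retry budget for a failing tool call.
const MAX_ATTEMPTS: u32 = 3;

/// Pure transition function: no I/O, so it is trivial to test and replay.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::AwaitingModel, Event::ModelReplied(_)) => State::RunningTool { attempt: 1 },
        (State::RunningTool { .. }, Event::ToolSucceeded(out)) => State::Done(out),
        (State::RunningTool { attempt }, Event::ToolFailed { reason }) => {
            if attempt < MAX_ATTEMPTS {
                State::RunningTool { attempt: attempt + 1 }
            } else {
                State::Failed(reason)
            }
        }
        // Terminal states and unexpected events leave the state unchanged.
        (state, _) => state,
    }
}

/// Replay a sequence of events from the initial state.
fn run(events: Vec<Event>) -> State {
    events.into_iter().fold(State::AwaitingModel, step)
}

fn main() {
    let outcome = run(vec![
        Event::ModelReplied("call search".into()),
        Event::ToolFailed { reason: "timeout".into() },
        Event::ToolSucceeded("3 results".into()),
    ]);
    println!("{outcome:?}");
}
```

Because `step` takes and returns plain values, the same function can drive a live run and a replay of logged events.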
What to look for as an orchestrator
When an agent writes infrastructure-shaped Rust:
| Pattern | Watch for |
|---|---|
| `tokio::spawn` with `Arc<Mutex<State>>` | Re-evaluate whether the sharing is necessary. Often a channel-based actor is cleaner. |
| Error types that escape the layer | Vector store errors should not leak into the orchestrator's error type. |
| `unwrap()` on JSON-parsed user input | Agents do this. Propagate with `?` and a real error variant. |
| Synchronous calls inside async handlers | Use `tokio::task::spawn_blocking` for blocking or CPU-bound work, or restructure. |
| Custom protocol implementations | If MCP fits, use it. Custom protocols are debt. |
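
For the first row, this is roughly what the channel-based alternative looks like. The sketch uses `std::thread` and `std::sync::mpsc` as a stand-in for Tokio tasks and channels so that it runs without dependencies; `Msg`, `spawn_counter`, and `count` are hypothetical names. The state is owned by a single thread, so there is no `Arc<Mutex<...>>` to reason about.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Messages the actor understands. Queries carry their own reply channel.
#[derive(Debug)]
enum Msg {
    Record { tool: String },
    Count { tool: String, reply: Sender<u64> },
}

/// Spawn the actor. The HashMap lives on one thread only.
fn spawn_counter() -> (Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut counts: HashMap<String, u64> = HashMap::new();
        // The loop ends once every Sender has been dropped.
        for msg in rx {
            match msg {
                Msg::Record { tool } => *counts.entry(tool).or_insert(0) += 1,
                Msg::Count { tool, reply } => {
                    // Ignore a caller that stopped waiting for the answer.
                    let _ = reply.send(counts.get(&tool).copied().unwrap_or(0));
                }
            }
        }
    });
    (tx, handle)
}

/// Ask the actor for a count. Returns None if the actor is gone,
/// instead of unwrapping and panicking.
fn count(tx: &Sender<Msg>, tool: &str) -> Option<u64> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Count { tool: tool.to_string(), reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}

fn main() {
    let (tx, handle) = spawn_counter();

    // Two producers share the actor by cloning the Sender, not the state.
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for _ in 0..5 {
                    let _ = tx.send(Msg::Record { tool: "search".into() });
                }
            })
        })
        .collect();
    for w in workers {
        let _ = w.join();
    }

    println!("search calls: {:?}", count(&tx, "search"));
    drop(tx); // last Sender dropped, so the actor loop exits
    let _ = handle.join();
}
```

The same structure carries over to async code by swapping the thread for a spawned task and the channels for their async equivalents.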
The agent's job is to keep your infrastructure layer small and typed. Your job is to push back when it sprawls.