Rust in agent infrastructure

If you build with AI agents, you build infrastructure around them: tool servers, vector stores, sandboxed code execution, prompt logging, evals, observability. Most of this plumbing is latency-sensitive, runs as long-lived servers, and benefits from the same properties that drove query engines to Rust. The ecosystem is younger than the query side, but the canonical crates are emerging.

What you actually build around agents

Layer	What it does
Tool servers (MCP)	Expose tools to agents over a standard protocol (Model Context Protocol).
Vector storage	Embedding indexes for retrieval. Qdrant, Lance, and alternatives such as Milvus.
Sandbox execution	Run agent-written code without trusting it. Wasmtime, Firecracker, gVisor.
Prompt / trace logging	Capture every prompt, response, and tool call. OpenTelemetry, custom storage.
Eval harnesses	Replay traces against new models. Track regressions over time.
Orchestration	Wire agents together. Workflow, DAG, conversation routing.

Each of these has Rust crates worth knowing.

Tool servers and MCP

The Model Context Protocol (MCP) is becoming the standard protocol for exposing a tool to an agent. SDKs exist in many languages; on the Rust side, one SDK and two supporting crates cover most servers:

Crate	Notes
`rmcp`	Official Rust SDK. Server + client. JSON-RPC over stdio or HTTP.
`tower`	The generic service abstraction MCP servers compose with.
`axum`	The HTTP layer for HTTP-transport MCP servers.

A Rust MCP server on the HTTP transport is typically three pieces: an async `tower::Service` that owns the tools' state, an `axum` HTTP listener, and a request/response surface typed with `serde_json`. Low latency, type safety, and the ability to embed the server in a larger binary all favor Rust here.
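
As a rough illustration of the "service that owns the tools' state, behind a typed request/response surface" shape, here is a minimal std-only sketch. It is not the `rmcp` API, and it does no JSON-RPC or transport work; `Request`, `Response`, `Tool`, `ToolServer`, `Echo`, and `WordCount` are hypothetical names invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical typed request surface. A real server would
/// deserialize this from a JSON-RPC message.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    ListTools,
    CallTool { name: String, argument: String },
}

/// Hypothetical typed response surface.
#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Tools(Vec<String>),
    Output(String),
    Error(String),
}

/// A tool maps a string argument to a string result or an error message.
pub trait Tool: Send + Sync {
    fn call(&self, argument: &str) -> Result<String, String>;
}

/// Example tool: returns its argument unchanged.
pub struct Echo;
impl Tool for Echo {
    fn call(&self, argument: &str) -> Result<String, String> {
        Ok(argument.to_string())
    }
}

/// Example tool: counts whitespace-separated words.
pub struct WordCount;
impl Tool for WordCount {
    fn call(&self, argument: &str) -> Result<String, String> {
        Ok(argument.split_whitespace().count().to_string())
    }
}

/// The server owns every registered tool and dispatches requests to them.
pub struct ToolServer {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolServer {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    pub fn register(&mut self, name: &str, tool: Box<dyn Tool>) {
        self.tools.insert(name.to_string(), tool);
    }

    /// Every request maps to exactly one response; nothing panics on bad input.
    pub fn handle(&self, request: Request) -> Response {
        match request {
            Request::ListTools => {
                let mut names: Vec<String> = self.tools.keys().cloned().collect();
                names.sort(); // HashMap order is unspecified, so sort for stable output
                Response::Tools(names)
            }
            Request::CallTool { name, argument } => match self.tools.get(&name) {
                Some(tool) => match tool.call(&argument) {
                    Ok(output) => Response::Output(output),
                    Err(message) => Response::Error(message),
                },
                None => Response::Error(format!("unknown tool: {}", name)),
            },
        }
    }
}
```

With `Echo` and `WordCount` registered as `echo` and `word_count`, calling `word_count` with the argument `a b c` yields `Response::Output("3")`, and an unregistered name yields `Response::Error("unknown tool: ...")` rather than a panic. In a real server the `handle` method is roughly where a `tower::Service::call` implementation would sit.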

Vector stores

Crate / project	Notes
`qdrant-client`	Client for the Qdrant vector DB (which is itself Rust).
`lancedb`	Embedded vector + retrieval DB. Lance file format, Arrow-native.
`hnsw_rs`	Pure-Rust HNSW index for in-process embeddings.

Lance is interesting because it sits in the Arrow ecosystem, so it composes with DataFusion, Sail, and Polars. For small embedded use, `hnsw_rs` or a hand-rolled brute-force search over `ndarray` arrays is fine.
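
For a sense of how little code the roll-your-own route needs, here is a minimal sketch of an exact, brute-force cosine-similarity index. It uses plain `Vec<f32>` instead of `ndarray` to stay dependency-free, and it is a linear scan, not HNSW; `FlatIndex` and `cosine_similarity` are hypothetical names for this example.

```rust
/// Cosine similarity of two vectors.
/// Returns None for empty, mismatched-length, or zero-norm inputs.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Exact nearest-neighbour index: stores every vector and scans them all.
pub struct FlatIndex {
    dim: usize,
    entries: Vec<(u64, Vec<f32>)>,
}

impl FlatIndex {
    pub fn new(dim: usize) -> Self {
        Self { dim, entries: Vec::new() }
    }

    /// Rejects vectors whose length does not match the index dimension.
    pub fn insert(&mut self, id: u64, vector: Vec<f32>) -> Result<(), String> {
        if vector.len() != self.dim {
            return Err(format!("expected {} dimensions, got {}", self.dim, vector.len()));
        }
        self.entries.push((id, vector));
        Ok(())
    }

    /// Returns up to `k` (id, score) pairs, highest similarity first.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .entries
            .iter()
            .filter_map(|(id, v)| cosine_similarity(query, v).map(|score| (*id, score)))
            .collect();
        // total_cmp gives a total order over f32, so sorting cannot panic on NaN.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

With `[1, 0]`, `[0, 1]`, and `[1, 1]` stored under ids 1, 2, and 3, a query of `[1, 0]` with `k = 2` returns id 1 and then id 3. A scan like this costs O(n) per query, which is usually acceptable for small collections; past that point an approximate index such as HNSW is the usual next step.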

Sandboxed code execution

When the agent writes code you cannot trust to run directly on your machine, the options are:

Crate / runtime	Notes
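`wasmtime`	Embed a Wasm runtime in your Rust binary. Lightest option for running untrusted code: the guest reaches the host only through the imports you grant.
`wasmer`	Alternative Wasm runtime.
Firecracker microVMs	Heavier than Wasm. Hardware-virtualized isolation with a separate guest kernel. AWS Lambda uses this internally.
gVisor / containers	OS-level isolation. Plain containers share the host kernel; gVisor adds a user-space kernel that intercepts syscalls. Typically lighter than a microVM.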

For agent-orchestration systems, Wasm is usually the right default. The compile-Rust-to-Wasm story is mature, the runtime is fast, and the isolation is strong without paying for a full VM.

Observability

A few things that matter specifically for agent systems:

Crate	Notes
`tracing`	Async-aware structured logging. The default for Tokio-based code.
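`opentelemetry` + `opentelemetry-otlp`	OpenTelemetry API and OTLP exporter, from the opentelemetry-rust project. Export to OTLP backends (Honeycomb, Jaeger, Tempo).
`tracing-opentelemetry`	Bridge `tracing` spans to OpenTelemetry.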

For agent traces specifically, the pattern is: one tracing span per prompt, child spans per tool call, attributes for model, tokens, latency. This composes naturally with tracing and ships to any OTLP-compatible backend.

Sail uses this exact pattern in sail-telemetry. Borrow the structure.
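
The span-per-prompt shape described above can be sketched as plain data. This is not the `tracing` or OpenTelemetry API, only a model of what one prompt's trace carries; `PromptSpan`, `ToolCallSpan`, and the attribute keys are hypothetical names chosen for this example.

```rust
use std::time::Duration;

/// One child span: a single tool call made while answering a prompt.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCallSpan {
    pub tool: String,
    pub latency: Duration,
    pub ok: bool,
}

/// One parent span: a prompt, its model attributes, and its tool calls.
#[derive(Debug, Clone, PartialEq)]
pub struct PromptSpan {
    pub model: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub latency: Duration,
    pub tool_calls: Vec<ToolCallSpan>,
}

impl PromptSpan {
    pub fn new(model: &str, input_tokens: u32, output_tokens: u32, latency: Duration) -> Self {
        Self {
            model: model.to_string(),
            input_tokens,
            output_tokens,
            latency,
            tool_calls: Vec::new(),
        }
    }

    /// Attach a child span for one tool call.
    pub fn record_tool_call(&mut self, tool: &str, latency: Duration, ok: bool) {
        self.tool_calls.push(ToolCallSpan { tool: tool.to_string(), latency, ok });
    }

    pub fn total_tokens(&self) -> u32 {
        self.input_tokens + self.output_tokens
    }

    /// Time spent inside tool calls, summed over all child spans.
    pub fn tool_latency(&self) -> Duration {
        self.tool_calls.iter().map(|call| call.latency).sum()
    }

    /// Flatten the parent span into key/value attributes.
    /// The key names are illustrative, not an established convention.
    pub fn attributes(&self) -> Vec<(&'static str, String)> {
        vec![
            ("model", self.model.clone()),
            ("tokens.input", self.input_tokens.to_string()),
            ("tokens.output", self.output_tokens.to_string()),
            ("latency_ms", self.latency.as_millis().to_string()),
            ("tool_calls", self.tool_calls.len().to_string()),
        ]
    }
}
```

In a `tracing`-based service the same fields would be recorded on a span per prompt, with a nested span per tool call, instead of being stored in a struct. Keeping the attribute set this small makes traces from different models comparable when they are replayed in an eval harness.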

Orchestration

This is the layer that is most up for grabs. "Workflow over LLM calls" is being reinvented constantly. Today the production-ready Rust options are:

Roll your own (it is genuinely not that much code, and the semantics are project-specific).
Use a generic actor / state-machine library (e.g. stateright) and build conversation routing on top.
Embed a workflow engine like Temporal via its Rust SDK if you need durability. Check the SDK's current maturity before you depend on it.

If you are designing this from scratch, the durable-state, typed-message-passing, retry-on-failure parts of the design are exactly what Rust traits and enums are good for. See Sail's sail-server::actor module for a small, real example.
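
A minimal sketch of the typed-message and retry-on-failure parts, using only the standard library. Durability is left out: a real system would persist progress between steps. `Step`, `StepError`, `RunFailure`, `Executor`, `run`, and `Flaky` are hypothetical names for this example, not taken from Sail or any crate.

```rust
/// Typed messages: every kind of work the orchestrator can dispatch.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    CallModel { prompt: String },
    CallTool { name: String, input: String },
}

/// Failures are split by whether a retry could help.
#[derive(Debug, Clone, PartialEq)]
pub enum StepError {
    Transient(String),
    Fatal(String),
}

/// Where and why a run stopped.
#[derive(Debug, Clone, PartialEq)]
pub struct RunFailure {
    pub step_index: usize,
    pub attempts: u32,
    pub reason: String,
}

/// Anything that can carry out a step: a model client, a tool server, a test double.
pub trait Executor {
    fn execute(&mut self, step: &Step) -> Result<String, StepError>;
}

/// Runs steps in order. Transient errors are retried up to `max_attempts`
/// tries per step; fatal errors stop the run at once.
/// Each step is tried at least once, even if `max_attempts` is 0.
pub fn run(
    steps: &[Step],
    executor: &mut dyn Executor,
    max_attempts: u32,
) -> Result<Vec<String>, RunFailure> {
    let mut outputs = Vec::with_capacity(steps.len());
    for (step_index, step) in steps.iter().enumerate() {
        let mut attempts = 0;
        loop {
            attempts += 1;
            match executor.execute(step) {
                Ok(output) => {
                    outputs.push(output);
                    break;
                }
                Err(StepError::Transient(_)) if attempts < max_attempts => continue,
                Err(StepError::Transient(reason)) | Err(StepError::Fatal(reason)) => {
                    return Err(RunFailure { step_index, attempts, reason });
                }
            }
        }
    }
    Ok(outputs)
}

/// Test double: fails its first `failures_left` calls with a transient error.
pub struct Flaky {
    pub failures_left: u32,
    pub calls: u32,
}

impl Executor for Flaky {
    fn execute(&mut self, step: &Step) -> Result<String, StepError> {
        self.calls += 1;
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(StepError::Transient("timeout".to_string()));
        }
        match step {
            Step::CallModel { prompt } => Ok(format!("model({})", prompt)),
            Step::CallTool { name, input } => Ok(format!("{}({})", name, input)),
        }
    }
}
```

Because `Step` and `StepError` are enums, adding a new kind of step or failure makes the compiler flag every `match` that has not handled it yet. The `Executor` trait keeps the retry loop independent of whatever actually calls the model or tool.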

What to look for as an orchestrator

When an agent writes infrastructure-shaped Rust, review it for these patterns:

Pattern	Watch for
`tokio::spawn` with `Arc<Mutex<State>>`	Re-evaluate whether the sharing is necessary. Often a channel-based actor is cleaner.
Error types that escape the layer	Vector store errors should not leak into the orchestrator's error type.
`unwrap()` on JSON-parsed user input	Agents do this. Propagate with `?` into a real error variant.
Blocking calls inside async handlers	Use `tokio::task::spawn_blocking` for blocking or CPU-heavy work, or restructure.
Custom protocol implementations	If MCP fits, use it. Custom protocols are debt.
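
Two rows of the table can be sketched together: a channel-based actor that owns its state instead of sharing an `Arc<Mutex<State>>`, and input parsing that returns an error variant instead of calling `unwrap()`. To stay dependency-free this uses `std::thread` and `std::sync::mpsc` as stand-ins; in Tokio code the same shape is usually built from `tokio::spawn`, `tokio::sync::mpsc`, and `tokio::sync::oneshot`. Integer parsing stands in for JSON parsing, and `Command`, `CounterError`, `spawn_counter`, `add`, and `get` are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Errors that callers of this layer can see. Nothing from std leaks out.
#[derive(Debug, PartialEq)]
pub enum CounterError {
    InvalidAmount(String),
    ActorGone,
}

/// Messages the actor understands. Each carries a channel for its reply.
pub enum Command {
    Add { amount: String, reply: Sender<Result<i64, CounterError>> },
    Get { reply: Sender<i64> },
}

/// Parse untrusted input into a typed value, with no unwrap().
fn parse_amount(raw: &str) -> Result<i64, CounterError> {
    raw.trim()
        .parse::<i64>()
        .map_err(|_| CounterError::InvalidAmount(raw.to_string()))
}

/// Spawns the actor thread and returns a handle for sending it commands.
/// The running total lives only inside the thread, so no lock is needed.
pub fn spawn_counter() -> Sender<Command> {
    let (sender, receiver) = mpsc::channel::<Command>();
    thread::spawn(move || {
        let mut total: i64 = 0;
        // The loop ends when every Sender handle has been dropped.
        for command in receiver {
            match command {
                Command::Add { amount, reply } => {
                    let result = parse_amount(&amount).map(|n| {
                        total = total.saturating_add(n);
                        total
                    });
                    let _ = reply.send(result); // caller may have stopped waiting
                }
                Command::Get { reply } => {
                    let _ = reply.send(total);
                }
            }
        }
    });
    sender
}

/// Client helper: send an Add command and wait for the new total.
pub fn add(handle: &Sender<Command>, amount: &str) -> Result<i64, CounterError> {
    let (reply, response) = mpsc::channel();
    handle
        .send(Command::Add { amount: amount.to_string(), reply })
        .map_err(|_| CounterError::ActorGone)?;
    response.recv().map_err(|_| CounterError::ActorGone)?
}

/// Client helper: read the current total.
pub fn get(handle: &Sender<Command>) -> Result<i64, CounterError> {
    let (reply, response) = mpsc::channel();
    handle
        .send(Command::Get { reply })
        .map_err(|_| CounterError::ActorGone)?;
    response.recv().map_err(|_| CounterError::ActorGone)
}
```

Calling `add` with `"2"` and then `" 40 "` returns `Ok(2)` and `Ok(42)`; calling it with `"lots"` returns `Err(CounterError::InvalidAmount("lots"))` and leaves the total unchanged. Channel failures are mapped to `CounterError::ActorGone`, so the standard library's send and receive error types never escape this layer.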

The agent's job is to keep your infrastructure layer small and typed. Your job is to push back when it sprawls.