Rust in agent infrastructure

If you build with AI agents, you build infrastructure around them: tool servers, vector stores, sandboxed code execution, prompt logging, evals, observability. Most of this plumbing is latency-sensitive, runs as long-lived servers, and benefits from the same properties that drove query engines to Rust. The ecosystem is younger than the query side, but the canonical crates are emerging.

What you actually build around agents

Layer                   What it does
Tool servers (MCP)      Expose tools to agents over a standard protocol (Model Context Protocol).
Vector storage          Embedding indexes for retrieval. Qdrant, Lance, Milvus alternatives.
Sandbox execution       Run agent-written code without trusting it. Wasmtime, Firecracker, gVisor.
Prompt / trace logging  Capture every prompt, response, and tool call. OpenTelemetry, custom storage.
Eval harnesses          Replay traces against new models. Track regressions over time.
Orchestration           Wire agents together. Workflow, DAG, conversation routing.

Each of these has Rust crates worth knowing.

Tool servers and MCP

The Model Context Protocol is becoming the standard wire format for "expose this tool to an agent." There are SDKs in many languages; in Rust, an MCP server usually draws on these crates:

Crate  Notes
rmcp   Official Rust SDK. Server + client. JSON-RPC over stdio or HTTP.
tower  The generic service abstraction MCP servers compose with.
axum   The HTTP layer for HTTP-transport MCP servers.

A Rust MCP server is typically: an async tower::Service that owns the tools' state, an axum HTTP listener, and a serde_json-typed request/response surface. Latency, type safety, and the ability to embed the server in a larger binary all favor Rust here.
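The shape described above (a server that owns its tools and exposes a typed request/response surface) can be sketched with the standard library alone. This is not the rmcp API: the names `Tool`, `ToolServer`, `ToolError`, and `Add` are hypothetical, and arguments are plain strings where a real server would deserialize JSON into typed structs.

```rust
use std::collections::HashMap;
use std::fmt;

/// Errors a tool call can produce (hypothetical type, not part of rmcp).
#[derive(Debug, PartialEq)]
enum ToolError {
    UnknownTool(String),
    BadArguments(String),
}

impl fmt::Display for ToolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolError::UnknownTool(name) => write!(f, "unknown tool: {name}"),
            ToolError::BadArguments(msg) => write!(f, "bad arguments: {msg}"),
        }
    }
}

impl std::error::Error for ToolError {}

/// A tool the server exposes to agents.
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, args: &[&str]) -> Result<String, ToolError>;
}

/// Parse one argument, mapping the failure into the tool's own error type.
fn parse_int(raw: &str) -> Result<i64, ToolError> {
    raw.parse()
        .map_err(|_| ToolError::BadArguments(format!("not an integer: {raw}")))
}

/// Example tool: adds two integers.
struct Add;

impl Tool for Add {
    fn name(&self) -> &'static str {
        "add"
    }

    fn call(&self, args: &[&str]) -> Result<String, ToolError> {
        // Exactly two arguments are required; anything else is a typed error.
        let [a, b] = args else {
            return Err(ToolError::BadArguments("expected two integers".into()));
        };
        let sum = parse_int(a)?
            .checked_add(parse_int(b)?)
            .ok_or_else(|| ToolError::BadArguments("sum overflows i64".into()))?;
        Ok(sum.to_string())
    }
}

/// Owns the registered tools and dispatches calls by name.
struct ToolServer {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl ToolServer {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }

    fn dispatch(&self, name: &str, args: &[&str]) -> Result<String, ToolError> {
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| ToolError::UnknownTool(name.to_string()))?;
        tool.call(args)
    }
}
```

Registering `Add` and calling `dispatch("add", &["2", "40"])` yields `Ok("42")`; an unregistered name comes back as `ToolError::UnknownTool` rather than a panic. In a real server the transport (stdio or HTTP) would sit in front of `dispatch`.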

Vector stores

Crate / project  Notes
qdrant-client    Client for the Qdrant vector DB (which is itself Rust).
lancedb          Embedded vector + retrieval DB. Lance file format, Arrow-native.
hnsw_rs          Pure-Rust HNSW index for in-process embeddings.

Lance is interesting because it sits in the Arrow ecosystem (so it composes with DataFusion / Sail / Polars). For small embedded use, hnsw_rs or rolling your own with ndarray is fine.
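For a sense of what "rolling your own" means at small scale, here is a minimal sketch of a brute-force index: every query scans all stored embeddings and ranks them by cosine similarity. It uses plain `Vec<f32>` rather than ndarray so it runs with no dependencies, and `FlatIndex` is a hypothetical name. A flat scan is linear in the number of vectors, which is the cost an HNSW index exists to avoid.

```rust
/// Cosine similarity between two vectors.
/// Returns None if the lengths differ or either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        None
    } else {
        Some(dot / (norm_a * norm_b))
    }
}

/// Brute-force index: (id, embedding) pairs, scanned on every query.
struct FlatIndex {
    items: Vec<(u64, Vec<f32>)>,
}

impl FlatIndex {
    fn new() -> Self {
        Self { items: Vec::new() }
    }

    fn insert(&mut self, id: u64, embedding: Vec<f32>) {
        self.items.push((id, embedding));
    }

    /// Return up to `k` (id, score) pairs, highest cosine similarity first.
    /// Entries that cannot be scored (wrong length, zero norm) are skipped.
    fn search(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .items
            .iter()
            .filter_map(|(id, v)| cosine(query, v).map(|score| (*id, score)))
            .collect();
        // total_cmp gives a total order on f32, so sorting cannot panic on NaN.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}
```

Returning `Option` from `cosine` keeps mismatched dimensions and zero vectors out of the ranking instead of letting NaN scores through.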

Sandboxed code execution

When the agent writes code you cannot trust to run on your machine directly:

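Crate / runtime       Notes
wasmtime              Embed a Wasm runtime in your Rust binary. The lightest sandbox for running untrusted code.
wasmer                Alternative Wasm runtime.
Firecracker microVMs  Heavier. Stronger isolation through hardware virtualization. AWS Lambda uses this internally.
gVisor / containers   OS-level isolation. Lighter than a microVM, but plain containers share the host kernel; gVisor puts a user-space kernel in between.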

For agent-orchestration systems, Wasm is usually the right default. The compile-Rust-to-Wasm story is mature, the runtime is fast, and the isolation is strong without paying for a full VM.

Observability

A few things that matter specifically for agent systems:

Crate                  Notes
tracing                Async-aware structured logging. The default for Tokio-based code.
opentelemetry          From the opentelemetry-rust project. Export to OTLP backends (Honeycomb, Jaeger, Tempo) via opentelemetry-otlp.
tracing-opentelemetry  Bridge tracing spans to OpenTelemetry.

For agent traces specifically, the pattern is: one tracing span per prompt, child spans per tool call, attributes for model, tokens, latency. This composes naturally with tracing and ships to any OTLP-compatible backend.

Sail uses this exact pattern in sail-telemetry. Borrow the structure.
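To make the span hierarchy concrete, the sketch below models one prompt span and its tool-call children as plain structs. It deliberately avoids the tracing crate so it runs with no dependencies; with tracing, `PromptSpan` and `ToolCallSpan` would be spans and their fields would be span attributes. All type, field, and attribute names here are hypothetical.

```rust
use std::time::Duration;

/// One tool call made while answering a prompt (a child span).
#[derive(Debug)]
struct ToolCallSpan {
    tool: String,
    latency: Duration,
    ok: bool,
}

/// One prompt/response round trip (the parent span).
#[derive(Debug)]
struct PromptSpan {
    model: String,
    input_tokens: u32,
    output_tokens: u32,
    latency: Duration,
    tool_calls: Vec<ToolCallSpan>,
}

impl PromptSpan {
    fn total_tokens(&self) -> u32 {
        self.input_tokens + self.output_tokens
    }

    /// Sum of the latencies of all child tool calls.
    fn tool_latency(&self) -> Duration {
        self.tool_calls.iter().map(|call| call.latency).sum()
    }

    /// Time not spent in tools (model plus overhead); saturates at zero.
    fn model_latency(&self) -> Duration {
        self.latency.saturating_sub(self.tool_latency())
    }

    /// Names of the tool calls that failed.
    fn failed_tools(&self) -> Vec<&str> {
        self.tool_calls
            .iter()
            .filter(|call| !call.ok)
            .map(|call| call.tool.as_str())
            .collect()
    }

    /// Flatten the parent span into key/value attributes for export.
    fn attributes(&self) -> Vec<(&'static str, String)> {
        vec![
            ("model", self.model.clone()),
            ("tokens.total", self.total_tokens().to_string()),
            ("latency_ms", self.latency.as_millis().to_string()),
        ]
    }
}
```

Keeping tool latency on the children and total latency on the parent is what lets a backend answer "was the model slow, or the tools?" from a single trace.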

Orchestration

This is the layer that is most up for grabs. "Workflow over LLM calls" is being reinvented constantly. Today the production-ready Rust options are:

  • Roll your own (it is genuinely not that much code, and the semantics are project-specific).
  • Use a generic actor / state-machine library (e.g. stateright) and build conversation routing on top.
  • Embed a workflow engine like Temporal via its Rust SDK if you need durability, after checking how mature that SDK currently is.

If you are designing this from scratch, the durable-state, typed-message-passing, retry-on-failure parts of the design are exactly what Rust traits and enums are good for. See Sail's sail-server::actor module for a small, real example.
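A minimal sketch of the typed-message-passing and retry-on-failure parts, using only the standard library (threads and `std::sync::mpsc` rather than Tokio). It is not taken from Sail; `Msg`, `Attempt`, `spawn_conversation`, and `run_with_retry` are hypothetical names. The actor owns its state outright, so no `Arc<Mutex<...>>` is needed, and durability is left out entirely.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the conversation actor understands.
enum Msg {
    /// Append a user turn to the conversation.
    UserTurn(String),
    /// Ask for the number of turns; the reply goes to the enclosed channel.
    TurnCount(Sender<usize>),
    /// Stop the actor.
    Shutdown,
}

/// Spawn the actor on its own thread. It returns the full history on exit.
fn spawn_conversation() -> (Sender<Msg>, JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let handle = thread::spawn(move || {
        let mut turns: Vec<String> = Vec::new();
        // The loop also ends when every sender has been dropped.
        for msg in rx {
            match msg {
                Msg::UserTurn(text) => turns.push(text),
                Msg::TurnCount(reply) => {
                    // Ignore a send error: the requester may have gone away.
                    let _ = reply.send(turns.len());
                }
                Msg::Shutdown => break,
            }
        }
        turns
    });
    (tx, handle)
}

/// Outcome of one attempt at a step.
enum Attempt<T> {
    Done(T),
    /// Transient failure: try again if attempts remain.
    Retry(String),
    /// Permanent failure: stop immediately.
    Fatal(String),
}

/// Run `step` up to `max_attempts` times, passing the 1-based attempt number.
fn run_with_retry<T>(
    max_attempts: u32,
    mut step: impl FnMut(u32) -> Attempt<T>,
) -> Result<T, String> {
    let mut last_error = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match step(attempt) {
            Attempt::Done(value) => return Ok(value),
            Attempt::Retry(reason) => last_error = reason,
            Attempt::Fatal(reason) => return Err(reason),
        }
    }
    Err(last_error)
}
```

Because `Msg` and `Attempt` are enums, adding a new message or outcome makes every `match` that forgot to handle it fail to compile, which is the property the paragraph above is pointing at.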

What to look for as an orchestrator

When an agent writes infrastructure-shaped Rust:

Pattern                                  Watch for
tokio::spawn with Arc<Mutex<State>>      Re-evaluate whether the sharing is necessary. Often a channel-based actor is cleaner.
Error types that escape the layer        Vector store errors should not leak into the orchestrator's error type.
unwrap() on JSON-parsed user input       Agents do this. Replace with ? and a real error variant.
Blocking calls inside async handlers     Use spawn_blocking for CPU work, or restructure.
Custom protocol implementations          If MCP fits, use it. Custom protocols are debt.
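Two of these rows (leaking error types and `unwrap()` on parsed input) are easiest to see in code. The sketch below is an assumption-laden miniature: `StoreError` stands in for a vector store client's error, `lookup` for the store call, and integer parsing for JSON parsing so the example needs no dependencies. The orchestrator converts lower-layer errors into its own variants through `From`, which is what lets `?` replace `unwrap()`.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Error from the storage layer (hypothetical stand-in for a store client error).
#[derive(Debug)]
enum StoreError {
    NotFound(u64),
    Unavailable,
}

/// Error the orchestrator exposes, phrased in its own terms.
#[derive(Debug, PartialEq)]
enum OrchestratorError {
    InvalidInput(String),
    DocumentMissing(u64),
    StoreUnavailable,
}

impl fmt::Display for OrchestratorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
            Self::DocumentMissing(id) => write!(f, "document {id} not found"),
            Self::StoreUnavailable => write!(f, "document store unavailable"),
        }
    }
}

impl std::error::Error for OrchestratorError {}

// The conversions are the layer boundary: StoreError never appears in the
// orchestrator's public signatures.
impl From<StoreError> for OrchestratorError {
    fn from(err: StoreError) -> Self {
        match err {
            StoreError::NotFound(id) => Self::DocumentMissing(id),
            StoreError::Unavailable => Self::StoreUnavailable,
        }
    }
}

impl From<ParseIntError> for OrchestratorError {
    fn from(err: ParseIntError) -> Self {
        Self::InvalidInput(format!("document id must be an unsigned integer: {err}"))
    }
}

/// Hypothetical store call with canned results.
fn lookup(id: u64) -> Result<&'static str, StoreError> {
    match id {
        1 => Ok("doc one"),
        2 => Err(StoreError::Unavailable),
        other => Err(StoreError::NotFound(other)),
    }
}

/// Handle a request whose id arrives as untrusted text.
fn handle_request(raw_id: &str) -> Result<&'static str, OrchestratorError> {
    let id: u64 = raw_id.trim().parse()?; // `?` instead of unwrap()
    let doc = lookup(id)?; // StoreError is converted at the boundary
    Ok(doc)
}
```

With this in place, `handle_request("seven")` returns `InvalidInput` instead of panicking, and swapping the store for another backend changes the `From<StoreError>` impl but not the orchestrator's callers.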

The agent's job is to keep your infrastructure layer small and typed. Your job is to push back when it sprawls.