Sail architecture map
Sail is a Spark-compatible distributed query engine, written entirely in Rust, built on top of Apache DataFusion. It is open source (github.com/lakehq/sail, Apache 2.0) and is a useful real-world reference for production Rust at scale.
The workspace has ~36 crates under crates/. They group into five layers.
The five layers
The lint policy is the style guide
The most actionable signal for "how Sail code is written" is the workspace lint policy in the root Cargo.toml:
[workspace.lints.clippy]
allow_attributes = "deny"
unwrap_used = "deny"
expect_used = "deny"
panic = "deny"
dbg_macro = "deny"
todo = "deny"

Translation: no .unwrap(), no .expect(), no panic!(), no todo!(), no dbg!() in production code. Errors must be propagated explicitly. Even #[allow(...)] is denied: local overrides use #[expect(...)] instead, which raises the unfulfilled_lint_expectations warning once the lint stops firing, so a stale override gets noticed (and fails any build that denies warnings).
If you take one lesson from Sail and apply it to your own Rust projects, take this one.
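To make the policy concrete, here is a minimal sketch of code that would pass these lints. The names `ConfigError` and `parse_port` are hypothetical and do not come from Sail; the sketch uses only the standard library and shows the general shape, in which every failure is returned as a value and `?` does the propagation.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for this sketch (not a Sail type).
#[derive(Debug)]
pub enum ConfigError {
    MissingPort,
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingPort => write!(f, "missing port"),
            ConfigError::InvalidPort(e) => write!(f, "invalid port: {e}"),
        }
    }
}

impl std::error::Error for ConfigError {}

// Lets `?` turn a ParseIntError into a ConfigError automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidPort(e)
    }
}

/// Extracts the port from "host:port" without unwrap, expect, or panic.
pub fn parse_port(addr: &str) -> Result<u16, ConfigError> {
    // A missing ':' becomes an error value instead of a panic.
    let (_, port) = addr.rsplit_once(':').ok_or(ConfigError::MissingPort)?;
    // A non-numeric or out-of-range port is converted through the From impl.
    Ok(port.parse::<u16>()?)
}
```

Callers decide what to do with the error: `parse_port("localhost:50051")` yields `Ok(50051)`, while `parse_port("localhost")` yields `Err(ConfigError::MissingPort)`.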
Patterns you will see everywhere
Once you can name these, the codebase reads quickly.
| Pattern | Where it shows up |
|---|---|
| pub type XxxResult<T> = Result<T, XxxError>; | Every crate's error.rs |
| XxxError is a thiserror::Error enum with #[from] conversions | Every crate's error.rs |
| #[async_trait::async_trait] on object-safe async traits | Anywhere a trait is used as dyn and has async methods |
| Arc<dyn Trait> for sharing trait implementations | CatalogProvider, data source, etc. |
| Arc<Mutex<State>> + private fn state() -> Result<MutexGuard> accessor | CatalogManager, session state |
| Arc<str> instead of String for shared immutable identifiers | Catalog names, namespace parts |
| try_new(...) -> Result<Self, ...> for fallible construction | Anywhere construction can fail |
| new(...) -> Self + with_X(self) -> Self for fluent infallible builders | Options structs |
| dashmap::DashMap for concurrent maps | Inside actors, registries |
| lib.rs and mod.rs are tiny: purely mod and pub use | Every crate |
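Several of these patterns compose naturally, so it may help to see them side by side. The sketch below is an illustration under stated assumptions, not Sail code: `CatalogError`, `CatalogResult`, `RegistryOptions`, and `Registry` are hypothetical names, and `Display` is written by hand where Sail would derive it with thiserror, so that the example depends only on the standard library.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical error enum; a thiserror derive would generate this Display impl.
#[derive(Debug)]
pub enum CatalogError {
    InvalidArgument(String),
    NotFound(String),
    Internal(String),
}

impl fmt::Display for CatalogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CatalogError::InvalidArgument(msg) => write!(f, "invalid argument: {msg}"),
            CatalogError::NotFound(name) => write!(f, "not found: {name}"),
            CatalogError::Internal(msg) => write!(f, "internal error: {msg}"),
        }
    }
}

impl std::error::Error for CatalogError {}

/// The `XxxResult<T>` alias pattern.
pub type CatalogResult<T> = Result<T, CatalogError>;

/// Options struct: infallible `new` plus fluent `with_x` setters.
#[derive(Debug, Clone)]
pub struct RegistryOptions {
    default_catalog: Arc<str>, // shared immutable identifier, cheap to clone
    max_entries: usize,
}

impl RegistryOptions {
    pub fn new(default_catalog: &str) -> Self {
        Self { default_catalog: Arc::from(default_catalog), max_entries: 16 }
    }

    pub fn with_max_entries(mut self, max_entries: usize) -> Self {
        self.max_entries = max_entries;
        self
    }
}

/// Mutable state kept behind the lock (hypothetical: table name -> row count).
struct State {
    tables: HashMap<Arc<str>, u64>,
}

/// Cloning a Registry shares the same state through the Arc.
#[derive(Clone)]
pub struct Registry {
    options: RegistryOptions,
    state: Arc<Mutex<State>>,
}

impl Registry {
    /// Fallible construction: validation failures are returned, not panicked.
    pub fn try_new(options: RegistryOptions) -> CatalogResult<Self> {
        if options.default_catalog.is_empty() {
            return Err(CatalogError::InvalidArgument("empty catalog name".into()));
        }
        let state = State { tables: HashMap::new() };
        Ok(Self { options, state: Arc::new(Mutex::new(state)) })
    }

    /// Private accessor: a poisoned lock becomes an error instead of a panic.
    fn state(&self) -> CatalogResult<MutexGuard<'_, State>> {
        self.state.lock().map_err(|e| CatalogError::Internal(e.to_string()))
    }

    pub fn register(&self, name: &str, rows: u64) -> CatalogResult<()> {
        let mut state = self.state()?;
        if state.tables.len() >= self.options.max_entries {
            return Err(CatalogError::InvalidArgument("registry is full".into()));
        }
        state.tables.insert(Arc::from(name), rows);
        Ok(())
    }

    pub fn rows(&self, name: &str) -> CatalogResult<u64> {
        let state = self.state()?;
        let rows = state.tables.get(name).copied();
        rows.ok_or_else(|| CatalogError::NotFound(name.to_string()))
    }
}
```

The private `state()` accessor is the piece that keeps the lint policy workable: `Mutex::lock` returns a `Result`, and funnelling every lock acquisition through one method means the poisoned-lock case is mapped to an error in exactly one place.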
Reading order
If you want to start somewhere small and grow outward:
1. crates/sail-common/src/error.rs: the canonical thiserror pattern.
2. crates/sail-catalog/src/provider/namespace.rs: a clean domain type with From/TryFrom.
3. crates/sail-catalog/src/provider/mod.rs: the headline extension trait.
4. crates/sail-catalog-memory/src/provider.rs: one real implementor of that trait.
5. crates/sail-server/src/builder.rs: production tonic gRPC server builder.
6. crates/sail-spark-connect/src/server.rs: a real async gRPC handler.
The next four pages of this tour walk through the most teachable code in each of these areas.
What Sail is not, for honesty
- Not a full Spark replacement (Spark compatibility is verified per-feature; check the README before quoting it).
- Not a benchmark champion in every workload (verify any numbers against the latest published runs, not memory).
- Not a Rust beginner project. Sail uses generics, async traits, macros, and DataFusion abstractions liberally, so this tour reads the codebase selectively rather than exhaustively.