Trait design in Sail
The most important trait in Sail is CatalogProvider. Every storage backend implements it: in-memory, Hive Metastore, AWS Glue, Iceberg, OneLake, Unity Catalog. The query engine treats them all uniformly. Reading this one trait teaches you most of how production Rust uses traits as extension points.
The trait
    /// A trait that defines the interface for a catalog.
    /// A catalog contains *databases*, where each database has a multi-level name
    /// that represents a *namespace*. A database contains *objects* such as
    /// *tables* and *views*.
    #[async_trait::async_trait]
    pub trait CatalogProvider: Send + Sync {
        /// The name of the catalog in the session.
        fn get_name(&self) -> &str;

        async fn create_database(
            &self,
            database: &Namespace,
            options: CreateDatabaseOptions,
        ) -> CatalogResult<DatabaseStatus>;

        async fn get_database(
            &self,
            database: &Namespace,
        ) -> CatalogResult<DatabaseStatus>;

        async fn list_databases(
            &self,
            prefix: Option<&Namespace>,
        ) -> CatalogResult<Vec<DatabaseStatus>>;

        async fn drop_database(
            &self,
            database: &Namespace,
            options: DropDatabaseOptions,
        ) -> CatalogResult<()>;

        async fn create_table(
            &self,
            database: &Namespace,
            table: &str,
            options: CreateTableOptions,
        ) -> CatalogResult<TableStatus>;

        // ... more methods
    }

The four ingredients that make this trait "object-safe and shareable across an async runtime":
- #[async_trait::async_trait] desugars each async fn in the trait into a method that returns Pin<Box<dyn Future<Output = T> + Send + '_>>. Stable Rust has supported native async fn in traits (AFIT) since 1.75, but a trait that uses it is not dyn-compatible, so it cannot be used as dyn Trait. Until that changes, the macro is the production choice.
- Send + Sync as supertraits. Required to share the trait object across Tokio tasks safely.
- Every method takes &self (never self or &mut self). Mutable state goes behind an Arc<Mutex<...>> or dashmap inside the implementor.
- Every method returns CatalogResult<T> (the crate's local Result alias). Errors flow up consistently.
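The four ingredients can be sketched with the standard library alone. The example below hand-writes roughly the boxed-future signature that the macro generates, so it needs no external crate. KeyValueStore, MemoryStore, StoreError, StoreResult, BoxFuture, block_on, and shared_store_demo are hypothetical names chosen for illustration; none of them come from Sail, and the tiny executor assumes futures that never wait on real I/O.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Hypothetical error type standing in for `CatalogError`.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound(String),
}

/// Local result alias, mirroring the role of `CatalogResult<T>`.
pub type StoreResult<T> = Result<T, StoreError>;

/// Roughly the return type `#[async_trait]` gives each `async fn`.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical stand-in for `CatalogProvider`: `Send + Sync` supertraits,
/// `&self` receivers only, and one shared result alias.
pub trait KeyValueStore: Send + Sync {
    fn put<'a>(&'a self, key: &'a str, value: u64) -> BoxFuture<'a, StoreResult<()>>;
    fn get<'a>(&'a self, key: &'a str) -> BoxFuture<'a, StoreResult<u64>>;
}

/// Interior mutability (a `Mutex`) is what lets every method take `&self`.
#[derive(Default)]
pub struct MemoryStore {
    entries: Mutex<HashMap<String, u64>>,
}

impl KeyValueStore for MemoryStore {
    fn put<'a>(&'a self, key: &'a str, value: u64) -> BoxFuture<'a, StoreResult<()>> {
        Box::pin(async move {
            self.entries.lock().unwrap().insert(key.to_string(), value);
            StoreResult::Ok(())
        })
    }

    fn get<'a>(&'a self, key: &'a str) -> BoxFuture<'a, StoreResult<u64>> {
        Box::pin(async move {
            self.entries
                .lock()
                .unwrap()
                .get(key)
                .copied()
                .ok_or_else(|| StoreError::NotFound(key.to_string()))
        })
    }
}

/// Minimal executor for one future; a real program would use a runtime.
pub fn block_on<F: Future>(future: F) -> F::Output {
    struct NoopWaker;
    impl Wake for NoopWaker {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
        thread::yield_now();
    }
}

/// Shares one trait object across four threads, then sums the stored values.
pub fn shared_store_demo() -> u64 {
    let store: Arc<dyn KeyValueStore> = Arc::new(MemoryStore::default());
    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                let key = format!("key{i}");
                block_on(store.put(&key, i * 10)).unwrap();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    (0..4u64)
        .map(|i| {
            let key = format!("key{i}");
            block_on(store.get(&key)).unwrap()
        })
        .sum()
}
```

Calling shared_store_demo() from a main function returns 60 (0 + 10 + 20 + 30). Removing Send + Sync from the trait definition makes thread::spawn reject the closure, which is the compile-time guarantee the supertraits buy.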
A real implementor
The in-memory catalog provider, which exists for tests and for in-process Sail deployments:
    #[async_trait::async_trait]
    impl CatalogProvider for MemoryCatalogProvider {
        fn get_name(&self) -> &str {
            &self.name
        }

        async fn create_database(
            &self,
            database: &Namespace,
            options: CreateDatabaseOptions,
        ) -> CatalogResult<DatabaseStatus> {
            let CreateDatabaseOptions {
                if_not_exists,
                comment,
                location,
                properties,
            } = options;
            let entry = self.databases.entry(database.clone());
            match entry {
                Entry::Occupied(entry) => {
                    if if_not_exists {
                        Ok(entry.get().status.clone())
                    } else {
                        Err(CatalogError::AlreadyExists(
                            CatalogObject::Database,
                            quote_namespace_if_needed(database),
                        ))
                    }
                }
                Entry::Vacant(entry) => {
                    let status = DatabaseStatus {
                        catalog: self.name.clone(),
                        database: database.clone().into(),
                        comment,
                        location,
                        properties,
                    };
                    let db = MemoryDatabase {
                        status: status.clone(),
                        tables: HashMap::new(),
                        views: HashMap::new(),
                    };
                    entry.insert(db);
                    Ok(status)
                }
            }
        }

        // ... other methods
    }

Three patterns worth noticing:
- Struct destructuring in let: let CreateDatabaseOptions { if_not_exists, comment, location, properties } = options; unpacks all the fields at once. Cleaner than options.comment, options.location, etc.
- dashmap::Entry for atomic "get-or-insert" without two lookups. The Entry::Occupied/Entry::Vacant match is exhaustive and avoids a TOCTOU race.
- if_not_exists semantics in the match rather than an if cascade. The match makes the semantics explicit.
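The same three patterns can be tried in isolation. This is a minimal sketch that swaps dashmap for the standard library's HashMap and its Entry API, so the method takes &mut self instead of &self; the shape of the match is otherwise the same. CreateOptions, Status, Registry, and RegistryError are hypothetical names, loosely modeled on the types in the excerpt above.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical options struct, loosely modeled on `CreateDatabaseOptions`.
pub struct CreateOptions {
    pub if_not_exists: bool,
    pub comment: Option<String>,
}

/// Hypothetical status record returned to the caller.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub name: String,
    pub comment: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum RegistryError {
    AlreadyExists(String),
}

#[derive(Default)]
pub struct Registry {
    databases: HashMap<String, Status>,
}

impl Registry {
    /// Takes `&mut self` because std's `HashMap` has no interior locking;
    /// a concurrent map is what allows `&self` in the original.
    pub fn create(&mut self, name: &str, options: CreateOptions) -> Result<Status, RegistryError> {
        // Destructure once. Without `..`, the compiler rejects the pattern
        // if a field is added to the struct and not listed here.
        let CreateOptions { if_not_exists, comment } = options;

        // One lookup yields either the occupied slot or a vacant one.
        match self.databases.entry(name.to_string()) {
            Entry::Occupied(entry) => {
                if if_not_exists {
                    // Existing entry wins; the new comment is ignored.
                    Ok(entry.get().clone())
                } else {
                    Err(RegistryError::AlreadyExists(name.to_string()))
                }
            }
            Entry::Vacant(entry) => {
                let status = Status { name: name.to_string(), comment };
                entry.insert(status.clone());
                Ok(status)
            }
        }
    }
}
```

Creating the same name twice returns the original Status when if_not_exists is true and RegistryError::AlreadyExists otherwise. Because the entry handle holds the slot for the duration of the match, there is no window between "check" and "insert" for another writer to slip into.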
A domain type that the trait operates on
Namespace is the multi-level name (catalog.database.subdb) the trait passes around. It is a clean piece of domain Rust:
    use std::sync::Arc;

    use crate::error::{CatalogError, CatalogResult};

    /// A non-empty, multi-level name. Used to refer to a database in the catalog.
    #[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)]
    pub struct Namespace {
        pub head: Arc<str>,
        pub tail: Vec<Arc<str>>,
    }

    impl From<Namespace> for Vec<Arc<str>> {
        fn from(namespace: Namespace) -> Self {
            let mut result = vec![namespace.head];
            result.extend(namespace.tail);
            result
        }
    }

    impl<T: Into<Arc<str>>> TryFrom<Vec<T>> for Namespace {
        type Error = CatalogError;

        fn try_from(value: Vec<T>) -> CatalogResult<Self> {
            let mut iter = value.into_iter().map(Into::into);
            let head = iter
                .next()
                .ok_or_else(|| CatalogError::InvalidArgument("empty namespace".to_string()))?;
            let tail = iter.collect();
            Ok(Self { head, tail })
        }
    }

    impl Namespace {
        pub fn is_child_of(&self, other: &Self) -> bool {
            self.head == other.head
                && self.tail.len() == other.tail.len() + 1
                && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
        }

        pub fn starts_with(&self, other: &Self) -> bool {
            self.head == other.head
                && self.tail.len() >= other.tail.len()
                && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
        }
    }

The single most teachable thing in this file: Arc<str> instead of String. A Namespace is read-mostly. Copying it should be cheap. Arc<str> is a reference-counted immutable string: .clone() is a refcount bump, not a heap allocation.
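If your code passes around immutable identifiers (catalog names, user IDs, file paths, content-addressed hashes) and clones them often, prefer Arc<str> over String. The change is mostly mechanical, and each clone becomes an atomic reference-count increment instead of a heap allocation plus a byte copy.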
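A short sketch makes the cost difference observable. It uses a simplified copy of Namespace whose error type is a plain String (an assumption made to keep the example self-contained, in place of CatalogError), and a hypothetical helper, clone_shares_buffer, that checks whether a clone points at the same string buffer as the original.

```rust
use std::sync::Arc;

/// Simplified copy of `Namespace`; the error type is a plain `String`
/// here (an assumption) instead of `CatalogError`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Namespace {
    pub head: Arc<str>,
    pub tail: Vec<Arc<str>>,
}

impl<T: Into<Arc<str>>> TryFrom<Vec<T>> for Namespace {
    type Error = String;

    fn try_from(value: Vec<T>) -> Result<Self, Self::Error> {
        let mut iter = value.into_iter().map(Into::into);
        // An empty vector has no head, so the non-empty invariant fails here.
        let head = iter.next().ok_or_else(|| "empty namespace".to_string())?;
        Ok(Self { head, tail: iter.collect() })
    }
}

impl Namespace {
    /// True when `other` is a prefix of `self` (or equal to it).
    pub fn starts_with(&self, other: &Self) -> bool {
        self.head == other.head
            && self.tail.len() >= other.tail.len()
            && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
    }
}

/// Hypothetical helper: clones the namespace and reports whether the clone's
/// head points at the same buffer, plus the owner count while the clone lives.
pub fn clone_shares_buffer(namespace: &Namespace) -> (bool, usize) {
    let copy = namespace.clone();
    let shared = Arc::ptr_eq(&namespace.head, &copy.head);
    (shared, Arc::strong_count(&namespace.head))
}
```

For a freshly built namespace such as Namespace::try_from(vec!["sales", "emea"]), clone_shares_buffer returns (true, 2): the clone reuses the head's buffer and only the owner count changes. Note that cloning still allocates a new Vec for the tail; what it avoids is copying each segment's string data.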
The lessons
CatalogProvider is a model of the patterns above. When you read agent-written trait designs, hold them up against this template.