Trait design in Sail

The most important trait in Sail is CatalogProvider. Every storage backend implements it: in-memory, Hive Metastore, AWS Glue, Iceberg, OneLake, Unity Catalog. The query engine treats them all uniformly. Reading this one trait teaches you most of how production Rust uses traits as extension points.

The trait

crates/sail-catalog/src/provider/mod.rs
/// A trait that defines the interface for a catalog.
/// A catalog contains *databases*, where each database has a multi-level name
/// that represents a *namespace*. A database contains *objects* such as
/// *tables* and *views*.
#[async_trait::async_trait]
pub trait CatalogProvider: Send + Sync {
    /// The name of the catalog in the session.
    fn get_name(&self) -> &str;
 
    async fn create_database(
        &self,
        database: &Namespace,
        options: CreateDatabaseOptions,
    ) -> CatalogResult<DatabaseStatus>;
 
    async fn get_database(
        &self,
        database: &Namespace,
    ) -> CatalogResult<DatabaseStatus>;
 
    async fn list_databases(
        &self,
        prefix: Option<&Namespace>,
    ) -> CatalogResult<Vec<DatabaseStatus>>;
 
    async fn drop_database(
        &self,
        database: &Namespace,
        options: DropDatabaseOptions,
    ) -> CatalogResult<()>;
 
    async fn create_table(
        &self,
        database: &Namespace,
        table: &str,
        options: CreateTableOptions,
    ) -> CatalogResult<TableStatus>;
 
    // ... more methods
}

The four ingredients that make this trait "object-safe and shareable across an async runtime":

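  1. #[async_trait::async_trait]: rewrites each async fn into a plain fn returning Pin<Box<dyn Future<Output = T> + Send + 'async_trait>>. Stable Rust has supported native async fn in traits (AFIT) since 1.75, but a trait with a native async fn is not dyn-compatible (object-safe), so it cannot be used as dyn CatalogProvider, and its futures carry no Send bound by default. Until that changes, the macro is the production choice for traits consumed through trait objects.
  2. Send + Sync as supertraits. They let an Arc<dyn CatalogProvider> be moved into and shared across Tokio tasks, which may run on different worker threads.
  3. Every method takes &self (never self or &mut self). A provider shared through an Arc only ever hands out shared references, so mutable state lives behind interior mutability inside the implementor: a Mutex or RwLock, or a concurrent map such as DashMap.
  4. Every method returns CatalogResult<T>, the crate's alias for Result<T, CatalogError>, so every backend reports failures through one error type and callers can propagate them with ?.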
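To see what the macro buys, here is a std-only sketch of the same four ingredients written by hand, without async_trait. Every name in it (Catalog, MemCatalog, BoxFuture, CatalogError, block_on) is hypothetical and chosen for illustration, not taken from Sail, and the tiny block_on executor stands in for Tokio only so the sketch runs without dependencies.

```rust
use std::collections::HashSet;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// One error type and one result alias for the whole trait (hypothetical).
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    NotFound(String),
    AlreadyExists(String),
}
pub type CatalogResult<T> = Result<T, CatalogError>;

// Roughly the shape #[async_trait] generates: a boxed, Send future that borrows `self`.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Send + Sync supertraits, so Arc<dyn Catalog> can cross threads and tasks.
pub trait Catalog: Send + Sync {
    fn get_name(&self) -> &str;
    // &self receivers only; the returned future borrows self for 'a.
    fn create_database<'a>(&'a self, name: &'a str) -> BoxFuture<'a, CatalogResult<()>>;
    fn get_database<'a>(&'a self, name: &'a str) -> BoxFuture<'a, CatalogResult<String>>;
}

pub struct MemCatalog {
    name: String,
    // Interior mutability: callers only hold &self, so writes go through a Mutex.
    databases: Mutex<HashSet<String>>,
}

impl MemCatalog {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), databases: Mutex::new(HashSet::new()) }
    }
}

impl Catalog for MemCatalog {
    fn get_name(&self) -> &str {
        &self.name
    }

    fn create_database<'a>(&'a self, name: &'a str) -> BoxFuture<'a, CatalogResult<()>> {
        Box::pin(async move {
            let mut dbs = self.databases.lock().unwrap();
            // HashSet::insert returns false when the name was already present.
            if dbs.insert(name.to_string()) {
                Ok(())
            } else {
                Err(CatalogError::AlreadyExists(name.to_string()))
            }
        })
    }

    fn get_database<'a>(&'a self, name: &'a str) -> BoxFuture<'a, CatalogResult<String>> {
        Box::pin(async move {
            let dbs = self.databases.lock().unwrap();
            if dbs.contains(name) {
                Ok(format!("{}.{}", self.name, name))
            } else {
                Err(CatalogError::NotFound(name.to_string()))
            }
        })
    }
}

// Minimal executor: parks the current thread until the future's waker fires.
struct ThreadWaker(std::thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(std::thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::thread::park(),
        }
    }
}
```

The boxed return type is what keeps the trait dyn-compatible: every implementor returns the same concrete type, Pin<Box<dyn Future + Send>>, at the cost of roughly one heap allocation per call.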

A real implementor

The in-memory catalog provider, which exists for tests and for in-process Sail deployments:

crates/sail-catalog-memory/src/provider.rs
#[async_trait::async_trait]
impl CatalogProvider for MemoryCatalogProvider {
    fn get_name(&self) -> &str {
        &self.name
    }
 
    async fn create_database(
        &self,
        database: &Namespace,
        options: CreateDatabaseOptions,
    ) -> CatalogResult<DatabaseStatus> {
        let CreateDatabaseOptions {
            if_not_exists,
            comment,
            location,
            properties,
        } = options;
        let entry = self.databases.entry(database.clone());
        match entry {
            Entry::Occupied(entry) => {
                if if_not_exists {
                    Ok(entry.get().status.clone())
                } else {
                    Err(CatalogError::AlreadyExists(
                        CatalogObject::Database,
                        quote_namespace_if_needed(database),
                    ))
                }
            }
            Entry::Vacant(entry) => {
                let status = DatabaseStatus {
                    catalog: self.name.clone(),
                    database: database.clone().into(),
                    comment,
                    location,
                    properties,
                };
                let db = MemoryDatabase {
                    status: status.clone(),
                    tables: HashMap::new(),
                    views: HashMap::new(),
                };
                entry.insert(db);
                Ok(status)
            }
        }
    }
 
    // ... other methods
}

Three patterns worth noticing:

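  • Struct destructuring in let: let CreateDatabaseOptions { if_not_exists, comment, location, properties } = options; unpacks every field at once, which reads more cleanly than options.comment, options.location, and so on. Because the pattern names every field and has no .., adding a field to CreateDatabaseOptions turns this line into a compile error until the implementor handles it.
  • dashmap's Entry API for an atomic get-or-insert in a single lookup. The entry holds the write lock on its shard until it is consumed, so no other task can insert the same database between the check and the insert (a TOCTOU, time-of-check-to-time-of-use, race). The Entry::Occupied / Entry::Vacant match is also exhaustive, so neither case can be forgotten.
  • if_not_exists is handled inside the match rather than in a separate if cascade. It is consulted only in the Occupied arm, the one case where it changes the outcome, so the three results (return the existing status, fail with AlreadyExists, insert) each map to one visible branch.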
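The three patterns can be reproduced with the standard library alone. In this sketch a Mutex<HashMap> stands in for DashMap (its guard plays the role of the shard lock), and Registry, CreateOptions, Status, and CreateError are hypothetical, trimmed-down stand-ins rather than Sail's types.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for CreateDatabaseOptions and DatabaseStatus.
pub struct CreateOptions {
    pub if_not_exists: bool,
    pub comment: Option<String>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Status {
    pub name: String,
    pub comment: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum CreateError {
    AlreadyExists(String),
}

pub struct Registry {
    // The guard is held for the whole check-and-insert, like DashMap's shard lock.
    databases: Mutex<HashMap<String, Status>>,
}

impl Registry {
    pub fn new() -> Self {
        Self { databases: Mutex::new(HashMap::new()) }
    }

    pub fn create(&self, name: &str, options: CreateOptions) -> Result<Status, CreateError> {
        // Exhaustive destructuring (no `..`): a new field in CreateOptions
        // makes this line fail to compile until it is handled here.
        let CreateOptions { if_not_exists, comment } = options;
        let mut databases = self.databases.lock().unwrap();
        match databases.entry(name.to_string()) {
            // if_not_exists only matters when the name is already taken.
            Entry::Occupied(entry) => {
                if if_not_exists {
                    Ok(entry.get().clone())
                } else {
                    Err(CreateError::AlreadyExists(name.to_string()))
                }
            }
            Entry::Vacant(entry) => {
                let status = Status { name: name.to_string(), comment };
                entry.insert(status.clone());
                Ok(status)
            }
        }
    }
}
```

The main difference from DashMap is lock granularity: this single Mutex serializes every create call, while DashMap locks only the shard that owns the key.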

A domain type that the trait operates on

Namespace is the multi-level database name the trait passes around (for example sales or sales.eu; the catalog itself is named separately, by get_name). It is a clean piece of domain Rust:

crates/sail-catalog/src/provider/namespace.rs
use std::sync::Arc;
 
use crate::error::{CatalogError, CatalogResult};
 
/// A non-empty, multi-level name. Used to refer to a database in the catalog.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd)]
pub struct Namespace {
    pub head: Arc<str>,
    pub tail: Vec<Arc<str>>,
}
 
impl From<Namespace> for Vec<Arc<str>> {
    fn from(namespace: Namespace) -> Self {
        let mut result = vec![namespace.head];
        result.extend(namespace.tail);
        result
    }
}
 
impl<T: Into<Arc<str>>> TryFrom<Vec<T>> for Namespace {
    type Error = CatalogError;
 
    fn try_from(value: Vec<T>) -> CatalogResult<Self> {
        let mut iter = value.into_iter().map(Into::into);
        let head = iter
            .next()
            .ok_or_else(|| CatalogError::InvalidArgument("empty namespace".to_string()))?;
        let tail = iter.collect();
        Ok(Self { head, tail })
    }
}
 
impl Namespace {
    pub fn is_child_of(&self, other: &Self) -> bool {
        self.head == other.head
            && self.tail.len() == other.tail.len() + 1
            && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
    }
 
    pub fn starts_with(&self, other: &Self) -> bool {
        self.head == other.head
            && self.tail.len() >= other.tail.len()
            && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
    }
}
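To make the difference between is_child_of and starts_with concrete, here is a trimmed, self-contained copy of Namespace with the error type simplified to String, plus a hypothetical filter_by_prefix helper in the spirit of list_databases(prefix). Whether Sail itself filters by prefix or by direct children is not shown in this excerpt.

```rust
use std::sync::Arc;

// Trimmed copy of Namespace; the error type is simplified to String.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Namespace {
    pub head: Arc<str>,
    pub tail: Vec<Arc<str>>,
}

impl<T: Into<Arc<str>>> TryFrom<Vec<T>> for Namespace {
    type Error = String;

    fn try_from(value: Vec<T>) -> Result<Self, String> {
        let mut iter = value.into_iter().map(Into::into);
        let head = iter.next().ok_or_else(|| "empty namespace".to_string())?;
        Ok(Self { head, tail: iter.collect() })
    }
}

impl Namespace {
    // Exactly one level deeper than `other`, with the same leading segments.
    pub fn is_child_of(&self, other: &Self) -> bool {
        self.head == other.head
            && self.tail.len() == other.tail.len() + 1
            && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
    }

    // `other` is a (not necessarily strict) prefix of `self`.
    pub fn starts_with(&self, other: &Self) -> bool {
        self.head == other.head
            && self.tail.len() >= other.tail.len()
            && self.tail.iter().zip(other.tail.iter()).all(|(a, b)| a == b)
    }
}

// Hypothetical helper: keep every namespace under `prefix`, or all of them if None.
pub fn filter_by_prefix<'a>(all: &'a [Namespace], prefix: Option<&Namespace>) -> Vec<&'a Namespace> {
    all.iter()
        .filter(|ns| prefix.map_or(true, |p| ns.starts_with(p)))
        .collect()
}
```

Note that starts_with accepts the prefix itself and any depth below it, while is_child_of accepts exactly one level down; a listing API has to pick one of these meanings deliberately.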

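The single most teachable thing in this file is Arc<str> instead of String. A Namespace never changes after it is built, yet it is cloned constantly (create_database alone clones it twice). Arc<str> is a reference-counted, immutable string: .clone() copies a pointer and bumps an atomic refcount instead of allocating and copying the bytes. Cloning a whole Namespace therefore costs at most one allocation, for the tail Vec, no matter how long the segment names are.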
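A std-only sketch makes the difference between the two kinds of clone observable. same_buffer and compare_clones are illustrative helpers, not part of Sail.

```rust
use std::sync::Arc;

/// True if both slices start at the same address, i.e. share one buffer.
pub fn same_buffer(a: &str, b: &str) -> bool {
    std::ptr::eq(a.as_ptr(), b.as_ptr())
}

/// Clones an Arc<str> and a String with the same (non-empty) contents and
/// reports whether each clone shares its original's bytes, plus the Arc refcount.
pub fn compare_clones(text: &str) -> (bool, bool, usize) {
    let arc: Arc<str> = Arc::from(text);
    let arc_clone = Arc::clone(&arc); // refcount bump, no new allocation
    let string = String::from(text);
    let string_clone = string.clone(); // new heap buffer, bytes copied
    (
        same_buffer(&arc, &arc_clone),
        same_buffer(&string, &string_clone),
        Arc::strong_count(&arc),
    )
}
```

For a non-empty input such as "sales", compare_clones returns (true, false, 2): the Arc clone points at the original bytes, the String clone does not.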

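If your code passes around immutable identifiers (catalog names, user IDs, content-addressed hashes), use Arc<str> instead of String; for file paths, Arc<Path> is the equivalent. The change is mostly mechanical, and it pays off wherever those values are cloned often: into maps, across tasks, into result structs. Identifiers that are built once and rarely cloned gain little.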
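As a sketch of how mechanical the switch is (UserIndex and shared_path are hypothetical names): a map keyed by Arc<str> still accepts plain &str lookups because Arc<T> implements Borrow<T>, and Arc<Path> plays the same role for filesystem paths.

```rust
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;

// Hypothetical index keyed by immutable IDs. Before the change: HashMap<String, u64>.
pub struct UserIndex {
    by_id: HashMap<Arc<str>, u64>,
}

impl UserIndex {
    pub fn new() -> Self {
        Self { by_id: HashMap::new() }
    }

    // Callers that already hold an Arc<str> hand it over without copying bytes.
    pub fn insert(&mut self, id: Arc<str>, score: u64) {
        self.by_id.insert(id, score);
    }

    // Arc<str>: Borrow<str>, so lookups still take a plain &str.
    pub fn get(&self, id: &str) -> Option<u64> {
        self.by_id.get(id).copied()
    }
}

// The same idea for paths: a shared, immutable Arc<Path> instead of a PathBuf.
pub fn shared_path(p: &str) -> Arc<Path> {
    Arc::from(Path::new(p))
}
```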

The lessons

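CatalogProvider models all of them: async methods that still work through dyn, Send + Sync supertraits, &self receivers backed by interior mutability, a single result alias, and cheap-to-clone domain types such as Namespace. When you read agent-written trait designs, hold them up against this template.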