Where allocations happen

Every allocation has a cost in cycles. The cost itself is small (tens to hundreds of nanoseconds). The compounding cost is large: an allocation pollutes the cache, can fragment the heap, and is harder for the optimizer to see across. Most performance work in production Rust and C++ codebases is allocation-reduction work.

This page is a reference card. Recognize the pattern, know the cost, choose the cheaper one when you can.

Allocation cost of common Rust patterns

A reference table listing thirteen common Rust expressions, each marked with zero, one, or two dots showing how many heap allocations it causes, plus a short note.

pattern                          allocs  note
Vec::new()                       free    lazy; no alloc until first push
vec![1, 2, 3]                    •       one alloc for capacity 3
Vec::with_capacity(100)          •       preallocated upfront
v.push(x) (in a loop)            •       amortized; resizes ~log N times
String::new()                    free    lazy
format!("x = {x}")               •       always allocates
string.clone()                   •       heap copy of the bytes
Box::new(value)                  •       value lifted to the heap
Arc::new(value)                  •       atomic counter + value
Arc::clone(&a)                   free    just bumps the refcount
vec.iter().sum()                 free    fold, no intermediate Vec
vec.iter().collect::<Vec<_>>()   •       collects into a new Vec
&vec[1..5]                       free    borrowed slice, no copy

Thirteen patterns. Green “free” means zero allocations on the hot path; amber dot means one allocation per call. The point is recognition: which line of code reaches into the allocator and which one does not.

The shapes that allocate

Three kinds of operations allocate, almost always.

Owning a growable collection. Vec<T>, String, HashMap, BTreeMap. The first .push or .insert causes the first allocation. Subsequent inserts trigger more whenever the capacity runs out. with_capacity is the cheap escape where it exists (Vec, String, HashMap; BTreeMap has no such constructor).
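A minimal sketch of the difference, using a change in capacity as a stand-in for a trip to the allocator. The helper names (count_capacity_changes, grown_from_empty, preallocated) are made up for illustration, and the exact growth schedule is a standard-library detail that may vary between versions.

```rust
/// Pushes `n` items and counts how many times the Vec's capacity changed,
/// which is a proxy for how many times it went back to the allocator.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut last_cap = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last_cap {
            changes += 1;
            last_cap = v.capacity();
        }
    }
    changes
}

/// Growing from empty: the first push allocates, and later pushes
/// reallocate each time the buffer fills up.
fn grown_from_empty(n: u32) -> usize {
    count_capacity_changes(Vec::new(), n)
}

/// Preallocated: one allocation up front, none while pushing.
fn preallocated(n: u32) -> usize {
    count_capacity_changes(Vec::with_capacity(n as usize), n)
}
```

The preallocated version should report no capacity changes during the loop, because with_capacity(n) guarantees room for at least n elements.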

Lifting a value to the heap. Box::new(x), Rc::new(x), Arc::new(x). Each allocates exactly once. The cheap operation after is sharing: Arc::clone just bumps a counter.

Producing a new owned value from an iterator. .collect::<Vec<_>>(), .collect::<String>(). Allocates a new collection whenever there is at least one item to hold. Most of the time you can avoid the .collect by folding (.sum, .fold) or by chaining through another iterator.
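As an illustration of folding instead of collecting, here are two ways to compute the same sum of squares. The function names are hypothetical; both return the same value, but only the first builds an intermediate Vec.

```rust
/// Collects the squares into a temporary Vec, then sums it.
/// The intermediate Vec is a heap allocation that exists only to be summed.
fn sum_of_squares_collect(values: &[u64]) -> u64 {
    let squares: Vec<u64> = values.iter().map(|x| x * x).collect();
    squares.iter().sum()
}

/// Folds directly over the iterator chain; no intermediate collection.
fn sum_of_squares_fold(values: &[u64]) -> u64 {
    values.iter().map(|x| x * x).sum()
}
```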

The shapes that don't

Three kinds of operations do not allocate, even though they look like they might.

Borrowing. &v, &s, &v[1..5]. A plain reference is one word (a pointer); a slice reference is two (a pointer plus a length). Creating one never touches the heap.

Iterator chains, until you collect. v.iter().map(f).filter(p).sum() does not allocate. The chain compiles to a loop. The terminal (sum, fold, for_each) consumes without owning a new collection.

Reference counting clones. Arc::clone(&a) and Rc::clone(&a) increment a counter. The value is already on the heap. No new allocation.
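One way to convince yourself that borrowing and refcount clones share memory rather than copy it is to compare addresses. This is a sketch with hypothetical helper names (slice_points_into, share); it checks pointer identity, not allocator activity.

```rust
use std::sync::Arc;

/// Returns true if `slice` starts inside the buffer that `v` refers to,
/// i.e. it is a view into the same memory rather than a copy.
fn slice_points_into(v: &[i32], slice: &[i32]) -> bool {
    let range = v.as_ptr_range();
    let start = slice.as_ptr();
    range.start <= start && start < range.end
}

/// Clones the handle and reports (same allocation?, strong count while
/// both handles are alive).
fn share(a: &Arc<String>) -> (bool, usize) {
    let b = Arc::clone(a);
    (Arc::ptr_eq(a, &b), Arc::strong_count(a))
}
```

A borrowed sub-slice such as &v[1..5] passes the first check, while v[1..5].to_vec() does not, because to_vec copies the elements into a new buffer.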

A worked example

Two versions of the same function. The bad version allocates four times inside the function and can force a fifth at the call site; the good version allocates twice.

△ Bad
fn greet(name: String) -> String {
  let trimmed: String = name.trim().to_string();
  let upper: String = trimmed.to_uppercase();
  let prefix: String = format!("hello, ");
  prefix + &upper
}
◇ Good
fn greet(name: &str) -> String {
  let upper = name.trim().to_uppercase();
  format!("hello, {upper}")
}

The bad version:

  1. name: String. A caller that holds a &str, or that wants to keep its own String, has to allocate a copy at the call site. (Moving an existing String in costs nothing.)
  2. trim().to_string(). Converts the borrowed slice back into an owned String. Allocation.
  3. to_uppercase(). Allocates a new String with the uppercased bytes. Unavoidable.
  4. format!("hello, "). Allocates a String for a literal. Pointless.
  5. prefix + &upper. String::add reuses prefix's buffer and appends to it, exactly as prefix.push_str(&upper) would, so no separate result String is allocated. But prefix was sized for "hello, " alone, so the append usually has to grow the buffer, and that is a reallocation.

The good version: take &str so the caller doesn't have to allocate. Skip the intermediate String. Let format! allocate once for the final result.
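Counts like these can be checked rather than taken on trust. The sketch below wraps the system allocator in a counter and measures both versions, renamed greet_bad and greet_good so they can coexist. CountingAlloc, count_allocs and measure are hypothetical names, and the exact counts depend on the standard library version, so treat the numbers as indicative.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread counter, so other threads do not disturb the measurement.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper around the system allocator that counts requests.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // Buffer growth counts as a request too.
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns how many allocator requests it made on this thread.
fn count_allocs<R>(f: impl FnOnce() -> R) -> usize {
    let before = ALLOCS.with(|c| c.get());
    let result = f();
    let after = ALLOCS.with(|c| c.get());
    drop(result);
    after - before
}

fn greet_bad(name: String) -> String {
    let trimmed: String = name.trim().to_string();
    let upper: String = trimmed.to_uppercase();
    let prefix: String = format!("hello, ");
    prefix + &upper
}

fn greet_good(name: &str) -> String {
    let upper = name.trim().to_uppercase();
    format!("hello, {upper}")
}

/// Returns (requests made by greet_bad, requests made by greet_good).
fn measure() -> (usize, usize) {
    // The owned input is created outside the measured region.
    let owned = String::from("  ferris  ");
    let bad = count_allocs(move || greet_bad(owned));
    let good = count_allocs(|| greet_good("  ferris  "));
    (bad, good)
}
```

Calling measure() should show the bad version making more allocator requests than the good one, while both return the same greeting.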

When to care

A function called once per request, allocating five times, costs you maybe a microsecond. Probably below the noise.

A function called in a tight loop processing a million rows, allocating once per row, costs you a million allocations per query. That is real money in throughput, real volatility in tail latency, and real cache pollution.

The rule of thumb: in inner loops, hot paths, anything that processes per-record, scrutinize every allocation. In setup code, request handlers, anything called once or twice, a few extra allocations are fine.
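One common way to act on this in a per-record loop is to reuse a single buffer across iterations. The sketch below assumes a made-up row shape, (u32, &str), and hypothetical function names; it formats each row and adds up the line lengths.

```rust
use std::fmt::Write;

/// Formats each row into a fresh String: one allocation per row.
fn total_len_fresh(rows: &[(u32, &str)]) -> usize {
    let mut total = 0;
    for (id, name) in rows {
        let line = format!("{id},{name}");
        total += line.len();
    }
    total
}

/// Reuses one buffer: clear() keeps the capacity, so once the buffer has
/// grown to fit the longest row, later iterations stop allocating.
fn total_len_reused(rows: &[(u32, &str)]) -> usize {
    let mut total = 0;
    let mut line = String::new();
    for (id, name) in rows {
        line.clear();
        // Writing into a String cannot fail, so unwrapping is safe here.
        write!(line, "{id},{name}").unwrap();
        total += line.len();
    }
    total
}
```

Both functions compute the same result; the second trades a little clarity for far fewer allocator calls when the slice is long.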

The substrate read

On the JVM, every object is on the GC heap unless escape analysis catches it. The JIT is doing the same allocation-reduction work for you, dynamically, after warmup. A Java function that “allocates five times” might end up allocating zero times after the JIT runs.

On C++, allocations are nearly as visible in the source as they are in Rust. Copying a std::shared_ptr is the counterpart of Arc::clone, and std::accumulate or C++20 range views fold without an intermediate container. The difference is that C++ copies implicitly: passing a std::vector or std::string by value runs the copy constructor and allocates, with no .clone() in the source to flag it.

Rust's design choice was to put every allocation in plain sight in the source. The cost is a steeper learning curve. The benefit is that you read a Rust function and know exactly which lines reach for the heap. The reference card above is the map.

Further reading