What the JIT knows

Run a Java program and a Rust program that do the same thing. Say, sum a billion floats. Time the first request. The Rust program finishes first by a wide margin. Now run the same loop ten thousand times. The Java program might finish first. Both numbers are real. Each one teaches something different about the substrate underneath.

This essay is the long version of the diagram below. Three curves, throughput plotted against time. The shape of each curve is the substrate's signature.

Figure: Three computation curves. Throughput (vertical axis) against iterations (horizontal axis: first request, 1k, 10k, 100k, warm) for Rust · AOT, JVM · JIT, and cold native. The Rust curve is high, flat, and predictable from the first request. The JVM curve starts very low while bytecode is interpreted, climbs in two visible steps as C1 and then C2 kick in (around 1k and 10k iterations), and settles slightly above Rust, with small dips for garbage collection. Cold native sits flat through the middle and never warms up.

The flat line on top is Rust. A Rust binary is fully compiled before it runs. The compiler sees every function, decides on inlining and monomorphization based on whatever it can prove statically, and emits machine code. The first request runs that machine code. The ten thousandth request runs the same machine code. The curve does not rise much, because nothing in the runtime watches and improves the program. If you want it faster, you write it differently or you upgrade the compiler.

The curve that starts low and climbs is the JVM. On the first calls, the bytecode runs through an interpreter while HotSpot counts how often each method and loop executes. Once a path gets warm, the client compiler (C1) emits quick machine code with simple optimizations, instrumented to record which branches are taken and which types show up. If the path keeps getting hotter, the server compiler (C2) recompiles it with aggressive optimizations: inlining across virtual calls, escape analysis to eliminate allocations, and code layout shaped by the recorded branch history. By the ten thousandth iteration the JVM can be running better-tuned code than an AOT compiler would produce, because the JIT had information the AOT compiler never had. Which branches were actually taken. Which types actually showed up. Which calls actually dispatched where.
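A rough way to watch the climb for yourself is to time the same method in growing batches. This is an illustration under assumptions, not a benchmark: the class name `WarmupSketch`, the 100,000-element array, and the batch boundaries are hypothetical choices, and a serious measurement would use a harness such as JMH. Running it with HotSpot's `-XX:+PrintCompilation` flag interleaves the compilation events, including the tier level of each compile, with the timings.

```java
import java.util.Arrays;

// Hypothetical warmup demo: times the same method in growing batches.
// Not a benchmark; use a harness such as JMH for real measurements.
class WarmupSketch {

    // The hot method: a sequential float sum over an array.
    static float sum(float[] xs) {
        float total = 0f;
        for (float x : xs) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        float[] data = new float[100_000]; // assumed size, far below a billion
        Arrays.fill(data, 1.0f);

        // Cumulative call counts at which we report, chosen for illustration.
        int[] checkpoints = {1, 10, 100, 1_000, 10_000};
        double sink = 0.0; // consume results so the JIT cannot discard the work
        int done = 0;
        for (int checkpoint : checkpoints) {
            int calls = checkpoint - done;
            long start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                sink += sum(data);
            }
            long nanosPerCall = (System.nanoTime() - start) / calls;
            System.out.printf("calls %,d..%,d: %,d ns per call%n", done + 1, checkpoint, nanosPerCall);
            done = checkpoint;
        }
        System.out.println("checksum: " + sink);
    }
}
```

Expect the first batch to be the slowest and the later batches to flatten out; the exact numbers depend on the JVM version, the hardware, and noise.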

The flat line through the middle is cold native. Interpreted Python without a JIT. A naive bytecode VM. A compiler that compiles but skips the aggressive optimizations. The line is roughly where most interpreted and bytecode-based languages were before HotSpot existed, and where CPython still sits for pure-Python loops. The point of having it on the chart is to remember that "compiled" is a spectrum, and that the JVM's runtime tier is real engineering work.

Why the JIT can win

At runtime, the JVM is better than its reputation suggests. Three optimizations do most of that work.

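Escape analysis proves that an object never escapes the method that created it (after inlining), so C2 can replace the object with its individual fields held in registers, a transformation called scalar replacement, instead of allocating it on the GC heap. For such objects, the hot loop pays no allocation cost.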
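A minimal illustration, assuming Java 16 or later for `record`: each iteration creates a `Point` that never leaves `sumOfSquares`, which is the pattern escape analysis looks for. The class and method names are hypothetical. Whether the allocation is actually eliminated depends on the JVM version and on inlining, so treat this as something to test rather than a guarantee; one way is to compare GC activity under `-Xlog:gc` with and without `-XX:-DoEscapeAnalysis`, which turns the analysis off.

```java
// Hypothetical escape-analysis demo (Java 16+ for records).
class EscapeSketch {

    // A small immutable value type.
    record Point(double x, double y) {}

    // Each iteration allocates a Point that is used and dropped inside the
    // loop. It never escapes this method, so C2 may scalar-replace it
    // (keep x and y in registers) instead of allocating it on the heap.
    static double sumOfSquares(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.x() * p.x() + p.y() * p.y();
        }
        return total;
    }

    public static void main(String[] args) {
        // Warm up so sumOfSquares gets hot enough for C2 to compile it.
        double sink = 0.0;
        for (int round = 0; round < 50; round++) {
            sink += sumOfSquares(1_000_000);
        }
        // Time one steady-state call.
        long start = System.nanoTime();
        sink += sumOfSquares(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("steady-state call: " + elapsedMs + " ms (checksum " + sink + ")");
    }
}
```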

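Inlining across virtual calls. In Java, instance methods are virtual by default; calling one normally means a dispatch through the method table. HotSpot records which receiver classes show up at each call site. If only one (or two) ever do, C2 inlines those methods directly behind a cheap class check. If the assumption ever breaks, the compiled code is deoptimized and later recompiled. The steady-state cost is close to zero: one well-predicted comparison, or nothing at all when the loaded classes prove there is only one implementation.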
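The sketch below sets up a call site that stays monomorphic during warmup and then receives a second type. The `Shape`, `Square`, and `Circle` types are hypothetical. On HotSpot, `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` prints the inlining decisions, and `-XX:+PrintCompilation` may show a compiled method being "made not entrant" when the `Circle` breaks the assumption; the exact behavior varies by JVM version.

```java
// Hypothetical monomorphic-call-site demo (Java 16+ for records).
class InliningSketch {

    interface Shape {
        double area();
    }

    record Square(double side) implements Shape {
        public double area() { return side * side; }
    }

    record Circle(double r) implements Shape {
        public double area() { return Math.PI * r * r; }
    }

    // s.area() is an interface (virtual) call. If only Square ever reaches
    // this call site, C2 can inline Square.area() behind a cheap class check;
    // a different class arriving later fails that check.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] squares = new Shape[1_000];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = new Square(i);
        }
        double sink = 0.0;
        // Monomorphic phase: the profile sees only Square.
        for (int i = 0; i < 20_000; i++) {
            sink += totalArea(squares);
        }
        // A second receiver type arrives at the same call site.
        Shape[] mixed = {new Square(1), new Circle(1)};
        sink += totalArea(mixed);
        System.out.println("checksum: " + sink);
    }
}
```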

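Profile-guided branch layout. The profiling tiers record which side of each if was taken and how often. C2 emits code that makes the common side the straight-line path and may replace a never-taken side with an "uncommon trap". If that trap ever fires, the method is deoptimized and recompiled with the new profile. AOT compilers can also do this if you feed them a profile (profile-guided optimization), but most builds don't carry one.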
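In this hypothetical sketch, the negative branch of `sumWithPenalty` is never taken while the method warms up, so its profile says the branch is dead. C2 may compile that side as an uncommon trap; the first negative input then sends execution back to slower code, and with `-XX:+PrintCompilation` you may see the compiled method marked "made not entrant" before it is recompiled. The method name and the penalty rule are made up for illustration, and whether the trap is used depends on the JVM version.

```java
// Hypothetical branch-profile demo.
class BranchProfileSketch {

    // Adds non-negative values; each negative value costs a penalty of 1.
    // During warmup no negative value appears, so the profile records the
    // x < 0 branch as never taken.
    static long sumWithPenalty(int[] xs) {
        long total = 0;
        for (int x : xs) {
            if (x < 0) {
                total -= 1; // rare path
            } else {
                total += x; // hot path
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] nonNegative = new int[10_000];
        for (int i = 0; i < nonNegative.length; i++) {
            nonNegative[i] = i;
        }
        long sink = 0;
        // Warmup: the profile sees only the hot path.
        for (int i = 0; i < 20_000; i++) {
            sink += sumWithPenalty(nonNegative);
        }
        // The first negative value takes the branch the profile called dead.
        sink += sumWithPenalty(new int[]{5, -3, 7});
        System.out.println("checksum: " + sink);
    }
}
```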

The honest version of "Java is faster than you think" is this: on a tight, hot loop that runs for hours, with a recent JVM and modest GC tuning, the JVM can match or beat naive AOT code. This has been measurable since roughly the JDK 1.4 era of HotSpot and remains true today.

Why AOT wins everywhere else

The first request matters. A web service that handles one request per cold start, a CLI tool that runs and exits, a Lambda that warms up for a hundred milliseconds. None of these reach steady state. The JVM curve barely starts rising before the program ends. AOT gives you peak performance at request one.
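One hedged way to see the cost a short-lived JVM process pays before doing any work is to read the JVM's own uptime at the top of `main` through `java.lang.management`, then time a first call. `ColdStartSketch` and `handleRequest` are hypothetical stand-ins for a real program and a real request. Compile with `javac` first: the single-file `java ColdStartSketch.java` launcher compiles the source in memory at startup and would inflate the number.

```java
import java.lang.management.ManagementFactory;

// Hypothetical cold-start demo.
class ColdStartSketch {

    // Stand-in for the work of one request.
    static long handleRequest(long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) {
            acc += i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Milliseconds since the JVM started, read first thing in main:
        // roughly the startup cost paid before any application code ran.
        long uptimeAtMainMs = ManagementFactory.getRuntimeMXBean().getUptime();

        // Time the very first request, which starts out interpreted.
        long start = System.nanoTime();
        long result = handleRequest(1_000_000);
        long firstRequestMicros = (System.nanoTime() - start) / 1_000;

        System.out.println("JVM uptime at main: " + uptimeAtMainMs + " ms");
        System.out.println("first request: " + firstRequestMicros + " us (result " + result + ")");
    }
}
```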

Tail latency matters. The JVM stalls periodically: garbage collection pauses, deoptimization back to slower code, recompilation. A p50 might be excellent on the JVM; the p99 is where those stalls show up. For a database hot path or a packet routing layer, the dip is the system. Rust has no garbage collector and no JIT, so its curve does not have that dip.
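Tail latency is a property of the distribution, not of the average, so it helps to see how p50 and p99 come out of the same samples. A minimal nearest-rank percentile sketch, with made-up latencies in which 2% of requests hit a 5 ms stall; the class name and the numbers are hypothetical.

```java
import java.util.Arrays;

// Hypothetical tail-latency demo using nearest-rank percentiles.
class TailLatencySketch {

    // Nearest-rank percentile: the smallest sample such that at least p%
    // of samples are <= it. Expects 0 < p <= 100 and a non-empty array.
    static long percentile(long[] latenciesNanos, double p) {
        long[] sorted = latenciesNanos.clone(); // leave the caller's array alone
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latencies = new long[1_000];
        Arrays.fill(latencies, 100_000L); // typical request: 100 us
        for (int i = 0; i < 20; i++) {
            latencies[i] = 5_000_000L; // 20 requests hit a 5 ms stall
        }
        System.out.println("p50 = " + percentile(latencies, 50) + " ns");
        System.out.println("p99 = " + percentile(latencies, 99) + " ns");
    }
}
```

With these numbers the p50 is the typical 100 us request while the p99 lands on a stall, which is exactly the gap the two percentiles are meant to expose.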

Deployment artifacts matter. A Rust binary is typically one file. A JVM deployment is the JVM plus the classpath plus configuration plus memory budgets for the heap and the JIT's code cache. On a 50MB embedded target or a serverless platform with a 100ms cold-start budget, a stock JVM is usually not in the running.

Predictability matters. You can read a Rust program, or its disassembly, and reason about what it will do. The JVM's behavior at hour twelve depends on what the JIT decided at hour two, which depended on which paths were hot at hour zero. Diagnosing a slowdown across that history is real work.

What this means for orchestrators

When an agent claims "we got a 3x speedup by rewriting from Java to Rust," ask which curve they measured. A cold-start microbenchmark will always show Rust ahead. A long-running steady-state batch job might genuinely be faster on a tuned JVM, and that is the honest answer.
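A simple discipline is to report both numbers from the same timing runs. This sketch assumes you already have per-iteration timings for the Java baseline and the Rust candidate (the arrays here are invented for illustration, and `SpeedupReport` is a hypothetical name) and computes a cold ratio from the first iteration and a warm ratio from the median of the last few.

```java
import java.util.Arrays;

// Hypothetical report: the same timings, read at both ends of the curve.
class SpeedupReport {

    // Median of the last `tail` samples: a crude steady-state estimate.
    // Expects 0 < tail <= perIterationNanos.length.
    static double steadyStateNanos(long[] perIterationNanos, int tail) {
        long[] last = Arrays.copyOfRange(perIterationNanos,
                perIterationNanos.length - tail, perIterationNanos.length);
        Arrays.sort(last);
        int mid = last.length / 2;
        return last.length % 2 == 1 ? last[mid] : (last[mid - 1] + last[mid]) / 2.0;
    }

    // Speedup of candidate over baseline; > 1 means the candidate is faster.
    static double coldSpeedup(long[] baseline, long[] candidate) {
        return (double) baseline[0] / candidate[0];
    }

    static double warmSpeedup(long[] baseline, long[] candidate, int tail) {
        return steadyStateNanos(baseline, tail) / steadyStateNanos(candidate, tail);
    }

    public static void main(String[] args) {
        // Invented timings: a JIT-like baseline that starts slow and settles,
        // and an AOT-like candidate that is flat from the first iteration.
        long[] jvm = {9_000, 4_000, 1_500, 900, 900, 880, 900};
        long[] aot = {1_000, 1_000, 1_000, 1_000, 1_000, 1_000, 1_000};
        System.out.printf("cold speedup: %.2fx%n", coldSpeedup(jvm, aot));
        System.out.printf("warm speedup: %.2fx%n", warmSpeedup(jvm, aot, 4));
    }
}
```

With these invented numbers the cold speedup prints as 9.00x and the warm speedup as 0.90x: the same pair of programs supports both headlines, depending on which end of the curve was measured.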

The slogan "Rust is faster than Java" is mostly wrong. The accurate slogan is "Rust is predictable in ways the JVM cannot be without warmup, GC tuning, and a deep operations team."

For services where the tail matters more than the peak, that predictability is the whole story.
