D015 — Efficiency-First Performance - Iron Curtain

D015: Performance — Efficiency-First, Not Thread-First

Decision: Performance comes from algorithmic efficiency, cache-friendly data layout, adaptive workload (simulation LOD), zero-allocation hot paths, and amortized computation. Multi-core scaling is a bonus layer on top, not the foundation.

Principle: The engine must run a 500-unit battle smoothly on a 2-core, 4 GB machine from 2012. Multi-core machines get higher unit counts as a natural consequence of the work-stealing scheduler.

The Efficiency Pyramid (ordered by impact):

1. Algorithmic efficiency (flowfields, spatial hash, hierarchical pathfinding)
2. Cache-friendly ECS layout (hot/warm/cold component separation)
3. Simulation LOD (skip work that doesn’t affect the outcome)
4. Amortized work (stagger expensive systems across ticks)
5. Zero-allocation hot paths (pre-allocated scratch buffers)
6. Work-stealing parallelism (rayon via Bevy: bonus, not foundation)
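
As a rough illustration of two of the layers above (amortized work and zero-allocation hot paths), the sketch below spreads a targeting pass across several ticks and reuses one pre-allocated scratch buffer. It is not taken from the engine: `Unit`, `TargetingSystem`, `RANGE_SQ`, and the stride value are hypothetical names and parameters chosen for the example, and the inner neighbor scan is deliberately naive (a real system would presumably pair it with a spatial index).

```rust
/// Hypothetical per-unit state; the fields are illustrative only.
pub struct Unit {
    pub pos: (f32, f32),
    pub target: Option<usize>,
}

/// Squared targeting range (assumed value for this sketch).
const RANGE_SQ: f32 = 9.0;

/// Hypothetical system that refreshes targets a slice at a time.
pub struct TargetingSystem {
    /// Number of ticks over which one full pass is spread.
    stride: usize,
    /// Candidate list allocated once; `clear()` keeps its capacity.
    scratch: Vec<(usize, f32)>,
}

impl TargetingSystem {
    pub fn new(stride: usize, max_units: usize) -> Self {
        assert!(stride > 0, "stride must be at least 1");
        Self { stride, scratch: Vec::with_capacity(max_units) }
    }

    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }

    /// Re-targets only the units whose index matches this tick's phase,
    /// so every unit is refreshed once per `stride` ticks.
    /// Returns the number of units processed this tick.
    pub fn run(&mut self, tick: u64, units: &mut [Unit]) -> usize {
        let phase = (tick % self.stride as u64) as usize;
        let mut processed = 0;
        for i in (phase..units.len()).step_by(self.stride) {
            // Reuse the buffer: no allocation while len stays within capacity.
            self.scratch.clear();
            let (x, y) = units[i].pos;
            for (j, other) in units.iter().enumerate() {
                let (dx, dy) = (other.pos.0 - x, other.pos.1 - y);
                let d2 = dx * dx + dy * dy;
                if j != i && d2 <= RANGE_SQ {
                    self.scratch.push((j, d2));
                }
            }
            // Pick the nearest candidate, if any.
            units[i].target = self
                .scratch
                .iter()
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .map(|&(j, _)| j);
            processed += 1;
        }
        processed
    }
}
```

With a stride of 4, each tick touches roughly a quarter of the units, so the per-tick cost of the system drops accordingly while every unit is still refreshed within four ticks. The trade-off is staleness: a unit's target may be up to `stride - 1` ticks old.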

Inspired by: Datadog Vector’s pipeline efficiency, Tokio’s work-stealing runtime. These systems are fast because they waste nothing, not because they use more hardware.

Anti-pattern rejected: “Just parallelize it” as the default answer. Parallelism without algorithmic efficiency is adding lanes to a highway with broken traffic lights.
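
To make the contrast concrete, here is a minimal sketch of the spatial hash named in the Efficiency Pyramid, next to a brute-force range query. It is an illustration under assumptions, not the engine's implementation: `SpatialHash`, `query_brute`, the cell size, and the returned distance-test counter are all hypothetical and exist only to show how much work the data structure avoids before any thread is involved.

```rust
use std::collections::HashMap;

/// Uniform-grid spatial hash: maps a cell coordinate to the indices of the
/// points inside it. The cell size is an assumed tuning parameter.
pub struct SpatialHash {
    cell: f32,
    buckets: HashMap<(i32, i32), Vec<usize>>,
}

impl SpatialHash {
    pub fn build(points: &[(f32, f32)], cell: f32) -> Self {
        let mut buckets: HashMap<(i32, i32), Vec<usize>> = HashMap::new();
        for (i, &(x, y)) in points.iter().enumerate() {
            buckets.entry(Self::key(x, y, cell)).or_default().push(i);
        }
        Self { cell, buckets }
    }

    fn key(x: f32, y: f32, cell: f32) -> (i32, i32) {
        ((x / cell).floor() as i32, (y / cell).floor() as i32)
    }

    /// Writes the indices of points within `radius` of (x, y) into `out`
    /// (sorted, for deterministic results) and returns how many distance
    /// tests were performed. Only cells overlapping the query box are visited.
    pub fn query(
        &self,
        points: &[(f32, f32)],
        x: f32,
        y: f32,
        radius: f32,
        out: &mut Vec<usize>,
    ) -> usize {
        out.clear();
        let (cx0, cy0) = Self::key(x - radius, y - radius, self.cell);
        let (cx1, cy1) = Self::key(x + radius, y + radius, self.cell);
        let mut tests = 0;
        for cx in cx0..=cx1 {
            for cy in cy0..=cy1 {
                if let Some(bucket) = self.buckets.get(&(cx, cy)) {
                    for &i in bucket {
                        tests += 1;
                        let (dx, dy) = (points[i].0 - x, points[i].1 - y);
                        if dx * dx + dy * dy <= radius * radius {
                            out.push(i);
                        }
                    }
                }
            }
        }
        out.sort_unstable();
        tests
    }
}

/// Brute-force baseline: one distance test per point, every query.
pub fn query_brute(points: &[(f32, f32)], x: f32, y: f32, radius: f32) -> Vec<usize> {
    points
        .iter()
        .enumerate()
        .filter(|(_, p)| {
            let (dx, dy) = (p.0 - x, p.1 - y);
            dx * dx + dy * dy <= radius * radius
        })
        .map(|(i, _)| i)
        .collect()
}
```

Splitting the brute-force scan across cores would divide its cost by the core count at best, whereas the grid lookup only touches the handful of cells overlapping the query, independent of how many points exist elsewhere on the map.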

See 10-PERFORMANCE.md for full details, targets, and implementation patterns.
