Property-Based Testing Specifications (proptest)
Each property is a formal invariant verified across hundreds (T1/T2) to thousands (T3) of randomly generated inputs. A failing property is shrunk to a minimal counterexample for debugging.
| Property | Generator | Invariant Assertion | Shrink Target | Tier |
|---|---|---|---|---|
| Sim determinism | Random seed × random order sequence (up to 200 orders over 500 ticks) | Two runs with identical seed+orders produce identical state_hash() at every tick | Minimal divergent tick + minimal order sequence | T3 |
| Order validation purity | Random PlayerOrder × random SimState | validate_order() never mutates sim state (hash before == hash after) | Minimal order type that causes mutation | T3 |
| Order validation totality | Random PlayerOrder with arbitrary field values | validate_order() always returns OrderValidity — never panics, never hangs | Minimal panicking order | T3 |
| Snapshot round-trip identity | Random sim state after N random ticks | restore(snapshot(state)) produces state_hash() identical to original | Minimal divergent component | T3 |
| Delta snapshot correctness | Random sim state + random mutations | sim.apply_delta(&sim.delta_snapshot(&baseline)) on a clone restored from baseline produces state_hash() identical to current state | Minimal mutation set that breaks delta | T3 |
| Composite snapshot round-trip (GameRunner) | Random sim state + random CampaignState + random ScriptState after N ticks | GameRunner::restore_full(SimSnapshot { core, campaign, script }) produces identical state_hash(), identical campaign graph, and script VMs return same values via on_serialize() | Minimal divergent composite field (campaign flag, Lua variable) | T3 |
| Composite delta round-trip (GameRunner) | Random sim state + random campaign/script mutations across tick ranges | GameRunner::apply_full_delta(DeltaSnapshot { core, campaign, script }) on top of a restored full snapshot produces state identical to the original — verified across all three sub-states | Minimal composite delta that fails to reconstruct | T3 |
| Autosave composite fidelity | Random game state with active campaign + Lua scripts, autosave triggered | .icsave file loaded via GameRunner::restore_full() produces identical sim hash, campaign state, and script state as the game thread at the autosave tick | Minimal campaign/script state that diverges after save-load | T2 |
| Fixed-point arithmetic closure | Random FixedPoint × FixedPoint for add/sub/mul/div | Result stays within i32 range; no silent overflow; division by zero returns error | Minimal overflow pair | T3 |
| Pathfinding completeness | Random map topology × random start/end where path exists | Pathfinder always returns a path if one exists (checked against BFS ground truth) | Minimal topology where pathfinder fails | T3 |
| Pathfinding determinism | Random map × random start/end × two runs | Identical path output for identical input | Minimal map where paths diverge | T3 |
| Workshop dependency resolution termination | Random dependency graphs (1–100 packages, 0–10 deps each) | Resolver terminates within bounded time; returns valid order or error; no infinite loop | Minimal graph that causes non-termination | T3 |
| Campaign DAG validity | Random mission graphs (1–50 missions, 1–5 outcomes each) | CampaignGraph::new() accepts iff acyclic, fully reachable, no dangling refs | Minimal invalid graph accepted or valid graph rejected | T3 |
| UnitTag generation safety | Random pool operations (alloc/free sequences, 10K ops) | No two live units ever share the same UnitTag; stale tags always resolve to None | Minimal sequence producing tag collision | T3 |
| Chat scope isolation | Random chat messages × random scope assignments | ChatMessage<TeamScope> is never delivered to non-team recipients | Minimal routing violation | T2 |
| BoundedVec overflow safety | Random push/pop sequences against BoundedVec<T, N> | Length never exceeds N; push beyond N returns Err; no panic | Minimal violating sequence | T1 |
| BoundedCvar range enforcement | Random set() calls with values across full T range | get() always returns value within [min, max]; no value escapes bounds | Minimal value that escapes bounds | T1 |
| Merkle tree consistency | Random component mutations × tree rebuild | Root hash changes iff at least one leaf changed; unchanged leaves produce same hash | Minimal mutation where root hash is wrong | T3 |
| Weather schedule determinism | Random weather configurations × two sim instances | Weather state identical at every tick across instances with same seed | Minimal divergent config | T2 |
| Anti-cheat NaN pipeline guard | Random f64 sequences (incl. NaN, Inf, subnormal) fed to all anti-cheat scoring paths (EWMA, behavioral_score, TrustFactors, PopulationBaseline) | No output field is ever NaN or Inf; NaN inputs produce fail-closed sentinel values (1.0 for suspicion scores, population median for trust factors) | Minimal input that produces NaN in any output field | T3 |
| WASM timing oracle resistance | Random spatial query inputs × random fog configurations (0–100% fogged entities in query region) | ic_query_units_in_range() execution time varies by no more than ±5% with fogged entity count (measured over 1000 iterations per configuration; timer resolution of 1 µs or finer) | Minimal fog configuration where timing variance exceeds threshold | T3 |
| Replay network isolation | Random replay file × random embedded YAML with external URLs | During SelfContained replay playback, zero network I/O syscalls are issued; all external asset references resolve to placeholder | Minimal replay content that triggers network access | T2 |
| Key rotation sequence monotonicity | Random concurrent rotation attempts × random timing | rotation_sequence_number is strictly monotonically increasing; no two rotations share a sequence number; cooldown-violating rotations are rejected except Emergency | Minimal concurrent rotation pair that violates monotonicity | T2 |
| TOFU connection policy correctness | Random key state (match/mismatch/first-connect/rotation-chain) × random match context (ranked/unranked/LAN) | Ranked rejects key mismatch without valid rotation chain; ranked first-connect requires seed list or manual trust; unranked TOFU-accepts with warning; LAN always warns; valid rotation chain updates cache | Minimal context where wrong connection policy is applied | T2 |
proptest configuration: 256 cases per property in T1/T2 (PR gate speed), 10,000 cases in T3 (nightly thoroughness). Regression files are committed to the repository, so every discovered failure is replayed in T1 forever.
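
As an illustration of the "BoundedVec overflow safety" row, the sketch below checks the invariant against a simple counter model. It is dependency-free on purpose: the `XorShift64` generator is a stand-in for proptest's generators and performs no shrinking, and the `BoundedVec` shown here is a hypothetical minimal version, not the project's actual type. The function name `check_bounded_vec_property` and the capacity of 8 are assumptions made for the example.

```rust
/// Hypothetical fixed-capacity vector; the real `BoundedVec<T, N>` may differ.
pub struct BoundedVec<T, const N: usize> {
    items: Vec<T>,
}

impl<T, const N: usize> BoundedVec<T, N> {
    pub fn new() -> Self {
        Self { items: Vec::with_capacity(N) }
    }

    /// Hands the value back in `Err` instead of panicking when full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.items.len() >= N {
            return Err(value);
        }
        self.items.push(value);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Tiny xorshift64 PRNG standing in for proptest's input generators.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift state must never be zero.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `cases` random push/pop sequences against a counter model.
/// Returns the seed of the first failing case so it can be replayed.
pub fn check_bounded_vec_property(cases: u64, ops_per_case: usize) -> Result<(), u64> {
    for seed in 1..=cases {
        let mut rng = XorShift64::new(seed);
        let mut vec: BoundedVec<u32, 8> = BoundedVec::new();
        let mut model_len = 0usize;
        for _ in 0..ops_per_case {
            let r = rng.next_u64();
            if r % 3 != 0 {
                // Push about twice as often as pop so the limit is exercised.
                let accepted = vec.push(r as u32).is_ok();
                if accepted != (model_len < 8) {
                    return Err(seed);
                }
                if accepted {
                    model_len += 1;
                }
            } else {
                let popped = vec.pop().is_some();
                if popped != (model_len > 0) {
                    return Err(seed);
                }
                if popped {
                    model_len -= 1;
                }
            }
            // Core invariant: length tracks the model and never exceeds N.
            if vec.len() != model_len || vec.len() > 8 {
                return Err(seed);
            }
        }
    }
    Ok(())
}
```

Calling `check_bounded_vec_property(256, 64)` mirrors the T1 case count. Returning the failing seed plays a role loosely similar to a committed regression file: the same seed reproduces the same operation sequence.
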
API Misuse Test Matrix
Systematic tests derived from the API misuse analysis in architecture/api-misuse-defense.md. Each test verifies that a specific misuse vector is blocked by either the type system (compile-time) or runtime validation.
Compile-Time Defense Verification
These defenses do not require runtime tests. Some are enforced directly by the Rust type system (borrow checker, !Sync auto-trait); others rely on code review and monitoring to ensure invariants are not weakened by a refactor. The “Monitoring” column specifies how each defense is maintained — only defenses monitored by cargo check or clippy will produce automatic CI failures if removed.
| Defense | Mechanism | What Would Break It | Monitoring |
|---|---|---|---|
| S5: ReconcilerToken prevents unauthorized corrections | _private: () field | Making field pub or adding Default derive | Code review checklist |
| S8: Simulation is !Sync | Contains Bevy World (!Sync via UnsafeCell) | Adding unsafe impl Sync or replacing World with a Sync container | clippy + code review |
| O6: OrderBudget unconstructible externally | _private: () field | Making inner fields pub | Code review checklist |
| O7: Verified<PlayerOrder> restricted construction | pub(crate) on new_verified() | Changing to pub | Code review checklist |
| O7b: StructurallyChecked<T> restricted construction | pub(crate) on new() + _private: () | Making new() pub or adding Default derive | Code review checklist |
| W1: WasmTerminated has no execute() | Typestate pattern | Adding execute() to terminated state | Code review + trait audit |
| W7: FsReadCapability unconstructible externally | _private: () field | Making field pub | Code review checklist |
| P1: Workshop extract() requires PkgVerifying | Typestate consumes self | Adding extract() to PkgDownloading | Code review + trait audit |
| C1: MissionLoading has no complete() | Typestate pattern | Adding complete() to loading state | Code review + trait audit |
| B4: Read buffer immutability | read() returns &T | Returning &mut T from read() | Code review checklist |
| N7: SyncHash ≠ StateHash | Distinct newtypes, no From impl | Adding From<SyncHash> for StateHash | clippy + code review |
| M1: Chat scope branding | ChatMessage<TeamScope> ≠ ChatMessage<AllScope> | Adding From<ChatMessage<TeamScope>> for ChatMessage<AllScope> | Code review checklist |
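
The three mechanisms that recur in the table (a private `_private: ()` field, a typestate that consumes `self`, and distinct newtypes with no `From` impl) can be sketched in a few lines. The type names come from the table, but the module layout, the `finish()`, `grant_read()`, `read_with()` and `compare_state()` helpers, and the length check standing in for verification are all assumptions for illustration only.

```rust
mod workshop {
    /// Capability token. The private unit field means code outside this
    /// module cannot write `FsReadCapability { .. }` (the W7 mechanism).
    pub struct FsReadCapability {
        _private: (),
    }

    /// Typestate: a package that is still downloading. No `extract()` here.
    pub struct PkgDownloading {
        bytes: Vec<u8>,
    }

    /// Typestate: a package whose bytes are complete and awaiting verification.
    pub struct PkgVerifying {
        bytes: Vec<u8>,
    }

    impl PkgDownloading {
        pub fn new(bytes: Vec<u8>) -> Self {
            Self { bytes }
        }

        /// Consumes the downloading state so it cannot be reused afterwards.
        pub fn finish(self) -> PkgVerifying {
            PkgVerifying { bytes: self.bytes }
        }
    }

    impl PkgVerifying {
        /// Only reachable from `PkgVerifying` and consumes `self` (the P1
        /// mechanism). The length check is a placeholder for real verification.
        pub fn extract(self, expected_len: usize) -> Result<Vec<u8>, &'static str> {
            if self.bytes.len() == expected_len {
                Ok(self.bytes)
            } else {
                Err("length mismatch")
            }
        }
    }

    /// The only way to obtain the capability is through this module's API.
    pub fn grant_read() -> FsReadCapability {
        FsReadCapability { _private: () }
    }

    pub fn read_with(_cap: &FsReadCapability, data: &[u8]) -> usize {
        data.len()
    }
}

/// Distinct newtypes with no `From` impl between them (the N7 mechanism).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct SyncHash(pub u64);

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct StateHash(pub u64);

pub fn compare_state(a: StateHash, b: StateHash) -> bool {
    a == b
}

// Each of these lines would fail to compile if uncommented:
// let cap = workshop::FsReadCapability { _private: () }; // private field
// workshop::PkgDownloading::new(vec![]).extract(0);       // no such method
// compare_state(SyncHash(1), StateHash(1));               // mismatched types
```

The commented-out lines are the point of the exercise: the defense holds only while they remain compile errors, which is why the "What Would Break It" column lists visibility and trait changes.
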
Runtime Defense Test Specifications
Tests verifying runtime defenses against misuse vectors. Each test has a specific assertion, exact pass/fail criteria, and measurement metric.
| ID | Misuse Vector | Test Method | Exact Assertion | Measurement Metric | Tier |
|---|---|---|---|---|---|
| S1 | Future-tick orders | Call apply_tick(tick=N+2) when sim is at tick N | Debug: panics (debug_assert). Release: returns Err(SimError::TickMismatch { expected: N, got: N+2 }) | Panic in debug build; Err variant + field values in release | T1 |
| S2 | Duplicate orders in one tick | Replay with same order injected twice in one TickOrders batch | Second copy rejected by in-sim order validation (e.g., duplicate build on same cell); ValidatedOrder consumed once | Second order has no effect; sim state identical to single-order run | T2 |
| S3 | Cross-game snapshot restore | Simulation::restore() with snapshot from different seed | Returns Err(SimError::ConfigMismatch) — game_seed or map_hash don’t match | Err variant returned, sim state_hash() unchanged | T2 |
| S4 | Corrupted save file | Flip random byte in serialized .icsave payload, load via GameRunner’s file-loading layer | File-loading layer detects payload_hash mismatch, returns Err before reaching Simulation::restore() | 100 random bit-flips, 100% detection rate at file-loading layer | T3 |
| S6 | Float field in sim crate | Attempt to add f32/f64 field to any ic-sim struct | clippy::disallowed_types lint fails CI; post-deser range validation rejects out-of-bounds FixedPoint values | CI lint blocks compilation; fuzz: no panics from random bytes | T3 |
| S7 | Unknown player order | inject_orders() with non-existent PlayerId(999) | Order rejected with OrderRejectionCategory::Ownership (D012); specific variant is implementation-defined | Rejection fires; telemetry includes player ID | T1 |
| S9 | Out-of-bounds coordinates | Move order to WorldPos { x: 999999, y: 999999, z: 0 } | Order rejected with OrderRejectionCategory::Placement (D012); error includes position and map bounds | Rejection fires; position and bounds available in error | T1 |
| S10 | Divergent-baseline delta | Simulation::apply_delta() with delta whose baseline_tick/baseline_hash don’t match current state | Returns Err(SimError::BaselineMismatch); sim state unchanged | Err variant returned, sim state_hash() unchanged | T2 |
| O1 | Stale UnitTag after death | Kill unit, send attack order targeting dead unit’s tag | Order rejected with OrderRejectionCategory::Targeting (D012); error includes stale tag and current generation | Generation mismatch detected; stale tag not resolved | T1 |
| O2 | Order rate limit | Send 201 orders in one tick (budget=200) | First 200 accepted, 201st returns Err(BudgetExhausted) | Exact count: accepted=200, rejected=1 | T2 |
| O3 | Timestamp manipulation | sub_tick_time = 999999999 (far future) | Relay clamps to envelope max (e.g., 66667µs) | Clamped value ≤ tick_window_us; telemetry event fires | T2 |
| O8 | Oversized unit selection | Move order with 100 UnitTags (max=40) | Order rejected with OrderRejectionCategory::Custom (D012, game-module-defined selection cap); error includes count and max | Both count and max available in error | T1 |
| N2 | Handshake replay | Capture challenge response, replay on new connection | Connection terminated with AuthError::NonceReused | Connection drops within 100ms of replay | T2 |
| N6 | Half-open connection flood | Open 10,000 TCP connections, don’t complete handshake | All timeout within configured window (default: 5s); relay accepts new connections after cleanup | Peak memory < 50MB during flood; recovery < 1s | T3 |
| W3 | WASM memory bomb | memory.grow(65536) from WASM module | Growth denied; module receives trap; host continues | Host memory unchanged; module terminated cleanly | T3 |
| W5 | WASM infinite loop | loop {} in WASM entry point | Fuel exhausted; module trapped; host continues | Execution terminates within fuel budget; game tick completes | T3 |
| L1 | Lua string bomb | string.rep("a", 2^30) | Memory limit hit; script receives error; host continues | Host memory unchanged; script terminated | T3 |
| L2 | Lua infinite loop | while true do end | Instruction limit hit; script terminated | Script terminates within instruction budget | T3 |
| L3 | Lua system access | Call os.execute("rm -rf /") | Returns nil (function not registered) | No side effects on host filesystem | T1 |
| L5 | Lua UnitTag forgery | Script creates tag value for enemy unit, calls host API | SandboxError::OwnershipViolation { tag, caller, owner } | Error includes all three IDs | T3 |
| U1 | Stale UnitTag resolution | Alloc tag, free slot, resolve original tag | UnitPool::resolve() returns None | Generation mismatch, no panic | T1 |
| U2 | Pool exhaustion | Allocate units beyond pool capacity (2049 for RA1) | UnitPoolError::PoolExhausted after 2048th | Exact count: 2048 succeed, 2049th fails | T2 |
| F1 | Negative health YAML | health: { max: -100 } in unit definition | SchemaError::InvalidValue { field: "health.max", value: "-100", constraint: "> 0" } | Error includes file path + line number | T1 |
| F2 | Circular YAML inheritance | A inherits B inherits A | RuleLoadError::CircularInheritance { chain: "A → B → A" } | Chain string matches cycle path | T1 |
| F3 | Unknown TOML key | unknwon_feld = true in config.toml | DeserializationError::UnknownField { field: "unknwon_feld", valid: [...] } | Error lists available fields | T1 |
| A1 | Zip Slip in .oramap | Entry path ../../etc/passwd in archive | PathBoundaryError::EscapeAttempt { path, boundary } | Extract produces zero files outside boundary | T3 |
| A2 | Truncated .mix | Header claims 47 files, data for 31 | MixParseError::FileCountMismatch { declared: 47, actual: 31 } | Both counts in error | T1 |
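
Rows U1 and U2 both depend on a generational pool. The following is a minimal sketch of that idea, assuming a slot array with a per-slot generation counter; the field layout, `with_capacity()`, and the `Option<usize>` return type of `resolve()` are hypothetical, and only `UnitTag`, `UnitPool::resolve()` and `UnitPoolError::PoolExhausted` are names taken from the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnitTag {
    index: u32,
    generation: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UnitPoolError {
    PoolExhausted,
}

struct Slot {
    generation: u32,
    live: bool,
}

pub struct UnitPool {
    slots: Vec<Slot>,
    free_list: Vec<u32>,
    capacity: usize,
}

impl UnitPool {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { slots: Vec::new(), free_list: Vec::new(), capacity }
    }

    /// Reuses a freed slot when one exists, otherwise grows up to `capacity`.
    pub fn alloc(&mut self) -> Result<UnitTag, UnitPoolError> {
        if let Some(index) = self.free_list.pop() {
            let slot = &mut self.slots[index as usize];
            slot.live = true;
            return Ok(UnitTag { index, generation: slot.generation });
        }
        if self.slots.len() >= self.capacity {
            return Err(UnitPoolError::PoolExhausted);
        }
        self.slots.push(Slot { generation: 0, live: true });
        Ok(UnitTag { index: (self.slots.len() - 1) as u32, generation: 0 })
    }

    /// Frees a slot and bumps its generation so old tags stop resolving.
    /// Returns false for tags that are already stale.
    pub fn free(&mut self, tag: UnitTag) -> bool {
        if self.resolve(tag).is_none() {
            return false;
        }
        let slot = &mut self.slots[tag.index as usize];
        slot.live = false;
        slot.generation = slot.generation.wrapping_add(1);
        self.free_list.push(tag.index);
        true
    }

    /// Returns the slot index only when the tag's generation is current.
    pub fn resolve(&self, tag: UnitTag) -> Option<usize> {
        let slot = self.slots.get(tag.index as usize)?;
        (slot.live && slot.generation == tag.generation).then_some(tag.index as usize)
    }
}
```

With a capacity of 2048, the U2 assertion becomes a loop of 2048 successful `alloc()` calls followed by one `Err(UnitPoolError::PoolExhausted)`. For U1, `resolve()` on a freed tag returns `None` without panicking because the generation check fails first.
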
Integration Scenario Matrix
End-to-end scenarios testing multiple systems interacting. Each scenario has explicit setup, action sequence, and verification points.
| Scenario | Systems Under Test | Setup | Action Sequence | Verification Points | Tier |
|---|---|---|---|---|---|
| Full match lifecycle | sim + net + replay | 2-player game, relay network, 5-min scenario | Lobby → loading → 1000 ticks → surrender → post-game | (1) Replay file exists, (2) replay hash matches live hash, (3) post-game stats match sim query | T2 |
| Reconnection mid-combat | sim + net + snapshot | 2-player game, combat in progress at tick 300 | P2 disconnects → 200 ticks → P2 reconnects with snapshot → 500 more ticks | (1) Snapshot accepted, (2) state hashes match after reconnect, (3) no combat resolution errors | T2 |
| Mod load with conflicts | modding + YAML + sim | Two mods overriding rifle_infantry.cost with different values | Load profile with explicit priority → start game → build rifle infantry | (1) Conflict detected and logged, (2) higher-priority mod wins, (3) cost in game matches winner, (4) fingerprint identical across clients | T3 |
| Workshop install → gameplay | Workshop + sim + modding | Package with new unit type, dependency on base content | Install package → resolve deps → load mod → start game → build new unit | (1) Deps installed in order, (2) unit definition loaded, (3) unit buildable in game, (4) unit stats match YAML | T4 |
| Campaign transition with roster | campaign + sim + snapshot | Campaign with 2 missions, transition on victory | Play M1 → win with 5 units → transition → verify roster in M2 | (1) 5 units in M2 roster, (2) health/veterancy preserved, (3) story flags accessible | T2 |
| Chat scope in multiplayer | chat + net + relay | 4-player team game (2v2) | P1 sends team chat → P1 sends all-chat → verify delivery | (1) Team chat: P1+P2 receive, P3+P4 do not, (2) all-chat: all 4 receive, (3) observer sees all-chat only | T2 |
| WASM mod with sandbox limits | WASM + sim + modding | Malicious mod attempting memory bomb + file access + infinite loop | Load mod → trigger memory.grow → trigger file access → trigger loop | (1) Memory growth denied, (2) file access denied, (3) loop terminated by fuel, (4) game continues normally | T3 |
| Desync detection → diagnosis | sim + net + Merkle tree | 2-player game, deliberate single-archetype mutation at tick 500 | Run to tick 500 → corrupt one archetype on P2 → run to tick 510 | (1) Desync detected within 10 ticks, (2) Merkle tree identifies exact archetype, (3) diagnosis payload < 1KB | T2 |
| Anti-cheat → trust score flow | sim + net + telemetry + ranking | Player with 10 clean games, then 1 flagged game | Play 10 games cleanly → play 1 game with known-cheat replay pattern | (1) Trust score starts high, (2) flagged game triggers score drop, (3) subsequent clean games recover slowly | T4 |
| Save/load during weather | sim + weather + snapshot | Game with active blizzard at tick 300 | Save at tick 300 → load → run 500 more ticks | (1) Weather state matches, (2) terrain surface conditions match, (3) state hash at tick 800 matches fresh run | T3 |
| Console dev-mode flagging | console + replay + ranking | Ranked game, player issues /god_mode | Start ranked → exec dev command → complete match → check replay + ranking | (1) Dev flag set, (2) replay metadata shows dev-mode, (3) match excluded from ranked standings | T2 |
| Foreign replay import | replay + sim + format | .orarep file from OpenRA | Import → play back via ForeignReplayPlayback → check divergence | (1) Import succeeds, (2) playback runs to completion, (3) divergences logged with tick+archetype detail | T3 |
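
The "Desync detection → diagnosis" scenario relies on a Merkle tree narrowing a mismatch down to one archetype. The sketch below shows that narrowing step under simplifying assumptions: each archetype is modeled as a `Vec<i32>`, subtree roots are recomputed on demand rather than stored, and `DefaultHasher` is used only because it is in the standard library (its output is not guaranteed stable across Rust releases, so a real sim would need a fixed hash function). All function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// One leaf hash per archetype, in a fixed order shared by both peers.
pub fn leaf_hashes(archetypes: &[Vec<i32>]) -> Vec<u64> {
    archetypes.iter().map(hash_of).collect()
}

/// Root hash over a slice of leaf hashes, built by recursive halving.
pub fn merkle_root(leaves: &[u64]) -> u64 {
    match leaves.len() {
        0 => 0,
        1 => leaves[0],
        n => {
            let (left, right) = leaves.split_at(n / 2);
            hash_of(&(merkle_root(left), merkle_root(right)))
        }
    }
}

/// Descends from the root, following whichever half disagrees, and returns
/// the index of a divergent leaf. Returns `None` when the roots match.
pub fn first_divergent_leaf(a: &[u64], b: &[u64]) -> Option<usize> {
    assert_eq!(a.len(), b.len(), "both peers must hash the same archetype list");
    if merkle_root(a) == merkle_root(b) {
        return None;
    }
    let (mut lo, mut hi) = (0, a.len());
    while hi - lo > 1 {
        // Same split point that `merkle_root` uses for this range.
        let mid = lo + (hi - lo) / 2;
        if merkle_root(&a[lo..mid]) != merkle_root(&b[lo..mid]) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    Some(lo)
}
```

In a networked setting the peers would exchange only the subtree hashes along the descent path rather than every leaf, which is what keeps a diagnosis payload small.
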
Measurement & Metrics Framework
Every automated test produces structured output beyond pass/fail. These metrics feed into the release-readiness dashboard.
Performance Metrics (collected per benchmark run)
| Metric | Collection Method | Storage | Alert Threshold |
|---|---|---|---|
| Tick time (p50, p95, p99) | criterion statistical analysis | Benchmark history DB (SQLite) | p99 exceeds budget by >10% |
| Heap allocations per tick | Custom global allocator wrapper counting alloc calls | Per-benchmark counter | Any allocation in designated zero-alloc path |
| L1 cache miss rate | perf stat / platform performance counters | Benchmark log | > 5% in hot tick loop |
| Peak RSS during scenario | /proc/self/status sampling at 10ms intervals | Benchmark log | > 2× expected for unit count |
| Pathfinding nodes expanded | Internal counter in pathfinder | Per-benchmark metric | > 2× optimal for known map |
| Serialization throughput | Bytes/second for snapshot and replay frame writes | Benchmark log | Regression > 15% |
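
To make the tick-time alert concrete, here is a small sketch of a nearest-rank percentile and the "p99 exceeds budget by >10%" check. It is not how criterion computes its statistics; it only illustrates the threshold arithmetic. The function names and the choice of microseconds as the unit are assumptions.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
pub fn percentile(samples_us: &[u64], pct: u32) -> Option<u64> {
    if samples_us.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples_us.to_vec();
    sorted.sort_unstable();
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True when p99 is strictly more than 10% above the tick budget.
pub fn p99_alert(samples_us: &[u64], budget_us: u64) -> bool {
    match percentile(samples_us, 99) {
        // Integer form of `p99 > 1.10 * budget`.
        Some(p99) => p99 * 10 > budget_us * 11,
        None => false,
    }
}
```

With samples 1 through 100 µs, p99 is 99 µs. Against a 90 µs budget that is exactly 10% over and does not alert, while an 89 µs budget does.
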
Correctness Metrics (collected per test suite run)
| Metric | Collection Method | Storage | Alert Threshold |
|---|---|---|---|
| Determinism violations | Hash comparison failures across repeated runs | Test result DB | Any violation is a P0 bug |
| False positive rate (anti-cheat) | flagged_clean / total_clean on labeled corpus | Corpus evaluation log | > 0.1% (V54 threshold) |
| False negative rate (anti-cheat) | missed_cheat / total_cheat on labeled corpus | Corpus evaluation log | > 5% (V54 threshold) |
| Order rejection accuracy | Correct rejection category rate across exhaustive matrix | Test result DB | < 100% is a bug (categories per D012) |
| Fuzz coverage (edge/line) | cargo fuzz coverage (LLVM source-based coverage over the fuzz corpus) | Fuzz coverage report | < 80% line coverage in target module |
| Property test case count | proptest runner statistics | Test log | < configured minimum (256 for T1/T2, 10K for T3) |
| Snapshot round-trip state identity | state_hash() comparison: snapshot → restore → state_hash() | Test result DB | Any hash difference is a P0 bug |
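
The two anti-cheat rates are plain ratios over a labeled corpus, so the alert check can be written with integer arithmetic only. A sketch follows, assuming the counts are already tallied. `CorpusCounts`, `CorpusAlert` and `evaluate_corpus` are hypothetical names, and treating an empty corpus as an alert is a choice made for this example.

```rust
/// Labeled-corpus tallies; field names follow the metric table.
#[derive(Debug, Clone, Copy)]
pub struct CorpusCounts {
    pub flagged_clean: u64,
    pub total_clean: u64,
    pub missed_cheat: u64,
    pub total_cheat: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CorpusAlert {
    /// A rate over zero samples is undefined, so it is reported separately.
    EmptyCorpus,
    FalsePositiveRate,
    FalseNegativeRate,
}

pub fn evaluate_corpus(c: &CorpusCounts) -> Vec<CorpusAlert> {
    if c.total_clean == 0 || c.total_cheat == 0 {
        return vec![CorpusAlert::EmptyCorpus];
    }
    let mut alerts = Vec::new();
    // flagged_clean / total_clean > 0.1%  <=>  flagged_clean * 1000 > total_clean
    if c.flagged_clean.saturating_mul(1000) > c.total_clean {
        alerts.push(CorpusAlert::FalsePositiveRate);
    }
    // missed_cheat / total_cheat > 5%  <=>  missed_cheat * 20 > total_cheat
    if c.missed_cheat.saturating_mul(20) > c.total_cheat {
        alerts.push(CorpusAlert::FalseNegativeRate);
    }
    alerts
}
```

Both comparisons are strict, so a corpus sitting exactly on a threshold (1 flagged in 1000 clean games, or 50 missed in 1000 cheat games) does not alert.
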
Security Metrics (collected per security test suite run)
| Metric | Collection Method | Storage | Alert Threshold |
|---|---|---|---|
| Sandbox escape attempts blocked | Counter in WASM/Lua host | Security test log | Any unblocked attempt is a P0 bug |
| Path traversal attempts blocked | StrictPath rejection counter during fuzz | Fuzz log | Any unblocked traversal is a P0 bug |
| Replay tampering detection rate | Tampered frames detected / total tampered frames | Security test log | < 100% is a P0 bug |
| SCR replay attack detection rate | Replayed credentials detected / total replays | Security test log | < 100% is a P0 bug |
| Rate limit enforcement accuracy | Orders dropped when budget exhausted / orders sent beyond budget | Test log | < 100% is a bug |
| Half-open connection cleanup time | Time from flood to full recovery | Stress test log | > 5 seconds is a bug |
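
The rate limit enforcement metric pairs naturally with runtime test O2 (201 orders against a budget of 200). The sketch below computes it under assumptions: `OrderBudget` and `BudgetExhausted` are names from this document, while `for_tick()`, `try_spend()`, `TickReport`, `submit_orders()` and `enforcement_accuracy()` are hypothetical helpers introduced for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct BudgetExhausted;

/// Per-tick order allowance. The private unit field mirrors the O6 defense.
pub struct OrderBudget {
    remaining: u32,
    _private: (),
}

impl OrderBudget {
    pub fn for_tick(limit: u32) -> Self {
        Self { remaining: limit, _private: () }
    }

    /// Spends one unit of budget, or reports exhaustion without panicking.
    pub fn try_spend(&mut self) -> Result<(), BudgetExhausted> {
        if self.remaining == 0 {
            return Err(BudgetExhausted);
        }
        self.remaining -= 1;
        Ok(())
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct TickReport {
    pub accepted: u32,
    pub rejected: u32,
}

/// Feeds `sent` orders through the budget and tallies the outcomes.
pub fn submit_orders(budget: &mut OrderBudget, sent: u32) -> TickReport {
    let mut report = TickReport { accepted: 0, rejected: 0 };
    for _ in 0..sent {
        match budget.try_spend() {
            Ok(()) => report.accepted += 1,
            Err(BudgetExhausted) => report.rejected += 1,
        }
    }
    report
}

/// Orders dropped divided by orders sent beyond the budget.
/// Returns `None` when nothing was sent over budget.
pub fn enforcement_accuracy(sent: u32, limit: u32, report: &TickReport) -> Option<f64> {
    let over_budget = sent.saturating_sub(limit);
    if over_budget == 0 {
        return None;
    }
    Some(f64::from(report.rejected) / f64::from(over_budget))
}
```

For the O2 case this yields accepted = 200, rejected = 1, and an accuracy of 1.0. Any value below 1.0 would mean an over-budget order slipped through.
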