Testing Strategy & CI/CD Pipeline

This document defines the automated testing infrastructure for Iron Curtain. Every design feature must map to at least one automated verification method. Testing is not an afterthought — it is a design constraint.

Guiding Principles

  1. Determinism is testable. If a system is deterministic (Invariant #1), its behavior can be reproduced exactly. Tests that rely on determinism are the strongest tests we have.
  2. No untested exit criteria. Every milestone exit criterion (see 18-PROJECT-TRACKER.md) must have a corresponding automated test. If a criterion cannot be tested automatically, it must be flagged as a manual review gate.
  3. CI is the automated authority. If CI fails, the code does not merge — no exceptions, no “it works on my machine.” When manual review gates exist (Principle 2), both CI and the manual gate must pass before the code is shippable.
  4. Fast feedback, thorough verification. PR gates must complete in <10 minutes. Nightly suites handle expensive verification. Weekly suites cover exhaustive/long-running scenarios.
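Principle 1 is what makes hash-based tests possible. The following is a minimal sketch of the idea, not the project's harness: `ToySim`, `run_and_hash`, the LCG, and the hand-written FNV-1a hash are all illustrative assumptions standing in for the real simulation and its state hash.

```rust
/// Toy simulation state; a stand-in for the real sim, which this sketch does not model.
#[derive(Clone, Debug, PartialEq)]
struct ToySim {
    rng_state: u64,
    positions: Vec<i64>,
}

impl ToySim {
    fn new(seed: u64, units: usize) -> Self {
        ToySim { rng_state: seed, positions: vec![0; units] }
    }

    /// Deterministic LCG step: no OS randomness, no clocks, no thread timing.
    fn next_u64(&mut self) -> u64 {
        self.rng_state = self
            .rng_state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.rng_state
    }

    /// Advance one tick: each unit moves by a pseudo-random offset in -3..=3.
    fn tick(&mut self) {
        for i in 0..self.positions.len() {
            let delta = (self.next_u64() >> 33) % 7;
            self.positions[i] += delta as i64 - 3;
        }
    }

    /// FNV-1a over the state bytes, written by hand so the result does not
    /// depend on the standard library's hasher implementation.
    fn state_hash(&self) -> u64 {
        let mut h: u64 = 0xcbf29ce484222325;
        let mut feed = |bytes: &[u8]| {
            for b in bytes {
                h ^= *b as u64;
                h = h.wrapping_mul(0x100000001b3);
            }
        };
        feed(&self.rng_state.to_le_bytes());
        for p in &self.positions {
            feed(&p.to_le_bytes());
        }
        h
    }
}

/// Runs a fresh toy sim for `ticks` ticks and returns the final state hash.
fn run_and_hash(seed: u64, ticks: u32) -> u64 {
    let mut sim = ToySim::new(seed, 16);
    for _ in 0..ticks {
        sim.tick();
    }
    sim.state_hash()
}
```

A smoke test in this style runs the same seed twice, for example `run_and_hash(42, 100)`, and fails if the two hashes differ. Any hidden source of nondeterminism (wall-clock time, hash-map iteration order, uninitialized memory) shows up as a mismatch.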

CI/CD Pipeline Tiers

Tier 1: PR Gate (every pull request, <10 min)

| Test Category | What It Verifies | Tool / Framework |
|---|---|---|
| `cargo clippy --all` | Lint compliance, `disallowed_types` enforcement (see coding standards) | clippy |
| `cargo test` | Unit tests across all crates | cargo test |
| `cargo fmt --check` | Formatting consistency | rustfmt |
| Determinism smoke test | 100-tick sim with fixed seed → hash match across runs | custom harness |
| WASM sandbox smoke test | Basic WASM module load/execute/capability check | custom harness |
| Lua sandbox smoke test | Basic Lua script load/execute/resource-limit check | custom harness |
| YAML schema validation | All game data YAML files pass schema validation | custom validator |
| strict-path boundary | Path boundary enforcement for all untrusted-input APIs | unit tests |
| Build (all targets) | Cross-compilation succeeds (Linux, Windows, macOS) | cargo build / CI matrix |
| Doc link check | All internal doc cross-references resolve | mdbook build + linkcheck |

Gate rule: All Tier 1 tests must pass. Merge is blocked on any failure.
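As an illustration of what a path-boundary unit test might exercise, here is a small lexical check built only on `std::path`. It is a sketch under stated assumptions: `join_within` is a hypothetical helper, not the API the project uses, and because it never touches the filesystem it does not defend against symlinks that point outside the root.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins an untrusted relative path onto `root`, rejecting anything that could
/// escape it. Purely lexical: symlinks are not resolved.
fn join_within(root: &Path, untrusted: &str) -> Option<PathBuf> {
    let mut out = root.to_path_buf();
    for component in Path::new(untrusted).components() {
        match component {
            // Ordinary path segments are appended as-is.
            Component::Normal(part) => out.push(part),
            // `.` is harmless and skipped.
            Component::CurDir => {}
            // Reject `..`, absolute roots, and Windows prefixes such as `C:`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```

Rejecting every `..` outright is stricter than normalizing it, which keeps the check easy to reason about: a path such as `a/../b` is refused even though it would stay inside the root.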

Tier 2: Post-Merge (after merge to main, <30 min)

| Test Category | What It Verifies | Tool / Framework |
|---|---|---|
| Integration tests | Cross-crate interactions (ic-sim ↔ ic-game ↔ ic-script) | `cargo test --features integration` |
| Determinism full suite | 10,000-tick sim with 8 players, all unit types → hash match | custom harness |
| Network protocol tests | Lobby join/leave, relay handshake, reconnection, session auth | custom harness + tokio |
| Replay round-trip | Record game → playback → hash match with original | custom harness |
| Workshop package verify | Package build → sign → upload → download → verify chain | custom harness |
| Anti-cheat smoke test | Known-cheat replay → detection fires; known-clean → no flag | custom harness |
| Memory safety (Miri) | Undefined behavior detection in unsafe blocks | `cargo miri test` |

Gate rule: Failures trigger automatic revert of the merge commit and notification to the PR author.
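The replay round-trip row (record → playback → hash match) can be sketched as follows. Everything here is hypothetical: `Order`, `ToySim`, `record_game`, and `play_back` are illustrative names, and the real replay format and order types are not described in this document.

```rust
/// Hypothetical player order; the real order types are not specified here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    Move { unit: usize, dx: i32 },
    Stop { unit: usize },
}

/// Toy four-unit simulation standing in for the real sim.
#[derive(Default)]
struct ToySim {
    positions: [i32; 4],
    velocities: [i32; 4],
}

impl ToySim {
    /// Applies one order; `unit` must be in 0..4 or indexing panics.
    fn apply(&mut self, order: Order) {
        match order {
            Order::Move { unit, dx } => self.velocities[unit] = dx,
            Order::Stop { unit } => self.velocities[unit] = 0,
        }
    }

    /// Advances every unit by its current velocity.
    fn tick(&mut self) {
        for (p, v) in self.positions.iter_mut().zip(self.velocities.iter()) {
            *p = p.wrapping_add(*v);
        }
    }

    /// Hand-written FNV-1a over positions and velocities.
    fn state_hash(&self) -> u64 {
        let mut h: u64 = 0xcbf29ce484222325;
        for value in self.positions.iter().chain(self.velocities.iter()) {
            for byte in value.to_le_bytes() {
                h ^= byte as u64;
                h = h.wrapping_mul(0x100000001b3);
            }
        }
        h
    }
}

/// A replay is the ordered list of (tick, order) pairs that were applied.
type Replay = Vec<(u32, Order)>;

/// Runs a "live" game from scheduled inputs, recording each order as it is applied.
fn record_game(inputs: &[(u32, Order)], ticks: u32) -> (Replay, u64) {
    let mut sim = ToySim::default();
    let mut replay = Replay::new();
    for tick in 0..ticks {
        for &(at, order) in inputs {
            if at == tick {
                sim.apply(order);
                replay.push((tick, order));
            }
        }
        sim.tick();
    }
    (replay, sim.state_hash())
}

/// Re-simulates from the recorded orders alone and returns the final hash.
fn play_back(replay: &[(u32, Order)], ticks: u32) -> u64 {
    let mut sim = ToySim::default();
    for tick in 0..ticks {
        for &(at, order) in replay {
            if at == tick {
                sim.apply(order);
            }
        }
        sim.tick();
    }
    sim.state_hash()
}
```

The test then asserts that `play_back` over the recorded replay yields the same hash as the live run. Editing a single recorded order should change the hash, which is a cheap way to confirm the check is not vacuous.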

Tier 3: Nightly (scheduled, <2 hours)

| Test Category | What It Verifies | Tool / Framework |
|---|---|---|
| Fuzz testing | ic-cnc-content parser, YAML loader, network protocol deserializer | cargo-fuzz / libFuzzer |
| Property-based testing | Sim invariants hold across random order sequences | proptest |
| Performance benchmarks | Tick time, memory allocation, pathfinding cost vs budget | criterion |
| Zero-allocation assertion | Hot-path functions allocate 0 heap bytes in steady state | custom allocator hook |
| Sandbox escape tests | WASM module attempts all known escape vectors → all blocked | custom harness |
| Lua resource exhaustion | `string.rep` bomb, infinite loop, memory bomb → all caught | custom harness |
| Desync injection | Deliberately desync one client → detection fires within N ticks | custom harness |
| Cross-platform determinism | Same scenario on Linux + Windows → identical hash | CI matrix comparison |
| Unicode/BiDi sanitization | RTL/BiDi QA corpus (`rtl-bidi-qa-corpus.md`) categories A–I | custom harness |
| Display name validation | UTS #39 confusable corpus → all impersonation attempts blocked | custom harness |
| Save/load round-trip | Save game → load → continue 1,000 ticks → hash matches fresh run | custom harness |

Gate rule: Failures create high-priority issues. Regressions in performance benchmarks block the next release.
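One common way to build a "custom allocator hook" for the zero-allocation assertion is a counting wrapper around the system allocator. The sketch below assumes that approach; `CountingAllocator`, `count_allocations`, and `advance_positions` are hypothetical names, and the counter is per-thread so that allocations on other threads do not pollute the measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread allocation counter; const-initialized so reading it never allocates.
    static ALLOC_COUNT: Cell<usize> = const { Cell::new(0) };
}

/// Wraps the system allocator and counts allocations made on the current thread.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` avoids a panic if the thread-local is unavailable during teardown.
        let _ = ALLOC_COUNT.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Returns how many heap allocations `f` performed on the calling thread.
fn count_allocations<F: FnOnce()>(f: F) -> usize {
    let before = ALLOC_COUNT.with(|c| c.get());
    f();
    ALLOC_COUNT.with(|c| c.get()) - before
}

/// Hypothetical hot-path function: updates positions in place using
/// caller-owned buffers, so it should not need the heap.
fn advance_positions(positions: &mut [i32], velocities: &[i32]) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        *p = p.wrapping_add(*v);
    }
}
```

A steady-state assertion allocates its buffers up front, then wraps only the hot call in `count_allocations` and requires the result to be 0. This counts allocation calls rather than bytes, which is enough to prove "0 heap bytes" but would need extending to report sizes.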

Tier 4: Weekly (scheduled, <8 hours)

| Test Category | What It Verifies | Tool / Framework |
|---|---|---|
| Campaign playthrough | Full campaign mission sequence completes without crash/desync | automated playback |
| Extended fuzz campaigns | 1M+ iterations per fuzzer target | cargo-fuzz |
| Network simulation | Packet loss, latency jitter, partition scenarios | custom harness + tc/netem |
| Load testing | 8-player game at 1,000 units each → tick budget holds | custom harness |
| Anti-cheat model eval | Full labeled replay corpus → precision/recall vs V54 thresholds | custom harness |
| Visual regression | Key UI screens rendered → pixel diff against baseline | custom harness + image diff |
| Workshop ecosystem test | Mod install → load → gameplay → uninstall lifecycle | custom harness |
| Key rotation exercise | V47 key rotation → old key rejected after grace → new key works | custom harness |
| P2P replay attestation | 4-peer game → replays cross-verified → tampering detected | custom harness |
| Desync classification | Injected platform-bug desync vs cheat desync → correct classification | custom harness |

Gate rule: Failures block release candidates. Weekly results feed into the release-readiness dashboard.
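For the anti-cheat model evaluation, precision and recall follow from a tally of labeled replays against detector output. The sketch below shows the arithmetic only; `Confusion`, `tally`, and `meets` are hypothetical names, and the threshold values passed to `meets` are placeholders, since the V54 thresholds are not stated in this document.

```rust
/// Counts needed for precision and recall. True negatives are omitted
/// because neither metric depends on them.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Confusion {
    true_pos: u32,
    false_pos: u32,
    false_neg: u32,
}

/// Tallies `(is_cheat, flagged)` pairs: the corpus label and the detector's verdict.
fn tally(results: &[(bool, bool)]) -> Confusion {
    let mut c = Confusion::default();
    for &(is_cheat, flagged) in results {
        match (is_cheat, flagged) {
            (true, true) => c.true_pos += 1,
            (false, true) => c.false_pos += 1,
            (true, false) => c.false_neg += 1,
            (false, false) => {}
        }
    }
    c
}

impl Confusion {
    /// Of the replays flagged, the fraction that were real cheats.
    /// `None` when nothing was flagged.
    fn precision(&self) -> Option<f64> {
        let flagged = self.true_pos + self.false_pos;
        (flagged > 0).then(|| f64::from(self.true_pos) / f64::from(flagged))
    }

    /// Of the real cheats, the fraction that were flagged.
    /// `None` when the corpus contains no cheats.
    fn recall(&self) -> Option<f64> {
        let cheats = self.true_pos + self.false_neg;
        (cheats > 0).then(|| f64::from(self.true_pos) / f64::from(cheats))
    }

    /// True only when both metrics are defined and at or above the given minimums.
    fn meets(&self, min_precision: f64, min_recall: f64) -> bool {
        matches!(
            (self.precision(), self.recall()),
            (Some(p), Some(r)) if p >= min_precision && r >= min_recall
        )
    }
}
```

Returning `Option` rather than dividing by zero makes an empty or one-sided corpus fail the gate explicitly instead of producing `NaN`, which would silently compare as false against any threshold.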


Sub-Pages

| Section | Topic | File |
|---|---|---|
| Infrastructure & Subsystems | Test infrastructure requirements (harness, benchmarks, fuzz, replay corpus) + 16 subsystem test specifications | `testing-infrastructure-subsystems.md` |
| Properties, Misuse & Integration | Property-based testing (proptest) + API misuse test matrix + integration scenario matrix + measurement/metrics framework | `testing-properties-misuse-integration.md` |
| Coverage & Release | Coverage mapping (design features to tests) + release criteria + phase rollout | `testing-coverage-release.md` |