Testing Strategy & CI/CD Pipeline

This document defines the automated testing infrastructure for Iron Curtain. Every design feature must map to at least one automated verification method. Testing is not an afterthought — it is a design constraint.

Guiding Principles

Determinism is testable. If a system is deterministic (Invariant #1), its behavior can be reproduced exactly. Tests that rely on determinism are the strongest tests we have.
No untested exit criteria. Every milestone exit criterion (see 18-PROJECT-TRACKER.md) must have a corresponding automated test. If a criterion cannot be tested automatically, it must be flagged as a manual review gate.
CI is the automated authority. If CI fails, the code does not merge — no exceptions, no “it works on my machine.” When manual review gates exist (Principle 2), both CI and the manual gate must pass before the code is shippable.
Fast feedback, thorough verification. PR gates must complete in <10 minutes. Nightly suites handle expensive verification. Weekly suites cover exhaustive/long-running scenarios.

CI/CD Pipeline Tiers

Tier 1: PR Gate (every pull request, <10 min)

Test Category	What It Verifies	Tool / Framework
`cargo clippy --all`	Lint compliance, `disallowed_types` enforcement (see coding standards)	clippy
`cargo test`	Unit tests across all crates	cargo test
`cargo fmt --check`	Formatting consistency	rustfmt
Determinism smoke test	100-tick sim with fixed seed → hash match across runs	custom harness
WASM sandbox smoke test	Basic WASM module load/execute/capability check	custom harness
Lua sandbox smoke test	Basic Lua script load/execute/resource-limit check	custom harness
YAML schema validation	All game data YAML files pass schema validation	custom validator
`strict-path` boundary	Path boundary enforcement for all untrusted-input APIs	unit tests
Build (all targets)	Cross-compilation succeeds (Linux, Windows, macOS)	cargo build / CI matrix
Doc link check	All internal doc cross-references resolve	mdbook build + linkcheck

Gate rule: All Tier 1 tests must pass. Merge is blocked on any failure.

Tier 2: Post-Merge (after merge to main, <30 min)

Test Category	What It Verifies	Tool / Framework
Integration tests	Cross-crate interactions (ic-sim ↔ ic-game ↔ ic-script)	cargo test –features integration
Determinism full suite	10,000-tick sim with 8 players, all unit types → hash match	custom harness
Network protocol tests	Lobby join/leave, relay handshake, reconnection, session auth	custom harness + tokio
Replay round-trip	Record game → playback → hash match with original	custom harness
Workshop package verify	Package build → sign → upload → download → verify chain	custom harness
Anti-cheat smoke test	Known-cheat replay → detection fires; known-clean → no flag	custom harness
Memory safety (Miri)	Undefined behavior detection in unsafe blocks	cargo miri test

Gate rule: Failures trigger automatic revert of the merge commit and notification to the PR author.

Tier 3: Nightly (scheduled, <2 hours)

Test Category	What It Verifies	Tool / Framework
Fuzz testing	`ic-cnc-content` parser, YAML loader, network protocol deserializer	cargo-fuzz / libFuzzer
Property-based testing	Sim invariants hold across random order sequences	proptest
Performance benchmarks	Tick time, memory allocation, pathfinding cost vs budget	criterion
Zero-allocation assertion	Hot-path functions allocate 0 heap bytes in steady state	custom allocator hook
Sandbox escape tests	WASM module attempts all known escape vectors → all blocked	custom harness
Lua resource exhaustion	`string.rep` bomb, infinite loop, memory bomb → all caught	custom harness
Desync injection	Deliberately desync one client → detection fires within N ticks	custom harness
Cross-platform determinism	Same scenario on Linux + Windows → identical hash	CI matrix comparison
Unicode/BiDi sanitization	RTL/BiDi QA corpus (rtl-bidi-qa-corpus.md) categories A–I	custom harness
Display name validation	UTS #39 confusable corpus → all impersonation attempts blocked	custom harness
Save/load round-trip	Save game → load → continue 1000 ticks → hash matches fresh run	custom harness

Gate rule: Failures create high-priority issues. Regressions in performance benchmarks block the next release.

Tier 4: Weekly (scheduled, <8 hours)

Test Category	What It Verifies	Tool / Framework
Campaign playthrough	Full campaign mission sequence completes without crash/desync	automated playback
Extended fuzz campaigns	1M+ iterations per fuzzer target	cargo-fuzz
Network simulation	Packet loss, latency jitter, partition scenarios	custom harness + tc/netem
Load testing	8-player game at 1000 units each → tick budget holds	custom harness
Anti-cheat model eval	Full labeled replay corpus → precision/recall vs V54 thresholds	custom harness
Visual regression	Key UI screens rendered → pixel diff against baseline	custom harness + image diff
Workshop ecosystem test	Mod install → load → gameplay → uninstall lifecycle	custom harness
Key rotation exercise	V47 key rotation → old key rejected after grace → new key works	custom harness
P2P replay attestation	4-peer game → replays cross-verified → tampering detected	custom harness
Desync classification	Injected platform-bug desync vs cheat desync → correct classification	custom harness

Gate rule: Failures block release candidates. Weekly results feed into release-readiness dashboard.

Sub-Pages

Section	Topic	File
Infrastructure & Subsystems	Test infrastructure requirements (harness, benchmarks, fuzz, replay corpus) + 16 subsystem test specifications	testing-infrastructure-subsystems.md
Properties, Misuse & Integration	Property-based testing (proptest) + API misuse test matrix + integration scenario matrix + measurement/metrics framework	testing-properties-misuse-integration.md
Coverage & Release	Coverage mapping (design features to tests) + release criteria + phase rollout	testing-coverage-release.md

Keyboard shortcuts

Iron Curtain — Design Documentation