Cinematic & Narrative Generation
A generated mission that plays well but feels empty — no mid-mission dialogue, no music shifts, no character moments, no dramatic reveals — is a mission that fails the C&C fantasy. The original Red Alert didn’t just have good missions; it had missions where Stavros called you on the radar mid-battle, where the music shifted from ambient to Hell March when the tanks rolled in, where Tanya dropped a one-liner before breaching the base. That’s the standard.
The LLM generates the full cinematic layer for each mission — not just objectives and unit placement, but the narrative moments that make a mission feel authored:
Mid-mission radar comm events:
The classic C&C moment: your radar screen flickers, a character’s face appears, they deliver intel or a dramatic line. The LLM generates these as D038 Radar Comm modules, triggered by game events:
# LLM-generated radar comm event
radar_comms:
  - id: bridge_warning
    trigger:
      type: unit_enters_region
      region: bridge_approach
      faction: player
    speaker: "General Stavros"
    portrait: stavros_concerned
    text: "Commander, our scouts report heavy armor at the bridge. Going in head-on would be suicide. There's a ford upstream, shallow enough for infantry."
    audio: null # TTS if available, silent otherwise
    display_mode: radar_comm # replaces radar panel
    duration: 6.0 # seconds, then radar returns
  - id: betrayal_reveal
    trigger:
      type: objective_complete
      objective: capture_command_post
    speaker: "Colonel Vasquez"
    portrait: vasquez_smug
    text: "Surprised to see me, Commander? Your General Stavros sold you out. These men now answer to me."
    display_mode: radar_comm
    effects:
      - set_flag: vasquez_betrayal
      - convert_units: # allied garrison turns hostile
          region: command_post_interior
          from_faction: player
          to_faction: enemy
    cinematic: true # brief letterbox + game pause for drama
The LLM decides when these moments should happen based on the mission’s narrative arc. A routine mission might have 1-2 comms (intel at start, debrief at end). A story-critical mission might have 5-6, including a mid-battle betrayal, a desperate plea for reinforcements, and a climactic confrontation.
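As a rough illustration of how triggers like those above could be matched against game events, here is a minimal Rust sketch. The `GameEvent`, `RadarComm`, and `fire_matching` names are hypothetical and are not part of D038; the once-per-mission rule is also an assumption of this sketch.

```rust
/// Hypothetical game events; variants mirror the trigger types in the YAML above.
#[derive(Debug, Clone, PartialEq)]
pub enum GameEvent {
    UnitEntersRegion { region: String, faction: String },
    ObjectiveComplete { objective: String },
}

/// A radar comm reduced to the fields needed for trigger matching.
pub struct RadarComm {
    pub id: String,
    pub trigger: GameEvent,
    pub fired: bool,
}

/// Returns the ids of comms whose trigger equals `event`, marking each as
/// fired so it plays at most once per mission (an assumption of this sketch).
pub fn fire_matching(comms: &mut [RadarComm], event: &GameEvent) -> Vec<String> {
    let mut fired = Vec::new();
    for comm in comms.iter_mut() {
        if !comm.fired && comm.trigger == *event {
            comm.fired = true;
            fired.push(comm.id.clone());
        }
    }
    fired
}
```

Matching on a plain enum keeps the trigger check cheap enough to run on every event; a real implementation would likely also carry the display and effect fields.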
In-mission branching dialogues (RPG-style choices):
Branching dialogue is not limited to intermissions; it can also happen during a mission. When the player reaches an NPC unit, a dialogue triggers and the player makes a choice that affects the mission in real time:
mid_mission_dialogues:
  - id: prisoner_interrogation
    trigger:
      type: unit_enters_region
      unit: tanya
      region: prison_compound
    pause_game: true # freezes game during dialogue
    tree:
      - speaker: "Captured Officer"
        portrait: captured_officer
        text: "I'll tell you everything: the mine locations, the patrol routes. Just let me live."
        choices:
          - label: "Talk. Now."
            effects:
              - reveal_shroud: minefield_region
              - set_flag: intel_acquired
            next: officer_cooperates
          - label: "We don't negotiate with the enemy."
            effects:
              - set_flag: officer_executed
              - adjust_character: { name: "Tanya", loyalty: -5 }
            next: tanya_reacts
          - label: "You'll come with us. Command will want to talk to you."
            effects:
              - spawn_unit: { type: prisoner_escort, region: prison_compound }
              - add_objective: { text: "Extract the prisoner to the LZ", type: secondary }
            next: extraction_added
      - id: officer_cooperates
        speaker: "Captured Officer"
        text: "The mines are along the ridge. I'll mark them on your map. And Commander... the base commander is planning to retreat at 0400."
        effects:
          - add_objective: { text: "Destroy the base before 0400", type: bonus, timer: 300 }
      - id: tanya_reacts
        speaker: "Tanya"
        portrait: tanya_cold
        text: "Your call, Commander. But he might have known something useful."
These are full D038 Dialogue Editor trees — the same format a human designer would create. The LLM generates them with awareness of the mission’s objectives, characters, and narrative context. The choices have mechanical consequences — revealing shroud, adding objectives, changing timers, spawning units, adjusting character loyalty.
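To make the choice-and-consequence flow concrete, the following Rust sketch resolves one player choice in a dialogue tree. The `Effect`, `Choice`, `Node`, and `choose` names are hypothetical, and only two effect kinds are modeled. The sketch rejects a `next` id that has no matching node, which is one possible way to catch dangling references in generated trees.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the effects used in the dialogue tree above.
#[derive(Debug, Clone, PartialEq)]
pub enum Effect {
    SetFlag(String),
    RevealShroud(String),
}

pub struct Choice {
    pub label: String,
    pub effects: Vec<Effect>,
    pub next: Option<String>,
}

pub struct Node {
    pub text: String,
    pub choices: Vec<Choice>,
}

/// Resolves a player choice: returns its effects and the id of the next node,
/// or `None` if the node, the choice index, or the `next` target is missing.
pub fn choose<'a>(
    tree: &'a HashMap<String, Node>,
    node_id: &str,
    choice_index: usize,
) -> Option<(&'a [Effect], &'a str)> {
    let choice = tree.get(node_id)?.choices.get(choice_index)?;
    let next = choice.next.as_deref()?;
    tree.contains_key(next).then(|| (choice.effects.as_slice(), next))
}
```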
The LLM can also generate consequence chains — a choice in Mission 5’s dialogue affects Mission 7’s setup (via story flags). “You spared the officer in Mission 5” → in Mission 7, that officer appears as an informant. The LLM tracks these across the campaign context.
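A consequence chain of this kind can be pictured as a flag store that outlives a single mission. The Rust sketch below is illustrative only: `StoryFlags`, `mission_7_extras`, the `officer_spared` flag, and the `informant_officer` unit id are hypothetical names, not identifiers from the campaign format.

```rust
use std::collections::HashSet;

/// Hypothetical campaign-wide flag store; the text above only states that
/// story flags carry a choice from one mission into a later one.
#[derive(Default)]
pub struct StoryFlags {
    set: HashSet<String>,
}

impl StoryFlags {
    pub fn set_flag(&mut self, flag: &str) {
        self.set.insert(flag.to_string());
    }

    pub fn is_set(&self, flag: &str) -> bool {
        self.set.contains(flag)
    }
}

/// Picks extra setup for a later mission based on flags set in earlier ones.
pub fn mission_7_extras(flags: &StoryFlags) -> Vec<&'static str> {
    let mut extras = Vec::new();
    if flags.is_set("officer_spared") {
        extras.push("informant_officer");
    }
    extras
}
```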
Dynamic music generation:
The LLM doesn’t compose music — it curates it. For each mission, the LLM generates a D038 Music Playlist with mood-tagged tracks selected from the game module’s soundtrack and any Workshop music packs the player has installed:
music:
  mode: dynamic
  tracks:
    ambient:
      - fogger # game module default
      - workshop:cold-war-ost/frozen_fields # from Workshop music pack
    combat:
      - hell_march
      - grinder
    tension:
      - radio_2
      - workshop:cold-war-ost/countdown
    victory:
      - credits
  # Scripted music cues (override dynamic system at specific moments)
  scripted_cues:
    - trigger: { type: timer, seconds: 0 } # mission start
      track: fogger
      fade_in: 3.0
    - trigger: { type: objective_complete, objective: breach_wall }
      track: hell_march
      fade_in: 0.5 # hard cut for dramatic effect
    - trigger: { type: flag_set, flag: vasquez_betrayal }
      track: workshop:cold-war-ost/countdown
      fade_in: 1.0
The LLM picks tracks that match the mission’s tone. A desperate defense mission gets tense ambient tracks and hard-hitting combat music. A stealth infiltration gets quiet ambient and reserves the intense tracks for when the alarm triggers. The scripted cues tie specific music moments to narrative beats — the betrayal hits differently when the music shifts at exactly the right moment.
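The interplay between mood-tagged tracks and scripted cues might look like the Rust sketch below. It assumes that an active scripted cue always wins and that tracks from Workshop packs the player has not installed are skipped; `pick_track` and the `installed` callback are hypothetical names for illustration.

```rust
use std::collections::HashMap;

/// Picks the track to play: an active scripted cue wins; otherwise the first
/// installed track for the current mood. `installed` stands in for whatever
/// check the engine uses to see whether a track is actually available.
pub fn pick_track<'a>(
    scripted_cue: Option<&'a str>,
    mood: &str,
    tracks: &'a HashMap<String, Vec<String>>,
    installed: &dyn Fn(&str) -> bool,
) -> Option<&'a str> {
    if let Some(cue) = scripted_cue {
        return Some(cue);
    }
    tracks
        .get(mood)?
        .iter()
        .map(String::as_str)
        .find(|t| installed(*t))
}
```

Returning `None` when a mood has no usable track leaves the caller free to keep the current track playing instead of cutting to silence.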
Cinematic sequences:
For high-stakes moments, the LLM generates full D038 Cinematic Sequences — multi-step scripted events combining camera movement, dialogue, music, unit spawns, and letterbox:
cinematic_sequences:
  - id: reinforcement_arrival
    trigger:
      type: objective_complete
      objective: hold_position_2_min
    skippable: true
    steps:
      - type: letterbox
        enable: true
        transition_time: 0.5
      - type: camera_pan
        from: player_base
        to: beach_landing
        duration: 3.0
        easing: ease_in_out
      - type: play_music
        track: hell_march
        fade_in: 0.5
      - type: spawn_units
        units: [medium_tank, medium_tank, medium_tank, apc, apc]
        position: beach_landing
        faction: player
        arrival: landing_craft # visual: landing craft delivers them
      - type: dialogue
        speaker: "Admiral Kowalski"
        portrait: kowalski_grinning
        text: "The cavalry has arrived, Commander. Where do you want us?"
        duration: 4.0
      - type: camera_pan
        to: player_base
        duration: 2.0
      - type: letterbox
        enable: false
        transition_time: 0.5
The LLM generates these for key narrative moments — not every trigger. Typical placement:
| Moment | Frequency | Example |
|---|---|---|
| Mission intro | Every mission | Camera pan across the battlefield, briefing dialogue overlay |
| Reinforcement arrival | 30-50% of missions | Camera shows troops landing/parachuting in, commander dialogue |
| Mid-mission plot twist | 20-40% of missions | Betrayal reveal, surprise enemy, intel discovery |
| Objective climax | Key objectives only | Bridge explosion, base breach, hostage rescue |
| Mission conclusion | Every mission | Victory/defeat sequence, debrief comm |
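One practical check on a generated cinematic sequence is how long it takes control away from the player. The Rust sketch below models the step types from the `reinforcement_arrival` example and sums their blocking time. The `Step` enum and `blocking_seconds` function are hypothetical, and the sketch assumes steps run strictly in order while music and spawns start without blocking.

```rust
/// Hypothetical in-memory form of the step types used in the sequence above.
pub enum Step {
    Letterbox { enable: bool, transition_time: f32 },
    CameraPan { to: String, duration: f32 },
    PlayMusic { track: String, fade_in: f32 },
    SpawnUnits { units: Vec<String>, position: String },
    Dialogue { speaker: String, text: String, duration: f32 },
}

/// Seconds the player is without control, assuming steps run one after
/// another and that music and spawns kick off instantly without blocking.
pub fn blocking_seconds(steps: &[Step]) -> f32 {
    steps
        .iter()
        .map(|step| match step {
            Step::Letterbox { transition_time, .. } => *transition_time,
            Step::CameraPan { duration, .. } => *duration,
            Step::Dialogue { duration, .. } => *duration,
            Step::PlayMusic { .. } | Step::SpawnUnits { .. } => 0.0,
        })
        .sum()
}
```

Under these assumptions the `reinforcement_arrival` sequence blocks for 0.5 + 3.0 + 4.0 + 2.0 + 0.5 = 10 seconds, which a generator could compare against a per-mission budget.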
Intermission dialogue and narrative scenes:
Between missions, the LLM generates intermission screens that go beyond simple briefings:
- Branching dialogue with consequences — “General, do we reinforce the eastern front or push west?” The choice affects the next mission’s setup, available forces, or strategic position.
- Character moments — two named characters argue about strategy. The player’s choice affects their loyalty and relationship. A character whose advice is ignored too many times might defect (Campaign Event Patterns).
- Intel briefings — the player reviews intelligence gathered from the previous mission. What they focus on (or ignore) shapes the next mission’s surprises.
- Moral dilemmas — execute the prisoner or extract intel? Bomb the civilian bridge or let the enemy escape? These set story flags that ripple forward through the campaign.
The LLM generates these as D038 Intermission Screens using the Dialogue template with Choice panels. Every choice links to a story flag; every flag feeds back into the LLM’s campaign context for future mission generation.
EVA and ambient audio:
The LLM generates custom EVA notification scripts — mission-specific voice cues beyond the default “Unit lost” / “Construction complete”:
custom_eva:
  - event: unit_enters_region
    region: minefield_zone
    text: "Warning: mines detected in this area."
    priority: high
    cooldown: 30 # don't repeat for 30 seconds
  - event: building_captured
    building: enemy_radar
    text: "Enemy radar facility captured. Shroud cleared."
    priority: normal
  - event: timer_warning
    timer: evacuation_timer
    remaining: 60
    text: "60 seconds until evacuation window closes."
    priority: critical
The LLM also generates ambient sound zone definitions for narrative atmosphere — a mission in a forest gets wind and bird sounds; a mission in a bombed-out city gets distant gunfire and sirens.
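The `cooldown` field on the custom EVA entries above implies some bookkeeping of when each line last played. A minimal Rust sketch of that gating, with the hypothetical names `EvaCooldowns` and `try_play`, could be:

```rust
use std::collections::HashMap;

/// Tracks when each EVA line last played so repeats can be suppressed.
#[derive(Default)]
pub struct EvaCooldowns {
    last_played: HashMap<String, f64>,
}

impl EvaCooldowns {
    /// Returns true (and records the time) if the line may play at `now`
    /// seconds of mission time; returns false while it is still cooling down.
    pub fn try_play(&mut self, line_id: &str, cooldown_secs: f64, now: f64) -> bool {
        let ready = self
            .last_played
            .get(line_id)
            .map_or(true, |last| now - *last >= cooldown_secs);
        if ready {
            self.last_played.insert(line_id.to_string(), now);
        }
        ready
    }
}
```

A suppressed attempt does not reset the timer in this sketch, so a unit lingering in the minefield hears the warning every 30 seconds rather than never again.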
What this means in practice:
A generated mission doesn’t just drop units on a map with objectives. A generated mission:
- Opens with a cinematic pan across the battlefield while the commander briefs you
- Plays ambient music that matches the terrain and mood
- Calls you on the radar when something important happens — a new threat, a character moment, a plot development
- Presents RPG-style dialogue choices when you reach key locations or NPCs
- Shifts the music from ambient to combat when the fighting starts
- Triggers a mid-mission cinematic when the plot twists — a betrayal, a reinforcement arrival, a bridge explosion
- Announces custom EVA warnings for mission-specific hazards
- Ends with a conclusion sequence — victory celebration or desperate evacuation
- Transitions to an intermission with character dialogue, choices, and consequences
All of it is standard D038 format. All of it is editable after generation. All of it works exactly like hand-crafted content. The LLM just writes it faster.
Generative Media Pipeline (Forward-Looking)
The sections above describe the LLM generating text: YAML definitions, Lua triggers, briefing scripts, dialogue trees. But the full C&C experience isn’t text — it’s voice-acted briefings, dynamic music, sound effects, and cutscenes. Currently, generative campaigns use existing media assets: game module sound libraries, Workshop music packs, the player’s installed voice collections. A mission briefing is text that the player reads; a radar comm event is a text bubble without voice audio.
AI-generated media (voice synthesis, music generation, sound effect creation, and a deferred optional M11 video/cutscene generation layer) is advancing rapidly. By the time IC reaches Phase 7, production-quality AI voice synthesis will be mature (it largely is already in 2025–2026); AI music generation is approaching usable quality today, and AI video is on a clear trajectory. The generative media pipeline prepares for this without creating obstacles for a media-free fallback.
Core design principle: every generative media feature is a progressive enhancement. A generative campaign plays identically with or without media generation. Text briefings work. Music from the existing library works. Silent radar comms with text work. When AI media providers are available, they enhance the experience — voiced briefings, custom music, generated sound effects — but nothing depends on them.
Three tiers of generative media (from most ambitious to most conservative):
Tier 1 — Live generation during generative campaigns:
The most ambitious mode. While the player is in a generative campaign, the system generates media for the next mission in real time during the loading/intermission screen between missions. The player reads the text briefing while voice synthesis runs in the background; when the audio is ready, the briefing replays with voice. If voice generation doesn’t finish in time, the text-only version is already playing, so there is no delay.
| Media Type | Generation Window | Fallback (if not ready or unavailable) | Provider Class |
|---|---|---|---|
| Voice lines | Loading screen / intermission (~15–30s) | Text-only briefing, text bubble radar comms | Voice synthesis (ElevenLabs, local TTS, XTTS, Bark, Piper) |
| Music tracks | Pre-generated during campaign setup or between missions | Existing game module soundtrack, Workshop packs | Music generation (Suno, Udio, MusicGen, local models) or built-in ABC→MIDI→SoundFont pipeline (CPU-only, no external provider needed — see research/llm-soundtrack-generation-design.md) |
| Sound FX | Pre-generated during mission generation | Game module default sound library | Sound generation (AudioGen, Stable Audio, local models) or built-in ABC→MIDI→SoundFont / !synth parameter synthesis (CPU-only — see research/llm-soundtrack-generation-design.md, research/demoscene-synthesizer-analysis.md) |
| Cutscenes | Pre-generated between missions (longer) | Text+portrait briefing, radar comm text overlay | Video generation (deferred optional M11 — Sora class, Runway, local models) |
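The fallback column above can be read as "never block on a provider". The Rust sketch below shows one way to express that for voice lines, using only the standard library: synthesis runs on a background thread and the caller waits no longer than the generation window. `Briefing`, `prepare_briefing`, and the `synthesize` closure are hypothetical stand-ins, and a real implementation would show the text immediately rather than wait out the window.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// What the briefing screen ends up with: text always, voice only if it arrived.
#[derive(Debug, PartialEq)]
pub enum Briefing {
    TextOnly(String),
    Voiced { text: String, audio: Vec<u8> },
}

/// Starts synthesis on a background thread and waits at most `window` for it.
/// `synthesize` is a stand-in for a real voice provider call.
pub fn prepare_briefing<F>(text: &str, window: Duration, synthesize: F) -> Briefing
where
    F: FnOnce(String) -> Option<Vec<u8>> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let line = text.to_string();
    thread::spawn(move || {
        // Ignore send errors: the receiver may already have given up.
        let _ = tx.send(synthesize(line));
    });
    match rx.recv_timeout(window) {
        Ok(Some(audio)) => Briefing::Voiced { text: text.to_string(), audio },
        // Timed out, provider failed, or worker panicked: fall back to text.
        _ => Briefing::TextOnly(text.to_string()),
    }
}
```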
Architecture:
/// Trait for media generation providers. Same BYOLLM pattern as LlmProvider.
/// Each media type has its own trait; providers are specialized.
pub trait VoiceProvider: Send + Sync {
    /// Generate speech audio from text + voice profile.
    /// Returns audio data in a standard format (WAV/OGG).
    fn synthesize(
        &self,
        text: &str,
        voice_profile: &VoiceProfile,
        options: &VoiceSynthesisOptions,
    ) -> Result<AudioData>;
}

pub trait MusicProvider: Send + Sync {
    /// Generate a music track from mood/style description.
    /// Returns audio data in a standard format.
    fn generate_track(
        &self,
        description: &MusicPrompt,
        duration_secs: f32,
        options: &MusicGenerationOptions,
    ) -> Result<AudioData>;
}

pub trait SoundFxProvider: Send + Sync {
    /// Generate a sound effect from description.
    fn generate_sfx(
        &self,
        description: &str,
        duration_secs: f32,
    ) -> Result<AudioData>;
}

pub trait VideoProvider: Send + Sync {
    /// Generate a video clip from description + character portraits + context.
    fn generate_video(
        &self,
        description: &VideoPrompt,
        options: &VideoGenerationOptions,
    ) -> Result<VideoData>;
}

/// Voice profile for consistent character voices across a campaign.
/// Stored in campaign context alongside CharacterState.
pub struct VoiceProfile {
    /// Character name; links to campaign skeleton character.
    pub character_name: String,
    /// Voice description for the provider (text prompt).
    /// e.g., "Deep male voice, Russian accent, military authority, clipped speech."
    pub voice_description: String,
    /// Provider-specific voice ID (if using a cloned/preset voice).
    pub voice_id: Option<String>,
    /// Reference audio sample (if provider supports voice cloning from sample).
    pub reference_audio: Option<AudioData>,
}
Voice consistency model: The most critical challenge for campaign voice generation is consistency: the same character must sound the same across 24 missions. The VoiceProfile is created during campaign skeleton generation (Step 2) and persisted in GenerativeCampaignContext. The LLM generates the voice description from the character’s personality profile (Principle #20: an ISTJ commander sounds different from an ESTP commando). If the provider supports voice cloning from a sample, the system generates one calibration line during setup and uses that sample as the reference for all subsequent voice generation. If not, the text description must be consistent enough that the provider produces recognizably similar output.
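The choice between a cloned sample and a text description could be captured in a small selection function. The Rust sketch below uses a reduced, self-contained copy of the profile fields (audio as raw bytes). `VoiceSource` and `voice_source` are hypothetical names, and the preference order (preset id, then cloned sample, then description) is an assumption rather than something the design specifies.

```rust
/// Reduced copy of the VoiceProfile fields relevant here (audio as raw bytes).
pub struct VoiceProfile {
    pub voice_description: String,
    pub voice_id: Option<String>,
    pub reference_audio: Option<Vec<u8>>,
}

/// How a provider request should pin down the character's voice.
#[derive(Debug, PartialEq)]
pub enum VoiceSource<'a> {
    PresetId(&'a str),
    ClonedFrom(&'a [u8]),
    Description(&'a str),
}

/// Prefers the most reproducible source available. The ordering
/// (preset id, then cloned sample, then text description) is an assumption.
pub fn voice_source(profile: &VoiceProfile, supports_cloning: bool) -> VoiceSource<'_> {
    if let Some(id) = profile.voice_id.as_deref() {
        return VoiceSource::PresetId(id);
    }
    if supports_cloning {
        if let Some(sample) = profile.reference_audio.as_deref() {
            return VoiceSource::ClonedFrom(sample);
        }
    }
    VoiceSource::Description(&profile.voice_description)
}
```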
Music mood integration: The generation pipeline already produces music playlists with mood tags (combat, tension, ambient, victory). When a MusicProvider is configured, the system can generate mission-specific tracks from these mood tags instead of selecting from existing libraries. The LLM adds mission-specific context to the music prompt: “Tense ambient track for a night infiltration mission in an Arctic setting, building to war drums when combat triggers fire.” Generated tracks are cached in the campaign save — once created, they’re standard audio files.
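The "generate once, then treat as a normal audio file" behavior amounts to a cache keyed by mission and mood. A minimal Rust sketch follows; `TrackCache`, `get_or_generate`, and the prompt wording are hypothetical, and the `generate` closure stands in for a configured MusicProvider.

```rust
use std::collections::HashMap;

/// Cache of generated tracks keyed by (mission id, mood tag).
#[derive(Default)]
pub struct TrackCache {
    tracks: HashMap<(String, String), Vec<u8>>,
}

impl TrackCache {
    /// Returns the cached track, calling `generate` at most once per key.
    /// The prompt format here is a placeholder for the LLM-written prompt.
    pub fn get_or_generate<F>(&mut self, mission: &str, mood: &str, generate: F) -> &[u8]
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        self.tracks
            .entry((mission.to_string(), mood.to_string()))
            .or_insert_with(|| generate(&format!("{} track for mission {}", mood, mission)))
    }
}
```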
Tier 2 — Pre-generated campaign (full media creation upfront):
The more conservative mode. The player configures a generative campaign, clicks “Generate Campaign,” and the system creates the entire campaign — all missions, all briefings, all media — before the first mission starts. This takes longer (minutes to hours depending on provider speed and campaign length) but produces a complete, polished campaign package.
This mode is also the content creator workflow: a modder or community member generates a campaign, reviews/edits it in the SDK (D038), replaces any weak AI-generated media with hand-crafted alternatives, and publishes the polished result to the Workshop. The AI-generated media is a starting point, not a final product.
| Advantage | Trade-off |
|---|---|
| Complete before play begins | Long generation time (depends on provider) |
| All media reviewable in SDK | Higher API cost (all media generated at once) |
| Publishable to Workshop as-is | Less reactive to player choices (media pre-committed, not adaptive) |
| Can replace weak media by hand | Requires all providers configured upfront |
Generation pipeline (extends Step 2 — Campaign Skeleton):
After the campaign skeleton is generated, the media pipeline runs:
- Voice profiles: create VoiceProfile for each named character. If voice cloning is supported, generate calibration samples.
- All mission briefings: generate voice audio for every briefing text, every radar comm event, every intermission dialogue line.
- Mission music: generate mood-appropriate tracks for each mission (or select from existing library + generate only gap-filling tracks).
- Mission-specific sound FX: generate any custom sound effects referenced in mission scripts (ambient weather, unique weapon sounds, environmental audio).
- Cutscenes (deferred optional M11): generate video sequences for mission intros, mid-mission cinematics, campaign intro/outro.
Each step is independently skippable — a player might configure voice synthesis but skip music generation, using the game’s built-in soundtrack. The campaign save tracks which media was generated vs. sourced from existing libraries.
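The "independently skippable" rule and the generated-versus-sourced bookkeeping can be sketched as a small planning function. All names below (`MediaConfig`, `MediaSource`, `MediaManifest`, `plan_media`) are hypothetical and only illustrate the shape of the data the campaign save would need to track.

```rust
/// Which optional media steps the player has enabled (hypothetical config).
#[derive(Clone, Copy, Default)]
pub struct MediaConfig {
    pub voice: bool,
    pub music: bool,
    pub sound_fx: bool,
    pub cutscenes: bool,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum MediaSource {
    Generated,
    ExistingLibrary,
}

/// Per-media record of where assets come from, as a campaign save might track it.
#[derive(Debug, PartialEq)]
pub struct MediaManifest {
    pub voice: MediaSource,
    pub music: MediaSource,
    pub sound_fx: MediaSource,
    pub cutscenes: MediaSource,
}

/// Each step is decided on its own flag; skipping one never affects another.
pub fn plan_media(cfg: MediaConfig) -> MediaManifest {
    let pick = |enabled: bool| {
        if enabled { MediaSource::Generated } else { MediaSource::ExistingLibrary }
    };
    MediaManifest {
        voice: pick(cfg.voice),
        music: pick(cfg.music),
        sound_fx: pick(cfg.sound_fx),
        cutscenes: pick(cfg.cutscenes),
    }
}
```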
Tier 3 — SDK Asset Studio integration:
This tier already exists architecturally (D040 § Layer 3 — Agentic Asset Generation) but currently covers only visual assets (sprites, palettes, terrain, chrome). The generative media pipeline extends the Asset Studio to cover audio and video:
| Capability | Asset Studio Tool | Provider Trait |
|---|---|---|
| Voice acting | Record text → generate voice → preview on timeline → adjust pitch/speed → export .ogg/.wav | VoiceProvider |
| EVA line generation | Select EVA event type → generate authoritative voice → preview in-game → export to sound library | VoiceProvider |
| Music composition | Describe mood/style → generate track → preview against gameplay footage → trim/fade → export .ogg | MusicProvider |
| Sound FX design | Describe effect → generate → preview → layer with existing FX → export .wav | SoundFxProvider |
| Cutscene creation | Write script → generate video → preview in briefing player → edit → export .mp4/.webm | VideoProvider |
| Voice pack creation | Define character → generate all voice lines → organize → preview → publish as Workshop voice pack | VoiceProvider |
This is the modder-facing tooling. A modder creating a total conversion can generate an entire voice pack for their custom EVA, unit voice lines for new unit types, ambient music that matches their mod’s theme, and briefing videos — all within the SDK, using the same BYOLLM infrastructure.
Crate boundaries:
- ic-llm: implements all provider traits (VoiceProvider, MusicProvider, SoundFxProvider, VideoProvider). Routes to configured providers via D047 task routing. Handles API communication, format conversion, caching.
- ic-editor (SDK): defines the provider traits (same pattern as AssetGenerator). Provides UI for media preview, editing, and export. Tier 3 tools live here.
- ic-game: wires providers at startup. In generative campaign mode, triggers Tier 1 generation during loading/intermission. Plays generated media through standard ic-audio and video playback systems.
- ic-audio: plays generated audio identically to pre-existing audio. No awareness of generation source.
What the AI does NOT replace:
- Professional voice acting. AI voice synthesis is serviceable for procedural content but cannot match a skilled human performance. Hand-crafted campaigns (D021) will always benefit from real voice actors. The AI-generated voice is a first draft, not a final product.
- Composed music. Frank Klepacki’s Hell March was not generated by an algorithm. AI music fills gaps and provides variety; it doesn’t replace composed soundtracks. The game module ships with a human-composed soundtrack; AI supplements it.
- Quality judgment. The modder/player decides if generated media meets their standards. The SDK shows it in context. The Workshop provides a distribution channel for polished results.
D047 integration — task routing for media providers:
The LLM Configuration Manager (D047) extends its task routing to include media generation tasks:
| Task | Provider Type | Typical Routing |
|---|---|---|
| Mission Generation | LlmProvider | Cloud API (quality) |
| Campaign Briefings | LlmProvider | Cloud API (quality) |
| Voice Synthesis | VoiceProvider | ElevenLabs / Local TTS (quality vs. speed trade-off) |
| Music Generation | MusicProvider | Suno API / Local MusicGen |
| Sound FX Generation | SoundFxProvider | AudioGen / Stable Audio |
| Video/Cutscene (deferred optional M11) | VideoProvider | Cloud API (when mature) |
| Asset Generation (visual) | AssetGenerator | DALL-E / Stable Diffusion / Local |
| AI Orchestrator | LlmProvider | Local Ollama (fast) |
| Post-Match Coaching | LlmProvider | Local model (fast) |
Each media provider type is independently configurable. A player might have voice synthesis (local Piper TTS — free, fast, lower quality) but no music generation. The system adapts: generated missions get voiced briefings but use the existing soundtrack.
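Per-task routing with optional providers could be modeled as a simple lookup table where a missing entry means "use existing assets". The Rust sketch below is illustrative only: `MediaTask`, `TaskRouter`, and the provider name strings are hypothetical and do not describe D047's actual configuration format.

```rust
use std::collections::HashMap;

/// Hypothetical media task kinds, one per row of the media routing rows above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum MediaTask {
    VoiceSynthesis,
    MusicGeneration,
    SoundFxGeneration,
    VideoCutscene,
}

/// Maps each task to the name of its configured provider, if any.
#[derive(Default)]
pub struct TaskRouter {
    routes: HashMap<MediaTask, String>,
}

impl TaskRouter {
    pub fn configure(&mut self, task: MediaTask, provider: &str) {
        self.routes.insert(task, provider.to_string());
    }

    /// `None` means no provider is configured, so the caller falls back to
    /// existing assets (text briefings, the built-in soundtrack, and so on).
    pub fn route(&self, task: MediaTask) -> Option<&str> {
        self.routes.get(&task).map(String::as_str)
    }
}
```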
Phase:
- Phase 7: Voice synthesis integration (VoiceProvider trait, ElevenLabs/Piper/XTTS providers, voice profile system, Tier 1 live generation, Tier 2 pre-generation, Tier 3 SDK voice tools). Voice is the highest-impact media type and the most mature AI capability.
- Phase 7: Music generation integration (MusicProvider trait, Suno/MusicGen providers, mood-to-prompt translation). Lower priority than voice, since the existing soundtrack provides good coverage.
- Phase 7+: Sound FX generation (SoundFxProvider). Useful but niche, since game module sound libraries cover most needs.
- Future: Video/cutscene generation (VideoProvider). Depends on AI video technology maturity. The trait is defined now so the architecture is ready; implementation waits until quality meets the bar. The Asset Studio video pipeline (D040: .mp4/.webm/.vqa conversion) provides the playback infrastructure.
Architectural note: The design deliberately separates provider traits by media type rather than using a single unified MediaProvider. Voice, music, sound, and video providers have fundamentally different inputs, outputs, quality curves, and maturity timelines. A player may have excellent voice synthesis available but no music generation at all. Per-type traits and per-type D047 task routing enable this mix-and-match reality. The progressive enhancement principle ensures every combination works, from “no media providers” (text-only, existing assets) to “all providers configured” (fully generated multimedia campaigns).