Product Analytics — Comprehensive Client Event Taxonomy

The telemetry categories above capture what happens in the simulation (gameplay events, system timing) and on the servers (relay metrics, game lifecycle). A third domain is equally critical: how players interact with the game itself — which features are used, which are ignored, how people navigate the UI, how they play matches, and where they get confused or drop off.

This is the data that turns guessing into knowing: “42% of players never opened the career stats page,” “players who use control groups average 60% higher APM,” “the recovery phrase screen has a 60% skip rate — we should redesign the prompt,” “right-click ordering outnumbers sidebar ordering 8:1 — invest in right-click UX, not sidebar polish.”

Core principle: the game client never phones home. IC is an independent project — the client has zero dependency on any IC-hosted backend, analytics service, or telemetry endpoint. Product analytics are recorded to the local telemetry.db (same unified schema as every other component), stored locally, and stay local unless the player deliberately exports them. This matches the project’s local-first philosophy (D034, D061) and ensures IC remains fully functional with no internet connectivity whatsoever.

Design principles:

  1. Offline-only by design. The client contains no transmission code, no HTTP endpoints, no phone-home logic. There is no analytics backend to depend on, no infrastructure to maintain, no service to go offline.
  2. Player-owned data. The telemetry.db file lives on the player’s machine — the same open SQLite format they can query themselves (D034). It’s their data. They can inspect it, export it, or delete it anytime.
  3. Voluntary export for bug reports. /analytics export produces a self-contained file (JSON or SQLite extract) the player can review and attach to bug reports, forum posts, GitHub issues, or community surveys. The player decides when, where, and to whom they send it.
  4. Transparent and inspectable. /analytics inspect shows exactly what’s recorded. No hidden fields, no device fingerprinting. Players can query the SQLite table directly.
  5. Zero impact. The game is fully functional with analytics recording on or off. No nag screens. Recording can be disabled via the telemetry.product_analytics cvar (default: on for local recording).
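
The local-first recording path in these principles can be sketched in a few lines. This is a toy reproduction, not the shipped schema: only the `event`, `data`, and `session_id` columns are taken from the queries later in this chapter; the `ts` column, the exact DDL, and the `record` helper are illustrative.

```python
import json
import sqlite3

# Toy sketch of the local-first recording path. Only the event, data,
# and session_id columns are grounded in the queries later in this
# chapter; everything else here is illustrative.
db = sqlite3.connect(":memory:")  # stands in for the local telemetry.db
db.execute("""
    CREATE TABLE IF NOT EXISTS telemetry_events (
        id         INTEGER PRIMARY KEY,
        ts         TEXT DEFAULT CURRENT_TIMESTAMP,  -- illustrative
        session_id TEXT NOT NULL,
        event      TEXT NOT NULL,
        data       TEXT NOT NULL                    -- JSON payload
    )
""")

def record(session_id: str, event: str, data: dict) -> None:
    """One structured event = one row; payload queryable via json_extract."""
    db.execute(
        "INSERT INTO telemetry_events (session_id, event, data) VALUES (?, ?, ?)",
        (session_id, event, json.dumps(data)),
    )

record("s1", "gui.click", {"widget_id": "btn_play", "widget_type": "button",
                           "screen": "main_menu"})
db.commit()

widget = db.execute(
    "SELECT json_extract(data, '$.widget_id') FROM telemetry_events "
    "WHERE event = 'gui.click'"
).fetchone()[0]
print(widget)  # → btn_play
```

Because the payload is plain JSON in an open SQLite file, the same `json_extract` idiom works for the player inspecting their own data and for the analysis queries shown later.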

What product analytics explicitly does NOT capture:

  • Chat messages, player names, opponent names (no PII)
  • Keystroke logging, raw mouse coordinates, screen captures
  • Hardware identifiers, MAC addresses, IP addresses
  • Filesystem contents, installed software, browser history

GUI Interaction Events

These events capture how the player navigates the interface — which screens they visit, which buttons they click, which features they discover, and where they spend their time. This is the primary source for UX insights.

| Event | JSON data fields | What It Reveals |
|---|---|---|
| gui.screen.open | screen_id, from_screen, method (button/hotkey/back/auto) | Navigation patterns — which screens do players visit? In what order? |
| gui.screen.close | screen_id, duration_ms, next_screen | Time on screen — do players read the settings page for 2 seconds or 30? |
| gui.click | widget_id, widget_type (button/tab/toggle/slider/list_item), screen | Which widgets get used? Which are dead space? |
| gui.hotkey | key_combo, action, context_screen | Hotkey adoption — are players discovering keyboard shortcuts? |
| gui.tooltip.shown | widget_id, duration_ms | Which UI elements confuse players enough to hover for a tooltip? |
| gui.sidebar.interact | tab, item_id, action (select/scroll/queue/cancel), method (click/hotkey) | Sidebar usage patterns — build queue behavior, tab switching |
| gui.minimap.interact | action (camera_move/ping/attack_move/rally_point), position_normalized | Minimap as input device — how often, for what? |
| gui.build_placement | structure_type, outcome (placed/cancelled/invalid_position), time_to_place_ms | Build placement UX — how long does it take? How often do players cancel? |
| gui.context_menu | items_shown, item_selected, screen | Right-click menu usage and discoverability |
| gui.scroll | container_id, direction, distance, screen | Scroll depth — do players scroll through long lists? |
| gui.panel.resize | panel_id, old_size, new_size | UI layout preferences |
| gui.search | context (workshop/map_browser/settings/console), query_length, results_count | Search usage patterns — what are players looking for? |

RTS Input Events

These events capture how the player actually plays the game — selection patterns, ordering habits, control group usage, camera behavior. This is the primary source for gameplay pattern analysis and understanding how players interact with the core RTS mechanics.

| Event | JSON data fields | What It Reveals |
|---|---|---|
| input.select | unit_count, method (box_drag/click/ctrl_group/double_click/tab_cycle/select_all), unit_types[] | Selection habits — do players use box select or control groups? |
| input.ctrl_group | group_number, action (assign/recall/append/steal), unit_count, unit_types[] | Control group adoption — which groups, how many units, reassignment frequency |
| input.order | order_type (move/attack/attack_move/guard/patrol/stop/force_fire/deploy), target_type (ground/unit/building/none), unit_count, method (right_click/hotkey/minimap/sidebar) | How players issue orders — right-click vs. hotkey vs. sidebar? What order types dominate? |
| input.build_queue | item_type, action (queue/cancel/hold/repeat), method (click/hotkey), queue_depth, queue_position | Build queue management — do players queue in advance or build on demand? |
| input.camera | method (edge_scroll/keyboard/minimap_click/ctrl_group_recall/base_hotkey/zoom_scroll/zoom_keyboard/zoom_pinch), distance, duration_ms, zoom_level | Camera control habits — which method dominates? How far do players scroll? What zoom levels are preferred? |
| input.rally_point | building_type, position_type (ground/unit/building), distance_from_building | Rally point usage and placement patterns |
| input.waypoint | waypoint_count, order_type, total_distance | Shift-queue / waypoint usage frequency and complexity |

Match Flow Events

These capture the lifecycle and pacing of matches — when they start, how they progress, why they end. The match.pace snapshot emitted periodically is particularly powerful: it creates a time-series of the player’s economic and military state, enabling pace analysis, build order reconstruction, and difficulty curve assessment.

| Event | JSON data fields | What It Reveals |
|---|---|---|
| match.start | mode, map, player_count, ai_count, ai_difficulty, balance_preset, render_mode, game_module, mod_profile_fingerprint | What people play — which modes, maps, mods, settings |
| match.pace | Emitted every 60s: tick, apm, credits, power_balance, unit_count, army_value, tech_tier, buildings_count, harvesters_active | Economic/military time-series — pacing, build order tendencies, when players peak |
| match.end | duration_s, outcome (win/loss/draw/disconnect/surrender), units_built, units_lost, credits_harvested, credits_spent, peak_army_value, peak_unit_count | Win/loss context, game length, economic efficiency |
| match.first_build | structure_type, time_s | Build order opening — first building timing (balance indicator) |
| match.first_combat | time_s, attacker_units, defender_units, outcome | When does first blood happen? (game pacing metric) |
| match.surrender_point | time_s, army_value_ratio, tech_tier_diff, credits_diff | At what resource/army deficit do players give up? |
| match.pause | reason (player/desync/lag_stall), duration_s | Pause frequency — desync vs. deliberate pauses |
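
As a concrete illustration, a single match.pace snapshot's data payload might look like the following. Field names come from the table above; every value is invented for illustration:

```json
{
  "event": "match.pace",
  "data": {
    "tick": 5400,
    "apm": 72,
    "credits": 3250,
    "power_balance": 40,
    "unit_count": 28,
    "army_value": 11600,
    "tech_tier": 2,
    "buildings_count": 14,
    "harvesters_active": 3
  }
}
```

Sixty of these per hour of play is what makes the time-series analyses (pace curves, build order reconstruction) possible.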

Post-Play Feedback & Content Evaluation Events (Workshop / Modes / Campaigns)

These events measure whether IC’s post-game / post-session feedback prompts are useful without becoming spam. They support UX tuning and creator-tooling iteration, but they are not moderation verdicts and they do not carry gameplay rewards.

| Event | JSON data fields | What It Reveals |
|---|---|---|
| feedback.prompt.shown | surface (post_game/campaign_end/workshop_detail), target_kind (match_mode/workshop_resource/campaign), target_id (optional), session_number, sampling_reason | Prompt frequency and where feedback is requested |
| feedback.prompt.action | surface, target_kind, action (submitted/skipped/snoozed/disabled_for_target/disabled_global), time_on_prompt_ms | Whether the prompt is helpful or intrusive |
| feedback.review.submit | target_kind, target_id, rating (optional 1-5), text_length, playtime_s, community_submit (bool), contains_spoiler_opt_in (bool) | Review quality and submission patterns across modes/mods/campaigns |
| feedback.review.helpful_mark | resource_id, review_id, actor_role (author/moderator), outcome (marked/unmarked/rejected), reward_granted (bool), reward_type (badge/title/acknowledgement/reputation/points/none) | Creator triage behavior and helpful-review recognition usage |
| feedback.review.reward_grant | review_id, resource_id, reward_type, recipient_scope (local_profile/community_profile), revocable (bool), points_amount (optional) | How often profile-only rewards are granted and what types are used |
| feedback.review.reward_redeem | reward_catalog_id, cost_points, recipient_scope, outcome (success/rejected/revoked/refunded), reason | Cosmetic/profile reward redemption usage and abuse/policy tuning (if enabled) |

Privacy / reward boundary (normative):

  • These are product/community UX analytics events, not ranked, matchmaking, or anti-cheat signals.
  • helpful_mark and reward events must never imply gameplay advantages (no credits, ranking bonuses, unlock power, or competitive matchmaking weight).
  • Review text itself remains under Workshop/community review storage rules (D049/D037). D031 records event metadata for UX/ops tuning, not a second copy of user text by default.

Campaign Progress Events (D021, Local-First)

Campaign telemetry supports local campaign dashboards, branching progress summaries, and (if the player opts in) community benchmark aggregates. These events are social/analytics-facing, not ranked or anti-cheat signals.

| Event | JSON data fields | What It Reveals |
|---|---|---|
| campaign.run.start | campaign_id, campaign_version, game_module, difficulty, balance_preset, save_slot, continued | Which campaigns are being played and under what ruleset |
| campaign.node.complete | campaign_id, mission_id, outcome, path_depth, time_s, units_lost, score, branch_revealed_count | Mission outcomes, pacing, branching progress, friction points |
| campaign.progress_snapshot | campaign_id, campaign_version, unique_completed, total_missions, current_path_depth, best_path_depth, endings_unlocked, time_played_s | Branching-safe progress metrics for campaign browser/profile/dashboard UIs |
| campaign.run.end | campaign_id, reason (completed/abandoned/defeat_branch/pause_for_later), best_path_depth, unique_completed, ending_id (optional), session_time_s | Campaign completion/abandonment rates and session outcomes |

Privacy / sharing boundary (normative):

  • These events are always available for local dashboards (campaign browser, profile campaign card, career stats).
  • Upload/export for community benchmark comparisons is opt-in and should default to aggregated summaries (campaign.progress_snapshot) rather than full mission-by-mission histories.
  • Community comparisons must be normalized by campaign version + difficulty + balance preset and presented with spoiler-safe UI defaults (D021/D053).

Session & Lifecycle Events

| Event | JSON data fields | What It Reveals |
|---|---|---|
| session.start | engine_version, os, display_resolution, game_module, mod_profile_fingerprint, session_number (incrementing per install) | Environment context — OS distribution, screen sizes, how many times they’ve launched |
| session.mod_manifest | game_module, mod_profile_fingerprint, unit_types[], building_types[], weapon_types[], resource_types[], faction_names[], mod_sources[] | Self-describing type vocabulary — makes exported telemetry interpretable without the mod’s YAML files |
| session.profile_switch | old_fingerprint, new_fingerprint, old_game_module, new_game_module, profile_name | Mid-session mod profile changes — boundary marker for analytics segmentation |
| session.end | duration_s, reason (quit/crash/update/system_sleep), screens_visited[], matches_played, features_used[] | Session shape — how long, what did they do, clean exit or crash? |
| session.idle | screen_id, duration_s | Idle detection — was the player AFK on the main menu for 20 minutes? |

session.mod_manifest rationale: When telemetry records unit_type: "HARV" or weapon: "Vulcan", these strings are meaningful only if you know the mod’s type catalog. Without context, exported telemetry.db files require the original mod’s YAML files to interpret event payloads. The session.mod_manifest event, emitted once per session (and again on session.profile_switch), captures the active mod’s full type vocabulary — every unit, building, weapon, resource, and faction name defined in the loaded YAML rules. This makes exported telemetry self-describing: an analyst receiving a community-submitted telemetry.db can identify what "HARV" means without installing the mod. The manifest is typically 2–10 KB of JSON — negligible overhead for one event per session.
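
A manifest payload for a small mod might look like this. The "HARV" and "Vulcan" identifiers echo the example above; every other value is hypothetical:

```json
{
  "event": "session.mod_manifest",
  "data": {
    "game_module": "example_module",
    "mod_profile_fingerprint": "a1b2c3d4",
    "unit_types": ["HARV", "MTNK"],
    "building_types": ["PROC", "WEAP"],
    "weapon_types": ["Vulcan"],
    "resource_types": ["ore"],
    "faction_names": ["FactionA", "FactionB"],
    "mod_sources": ["workshop:example-mod"]
  }
}
```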

Settings & Configuration Events

| Event | JSON data fields | What It Reveals |
|---|---|---|
| settings.changed | setting_path, old_value, new_value, screen | Which defaults are wrong? What do players immediately change? |
| settings.preset | preset_type (balance/theme/qol/render/experience), preset_name | Preset popularity — Classic vs. Remastered vs. Modern |
| settings.mod_profile | action (activate/create/delete/import/export), profile_name, mod_count | Mod profile adoption and management patterns |
| settings.keybind | action, old_key, new_key | Which keybinds do players remap? (ergonomics insight) |

Onboarding Events

| Event | JSON data fields | What It Reveals |
|---|---|---|
| onboarding.step | step_id, step_name, action (completed/skipped/abandoned), time_on_step_s | Where do new players drop off? Is the flow too long? |
| onboarding.tutorial | tutorial_id, progress_pct, completed, time_spent_s, deaths | Tutorial completion and difficulty |
| onboarding.first_use | feature_id, session_number, time_since_install_s | Feature discovery timeline — when do players first find the console? Career stats? Workshop? |
| onboarding.recovery_phrase | action (shown/written_confirmed/skipped), time_on_screen_s | Recovery phrase adoption — critical for D061 backup design |

Error & Diagnostic Events

| Event | JSON data fields | What It Reveals |
|---|---|---|
| error.crash | panic_message_hash, backtrace_hash, context (screen/system/tick) | Crash frequency, clustering by context |
| error.mod_load | mod_id, error_type, file_path_hash | Which mods break? Which errors? |
| error.asset | asset_path_hash, format, error_type | Asset loading failures in the wild |
| error.desync | tick, expected_hash, actual_hash, divergent_system_hint | Client-side desync evidence (correlates with relay-side relay.desync) |
| error.network | error_type, context (connect/relay/workshop/tracking) | Network failures by category |
| error.ui | widget_id, error_type, screen | UI rendering/interaction bugs |

Performance Sampling Events

Emitted periodically (not every frame — sampled to avoid overhead). These answer: “Are players hitting performance problems we don’t see in development?”

| Event | JSON data fields | Sampling Rate | What It Reveals |
|---|---|---|---|
| perf.frame | p50_ms, p95_ms, p99_ms, max_ms, entity_count, draw_calls, gpu_time_ms | Every 10s | Frame time distribution — who’s struggling? |
| perf.sim | p50_us, p95_us, p99_us, per-system {system: us} breakdown | Every 30s | Sim tick budget — which systems are expensive for which players? |
| perf.load | what (map/mod/assets/game_launch/screen), duration_ms, size_bytes | On event | Load times — how long does game startup take on real hardware? |
| perf.memory | heap_bytes, component_storage_bytes, scratch_buffer_bytes, asset_cache_bytes | Every 60s | Memory pressure on real machines |
| perf.pathfinding | requests, cache_hits, cache_hit_rate, p95_compute_us | Every 30s | Pathfinding load in real matches |

Analytical Power: What Questions the Data Answers

The telemetry design above is intentionally structured for SQL queryability. Here are representative queries against the unified telemetry_events table that demonstrate the kind of insights this data enables — these queries work identically on client exports, server telemetry.db files, or aggregated community datasets:

GUI & UX Insights:

-- Which screens do players never visit?
SELECT json_extract(data, '$.screen_id') AS screen, COUNT(*) AS visits
FROM telemetry_events WHERE event = 'gui.screen.open'
GROUP BY screen ORDER BY visits ASC LIMIT 20;

-- How do players issue orders: right-click, hotkey, or sidebar?
SELECT json_extract(data, '$.method') AS method, COUNT(*) AS orders
FROM telemetry_events WHERE event = 'input.order'
GROUP BY method ORDER BY orders DESC;

-- Which settings do players change within the first session?
SELECT json_extract(data, '$.setting_path') AS setting,
       json_extract(data, '$.old_value') AS default_val,
       json_extract(data, '$.new_value') AS changed_to,
       COUNT(*) AS changes
FROM telemetry_events e
JOIN (SELECT DISTINCT session_id FROM telemetry_events
      WHERE event = 'session.start'
      AND json_extract(data, '$.session_number') = 1) first
  ON e.session_id = first.session_id
WHERE e.event = 'settings.changed'
GROUP BY setting ORDER BY changes DESC;

-- Control group adoption: what share of match-playing sessions use ctrl groups?
-- (session-level approximation; per-match attribution would need a match ID)
SELECT
  COUNT(DISTINCT CASE WHEN event = 'input.ctrl_group' THEN session_id END) * 100.0 /
  COUNT(DISTINCT CASE WHEN event = 'match.start' THEN session_id END) AS pct_sessions_with_ctrl_groups
FROM telemetry_events WHERE event IN ('input.ctrl_group', 'match.start');

Gameplay Pattern Insights:

-- Average match duration by mode and map
SELECT json_extract(data, '$.mode') AS mode,
       json_extract(data, '$.map') AS map,
       AVG(json_extract(data, '$.duration_s')) AS avg_duration_s,
       COUNT(*) AS matches
FROM telemetry_events WHERE event = 'match.end'
GROUP BY mode, map ORDER BY matches DESC;

-- Build order openings: what do players build first?
SELECT json_extract(data, '$.structure_type') AS first_building,
       COUNT(*) AS frequency,
       AVG(json_extract(data, '$.time_s')) AS avg_time_s
FROM telemetry_events WHERE event = 'match.first_build'
GROUP BY first_building ORDER BY frequency DESC;

-- APM distribution across the player base
SELECT
  CASE WHEN apm < 30 THEN 'casual (<30)'
       WHEN apm < 80 THEN 'intermediate (30-80)'
       WHEN apm < 150 THEN 'advanced (80-150)'
       ELSE 'expert (150+)' END AS skill_bucket,
  COUNT(*) AS snapshots
FROM (SELECT CAST(json_extract(data, '$.apm') AS INTEGER) AS apm
      FROM telemetry_events WHERE event = 'match.pace')
GROUP BY skill_bucket;

-- At what deficit do players surrender?
SELECT AVG(json_extract(data, '$.army_value_ratio')) AS avg_army_ratio,
       AVG(json_extract(data, '$.credits_diff')) AS avg_credit_diff,
       COUNT(*) AS surrenders
FROM telemetry_events WHERE event = 'match.surrender_point';

Troubleshooting Insights:

-- Crash frequency by context (which screen/system crashes most?)
SELECT json_extract(data, '$.context') AS context,
       json_extract(data, '$.backtrace_hash') AS stack,
       COUNT(*) AS occurrences
FROM telemetry_events WHERE event = 'error.crash'
GROUP BY context, stack ORDER BY occurrences DESC LIMIT 20;

-- Desync correlation: which maps/mods trigger desyncs?
-- (run across aggregated relay + client exports)
SELECT json_extract(data, '$.map') AS map,
       COUNT(CASE WHEN event = 'relay.desync' THEN 1 END) AS desyncs,
       COUNT(CASE WHEN event = 'relay.game.end' THEN 1 END) AS total_games,
       ROUND(COUNT(CASE WHEN event = 'relay.desync' THEN 1 END) * 100.0 /
             NULLIF(COUNT(CASE WHEN event = 'relay.game.end' THEN 1 END), 0), 1) AS desync_pct
FROM telemetry_events
WHERE event IN ('relay.desync', 'relay.game.end')
GROUP BY map ORDER BY desync_pct DESC;

-- Performance: which players have sustained frame drops?
SELECT session_id,
       AVG(json_extract(data, '$.p95_ms')) AS avg_p95_frame_ms,
       MAX(json_extract(data, '$.entity_count')) AS peak_entities
FROM telemetry_events WHERE event = 'perf.frame'
GROUP BY session_id
HAVING avg_p95_frame_ms > 33.3  -- below 30 FPS sustained
ORDER BY avg_p95_frame_ms DESC;
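
The APM bucketing query above can be exercised end-to-end against a synthetic export. A minimal sketch using Python's stdlib sqlite3, with the table reduced to the two columns the query touches:

```python
import json
import sqlite3

# Synthetic telemetry_events table: just enough schema for the APM query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE telemetry_events (event TEXT, data TEXT)")
for apm in (12, 45, 95, 210):  # one match.pace snapshot per skill bucket
    db.execute("INSERT INTO telemetry_events VALUES ('match.pace', ?)",
               (json.dumps({"apm": apm}),))

# The APM distribution query, verbatim from above.
rows = db.execute("""
    SELECT CASE WHEN apm < 30 THEN 'casual (<30)'
                WHEN apm < 80 THEN 'intermediate (30-80)'
                WHEN apm < 150 THEN 'advanced (80-150)'
                ELSE 'expert (150+)' END AS skill_bucket,
           COUNT(*) AS snapshots
    FROM (SELECT CAST(json_extract(data, '$.apm') AS INTEGER) AS apm
          FROM telemetry_events WHERE event = 'match.pace')
    GROUP BY skill_bucket
""").fetchall()
print(dict(rows))  # one snapshot lands in each bucket
```

The same script runs unchanged against a real export — swap the in-memory connection for `sqlite3.connect("telemetry.db")`.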

Aggregation happens in the open, not in a backend. If the project team wants to analyze telemetry across many players (e.g., for a usability study, balance patch, or release retrospective), they ask the community to voluntarily submit exports — the same model as open-source projects collecting crash dumps on GitHub. Community members run /analytics export, review the file, and attach it. Aggregation scripts live in the repository and run locally — anyone can reproduce the analysis.

Console commands (D058) — identical on client and server:

| Command | Action |
|---|---|
| /analytics status | Show recording status, event count, telemetry.db size, retention settings |
| /analytics inspect [category] [--last N] | Display recent events, optionally filtered by category |
| /analytics export [--from DATE] [--to DATE] [--category CAT] | Export to JSON/SQLite in <data_dir>/exports/ with optional date/category filter |
| /analytics clear [--before DATE] | Delete events, optionally only before a date |
| /analytics on/off | Toggle local recording (telemetry.product_analytics cvar) |
| /analytics query SQL | Run ad-hoc SQL against telemetry.db (dev console only, DEV_ONLY flag) |

Architecture: Where Telemetry Lives

Primary path (always-on): local SQLite. Every component writes to its own telemetry.db. This is the ground truth. No network, no infrastructure, no dependencies.

  ┌─────────────────────────────────────────────────────────────────┐
  │ Every component (client, relay, tracking, workshop)             │
  │                                                                 │
  │  Instrumentation    ──►  telemetry.db (local SQLite)            │
  │  (tracing + events)      ├── always written                     │
  │                          ├── /analytics inspect                 │
  │                          ├── /analytics export ──► .json file   │
  │                          │   (voluntary: bug report, feedback)  │
  │                          └── retention: max size / max age      │
  └─────────────────────────────────────────────────────────────────┘

Optional path (server operators only): OTEL export. Server operators who want real-time dashboards can enable OTEL export alongside the SQLite sink. This is a deployment choice for sophisticated operators — never a requirement.

  Servers with OTEL enabled:

  telemetry.db ◄── Instrumentation ──► OTEL Collector (optional)
  (always)         (tracing + events)          │
                             ┌───────────────┼───────────────┐
                             │               │               │
                        ┌────▼─────┐    ┌───▼────┐      ┌───▼────┐
                        │Prometheus│    │ Jaeger │      │  Loki  │
                        │(metrics) │    │(traces)│      │ (logs) │
                        └──────────┘    └────────┘      └────┬───┘
                                                             │
                                                      ┌──────▼─────┐
                                                      │ AI Training│
                                                      │(Parquet→ML)│
                                                      └────────────┘

The dual-write approach means:

  • Every deployment gets full telemetry in SQLite — zero setup required
  • Sophisticated deployments can additionally route to Grafana/Prometheus/Jaeger for real-time dashboards
  • Self-hosters can route OTEL to whatever they want (Grafana Cloud, Datadog, or just stdout)
  • If the OTEL collector goes down, telemetry continues in SQLite uninterrupted — no data loss

Implementation Approach

Rust ecosystem:

  • tracing crate — Bevy already uses this; add structured fields and span instrumentation
  • opentelemetry + opentelemetry-otlp crates — OTEL SDK for Rust
  • tracing-opentelemetry — bridges tracing spans to OTEL traces
  • metrics crate — lightweight counters/histograms, exported via OTEL

Zero-cost engine instrumentation when disabled: The telemetry feature flag gates engine-level instrumentation (per-system tick timing, GameplayEvent stream, OTEL export) behind #[cfg(feature = "telemetry")]. When disabled, all engine telemetry calls compile to no-ops. No runtime cost, no allocations, no branches. This respects invariant #5 (efficiency-first performance).

Product analytics (GUI interaction, session, settings, onboarding, errors, perf sampling) always record to SQLite — they are lightweight structured event inserts, not per-tick instrumentation. The overhead is negligible (one SQLite INSERT per user action, batched in WAL mode). Players who want to disable even this can set telemetry.product_analytics false.

Transaction batching: All SQLite INSERTs — both telemetry events and gameplay events — are explicitly batched in transactions to avoid per-INSERT fsync overhead:

| Event source | Batch strategy |
|---|---|
| Product analytics | Buffered in memory; flushed in a single BEGIN/COMMIT every 1 second or 50 events, whichever comes first |
| Gameplay events | Buffered per tick; flushed in a single BEGIN/COMMIT at end of tick (typically 1–20 events per tick) |
| Server telemetry | Ring buffer; flushed in a single BEGIN/COMMIT every 100 ms or 200 events, whichever comes first |
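
A sketch of the product-analytics row of this table — buffer in memory, commit one transaction per 1 second or 50 events, whichever comes first. Class and parameter names are illustrative; the shipped implementation is Rust on a dedicated I/O thread:

```python
import json
import sqlite3
import time

class AnalyticsBatcher:
    """Illustrative sketch: buffer events, flush them in one transaction
    every max_age_s seconds or max_events events, whichever comes first."""

    def __init__(self, db, max_events=50, max_age_s=1.0):
        self.db = db
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def record(self, session_id, event, data):
        self.buffer.append((session_id, event, json.dumps(data)))
        if (len(self.buffer) >= self.max_events
                or time.monotonic() - self.last_flush >= self.max_age_s):
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.db:  # one BEGIN/COMMIT for the whole batch — one fsync
            self.db.executemany(
                "INSERT INTO telemetry_events (session_id, event, data) "
                "VALUES (?, ?, ?)", self.buffer)
        self.buffer.clear()
        self.last_flush = time.monotonic()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE telemetry_events (session_id TEXT, event TEXT, data TEXT)")
batcher = AnalyticsBatcher(db)
for i in range(120):  # 120 rapid events → two committed batches of 50
    batcher.record("s1", "gui.click", {"widget_id": f"w{i}"})
committed = db.execute("SELECT COUNT(*) FROM telemetry_events").fetchone()[0]
print(committed, len(batcher.buffer))  # → 100 20
```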

All writes happen on a dedicated I/O thread (or spawn_blocking task) — never on the game loop thread. The game loop thread only appends to a lock-free ring buffer; the I/O thread drains and commits. This guarantees that SQLite contention (including busy_timeout waits and WAL checkpoints) cannot cause frame drops.

Ring buffer sizing: The ring buffer must absorb all events generated during the worst-case I/O thread stall (WAL checkpoint on HDD: 200–500 ms). At peak event rates (~600 events/s during intense combat — gameplay events + telemetry + product analytics combined), a 500 ms stall generates ~300 events. Minimum ring buffer capacity: 1024 entries (3.4× headroom over worst-case). Each entry is a lightweight enum (~64–128 bytes), so the buffer occupies ~64–128 KB — negligible. If the buffer fills despite this sizing, events are dropped with a counter increment (same pattern as the replay writer’s frames_lost tracking in V45). The I/O thread logs a warning on drain if drops occurred. This is a last-resort safety net, not an expected operating condition.
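
The push/drain contract between the game loop and the I/O thread can be sketched as follows — a single-threaded Python stand-in for the lock-free Rust ring; names are illustrative:

```python
from collections import deque

class TelemetryRing:
    """Bounded ring between the game loop and the telemetry I/O thread.
    When full, new events are dropped and counted rather than blocking —
    the same last-resort pattern as the replay writer's frames_lost."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.buf = deque()
        self.dropped = 0  # incremented only if the I/O thread stalls too long

    def push(self, event):
        """Game-loop side: never blocks on SQLite."""
        if len(self.buf) >= self.capacity:
            self.dropped += 1
            return False
        self.buf.append(event)
        return True

    def drain(self):
        """I/O-thread side: take everything for one batched COMMIT."""
        out = list(self.buf)
        self.buf.clear()
        return out

ring = TelemetryRing(capacity=1024)
# Worst case from the text: ~600 events/s during a 500 ms checkpoint stall,
# i.e. ~300 queued events — well under the 1024-entry capacity.
for i in range(300):
    ring.push(("gui.click", i))
batch = ring.drain()
print(len(batch), ring.dropped)  # → 300 0
```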

Build configurations:

| Build | Engine Telemetry | Product Analytics (SQLite) | OTEL Export | Use case |
|---|---|---|---|---|
| release | Off | On (local SQLite) | Off | Player-facing builds — minimal overhead |
| release-telemetry | On | On (local SQLite) | Optional | Tournament servers, AI training, debugging |
| debug | On | On (local SQLite) | Optional | Development — full instrumentation |

Self-Hosting Observability

Community server operators get observability for free. The docker-compose.yaml (already designed in 03-NETCODE.md) can optionally include a Grafana + Prometheus + Loki stack:

# docker-compose.observability.yaml (optional overlay)
services:
  otel-collector:
    image: otel/opentelemetry-collector:latest
    ports:
      - "4317:4317"    # OTLP gRPC
  prometheus:
    image: prom/prometheus:latest
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"    # dashboards
  loki:
    image: grafana/loki:latest

Pre-built Grafana dashboards ship with the project:

  • Relay Dashboard: active games, player RTT, orders/sec, desync events, suspicion scores
  • Tracking Dashboard: listings, heartbeats, query rates
  • Workshop Dashboard: downloads, publishes, dependency resolution times
  • Engine Dashboard: tick times, entity counts, system breakdown, pathfinding stats

Alternatives considered:

  • Custom metrics format (less work initially, but no ecosystem — no Grafana, no alerting, no community tooling)
  • StatsD (simpler but metrics-only — no traces, no structured logs, no distributed correlation)
  • No telemetry (leaves operators blind and AI training without data)
  • Always-on telemetry (violates performance invariant — must be zero-cost when disabled)

Phase:

  • Phase 2: unified telemetry_events SQLite schema + /analytics console commands (shared across all components from day one); engine telemetry (per-system timing, GameplayEvent stream) in the sim.
  • Phase 3: product analytics (GUI interaction, session, settings, onboarding, errors, performance sampling), alongside UI chrome.
  • Phase 5 (multiplayer): server-side SQLite telemetry recording (relay, tracking, workshop); optional OTEL export layer for server operators; pre-built Grafana dashboards.
  • Phase 7 (LLM): AI training pipeline.

AI Training Data Export

The telemetry and gameplay event data captured above is a secondary data source for AI model training — enriching the primary source (replay files) with contextual signals not present in the replay order stream.

What telemetry adds to training data:

| Telemetry Event | Training Data Value |
|---|---|
| match.pace (every 60s) | Economic time-series — credits, power, army value, tech tier over time |
| input.ctrl_group | Micro skill indicator — control group adoption and reassignment frequency |
| input.order (method breakdown) | Input habit fingerprint — right-click vs. hotkey vs. sidebar ordering |
| match.surrender_point | Reward shaping signal — at what deficit do players perceive the game as lost |
| input.camera | Attention model — what the player was looking at when making decisions |
| match.first_build, match.first_combat | Build order and aggression timing markers |

Export for ML:

ic analytics export-training \
  --categories match,input,campaign \
  --from 2026-01-01 \
  --format parquet \
  --output ./telemetry_training/

Exports selected telemetry event categories as Parquet files, joined to replay metadata where available. This supplements the replay-derived training pairs (see research/ml-training-pipeline-design.md) with player behavior signals that aren’t captured in the deterministic order stream.

Privacy boundary: Training exports apply the same anonymization as /analytics export — hashed session IDs, no PII, no hardware identifiers. Players who disable product analytics (telemetry.product_analytics false) produce no telemetry-enriched training data; replay-derived training pairs remain unaffected.
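
The session-ID anonymization step might look like this sketch. The salted-hash scheme, truncation, and function name are assumptions for illustration; the text above only specifies that exported session IDs are hashed:

```python
import hashlib

def anonymize_session_id(session_id: str, salt: str) -> str:
    """Illustrative export-time anonymization: replace the local session ID
    with a salted hash, so rows from one export can still be grouped by
    session without revealing the original identifier. Scheme is a sketch,
    not the shipped implementation."""
    digest = hashlib.sha256((salt + session_id).encode("utf-8")).hexdigest()
    return digest[:16]  # truncated pseudonym; enough to group, not to link

a = anonymize_session_id("local-session-42", salt="export-2026")
b = anonymize_session_id("local-session-42", salt="export-2026")
print(a == b, len(a))  # same input + salt → same pseudonym; → True 16
```

A per-export salt (rather than a global one) would additionally prevent correlating sessions across separate exports — a tuning knob for how much linkability the player wants to allow.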