Product Analytics — Comprehensive Client Event Taxonomy
The telemetry categories above capture what happens in the simulation (gameplay events, system timing) and on the servers (relay metrics, game lifecycle). A third domain is equally critical: how players interact with the game itself — which features are used, which are ignored, how people navigate the UI, how they play matches, and where they get confused or drop off.
This is the data that turns guessing into knowing: “42% of players never opened the career stats page,” “players who use control groups average 60% higher APM,” “the recovery phrase screen has a 60% skip rate — we should redesign the prompt,” “right-click ordering outnumbers sidebar ordering 8:1 — invest in right-click UX, not sidebar polish.”
Core principle: the game client never phones home. IC is an independent project — the client has zero dependency on any IC-hosted backend, analytics service, or telemetry endpoint. Product analytics are recorded to the local telemetry.db (same unified schema as every other component), stored locally, and stay local unless the player deliberately exports them. This matches the project’s local-first philosophy (D034, D061) and ensures IC remains fully functional with no internet connectivity whatsoever.
Design principles:
- Offline-only by design. The client contains no transmission code, no HTTP endpoints, no phone-home logic. There is no analytics backend to depend on, no infrastructure to maintain, no service to go offline.
- Player-owned data. The telemetry.db file lives on the player’s machine — the same open SQLite format they can query themselves (D034). It’s their data. They can inspect it, export it, or delete it anytime.
- Voluntary export for bug reports. /analytics export produces a self-contained file (JSON or SQLite extract) the player can review and attach to bug reports, forum posts, GitHub issues, or community surveys. The player decides when, where, and to whom they send it.
- Transparent and inspectable. /analytics inspect shows exactly what’s recorded. No hidden fields, no device fingerprinting. Players can query the SQLite table directly.
- Zero impact. The game is fully functional with analytics recording on or off. No nag screens. Recording can be disabled via the telemetry.product_analytics cvar (default: on for local recording).
What product analytics explicitly does NOT capture:
- Chat messages, player names, opponent names (no PII)
- Keystroke logging, raw mouse coordinates, screen captures
- Hardware identifiers, MAC addresses, IP addresses
- Filesystem contents, installed software, browser history
GUI Interaction Events
These events capture how the player navigates the interface — which screens they visit, which buttons they click, which features they discover, and where they spend their time. This is the primary source for UX insights.
| Event | JSON data Fields | What It Reveals |
|---|---|---|
gui.screen.open | screen_id, from_screen, method (button/hotkey/back/auto) | Navigation patterns — which screens do players visit? In what order? |
gui.screen.close | screen_id, duration_ms, next_screen | Time on screen — do players read the settings page for 2 seconds or 30? |
gui.click | widget_id, widget_type (button/tab/toggle/slider/list_item), screen | Which widgets get used? Which are dead space? |
gui.hotkey | key_combo, action, context_screen | Hotkey adoption — are players discovering keyboard shortcuts? |
gui.tooltip.shown | widget_id, duration_ms | Which UI elements confuse players enough to hover for a tooltip? |
gui.sidebar.interact | tab, item_id, action (select/scroll/queue/cancel), method (click/hotkey) | Sidebar usage patterns — build queue behavior, tab switching |
gui.minimap.interact | action (camera_move/ping/attack_move/rally_point), position_normalized | Minimap as input device — how often, for what? |
gui.build_placement | structure_type, outcome (placed/cancelled/invalid_position), time_to_place_ms | Build placement UX — how long does it take? How often do players cancel? |
gui.context_menu | items_shown, item_selected, screen | Right-click menu usage and discoverability |
gui.scroll | container_id, direction, distance, screen | Scroll depth — do players scroll through long lists? |
gui.panel.resize | panel_id, old_size, new_size | UI layout preferences |
gui.search | context (workshop/map_browser/settings/console), query_length, results_count | Search usage patterns — what are players looking for? |
RTS Input Events
These events capture how the player actually plays the game — selection patterns, ordering habits, control group usage, camera behavior. This is the primary source for gameplay pattern analysis and understanding how players interact with the core RTS mechanics.
| Event | JSON data Fields | What It Reveals |
|---|---|---|
input.select | unit_count, method (box_drag/click/ctrl_group/double_click/tab_cycle/select_all), unit_types[] | Selection habits — do players use box select or control groups? |
input.ctrl_group | group_number, action (assign/recall/append/steal), unit_count, unit_types[] | Control group adoption — which groups, how many units, reassignment frequency |
input.order | order_type (move/attack/attack_move/guard/patrol/stop/force_fire/deploy), target_type (ground/unit/building/none), unit_count, method (right_click/hotkey/minimap/sidebar) | How players issue orders — right-click vs. hotkey vs. sidebar? What order types dominate? |
input.build_queue | item_type, action (queue/cancel/hold/repeat), method (click/hotkey), queue_depth, queue_position | Build queue management — do players queue in advance or build-on-demand? |
input.camera | method (edge_scroll/keyboard/minimap_click/ctrl_group_recall/base_hotkey/zoom_scroll/zoom_keyboard/zoom_pinch), distance, duration_ms, zoom_level | Camera control habits — which method dominates? How far do players scroll? What zoom levels are preferred? |
input.rally_point | building_type, position_type (ground/unit/building), distance_from_building | Rally point usage and placement patterns |
input.waypoint | waypoint_count, order_type, total_distance | Shift-queue / waypoint usage frequency and complexity |
Match Flow Events
These events capture the lifecycle and pacing of matches — when they start, how they progress, why they end. The match.pace snapshot emitted periodically is particularly powerful: it creates a time-series of the player’s economic and military state, enabling pace analysis, build order reconstruction, and difficulty curve assessment (an example payload follows the table).
| Event | JSON data Fields | What It Reveals |
|---|---|---|
match.start | mode, map, player_count, ai_count, ai_difficulty, balance_preset, render_mode, game_module, mod_profile_fingerprint | What people play — which modes, maps, mods, settings |
match.pace | Emitted every 60s: tick, apm, credits, power_balance, unit_count, army_value, tech_tier, buildings_count, harvesters_active | Economic/military time-series — pacing, build order tendencies, when players peak |
match.end | duration_s, outcome (win/loss/draw/disconnect/surrender), units_built, units_lost, credits_harvested, credits_spent, peak_army_value, peak_unit_count | Win/loss context, game length, economic efficiency |
match.first_build | structure_type, time_s | Build order opening — first building timing (balance indicator) |
match.first_combat | time_s, attacker_units, defender_units, outcome | When does first blood happen? (game pacing metric) |
match.surrender_point | time_s, army_value_ratio, tech_tier_diff, credits_diff | At what resource/army deficit do players give up? |
match.pause | reason (player/desync/lag_stall), duration_s | Pause frequency — desync vs. deliberate pauses |
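For concreteness, here is what a single match.pace data payload could look like. This is a sketch with invented values; the field set follows the table above, and serde_json is assumed only for illustration.

```rust
// Hypothetical match.pace snapshot payload (all values invented).
use serde_json::json;

fn main() {
    let pace = json!({
        "tick": 5400,            // assuming ~15 ticks/s, roughly 6 minutes in
        "apm": 72,
        "credits": 2350,
        "power_balance": 40,
        "unit_count": 34,
        "army_value": 8200,
        "tech_tier": 2,
        "buildings_count": 12,
        "harvesters_active": 3
    });
    println!("{pace}");
}
```

A sequence of these rows, one per minute, is what the pace-analysis and build-order queries later in this section operate on.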
Post-Play Feedback & Content Evaluation Events (Workshop / Modes / Campaigns)
These events measure whether IC’s post-game / post-session feedback prompts are useful without becoming spam. They support UX tuning and creator-tooling iteration, but they are not moderation verdicts and they do not carry gameplay rewards.
| Event | JSON data Fields | What It Reveals |
|---|---|---|
feedback.prompt.shown | surface (post_game/campaign_end/workshop_detail), target_kind (match_mode/workshop_resource/campaign), target_id (optional), session_number, sampling_reason | Prompt frequency and where feedback is requested |
feedback.prompt.action | surface, target_kind, action (submitted/skipped/snoozed/disabled_for_target/disabled_global), time_on_prompt_ms | Whether the prompt is helpful or intrusive |
feedback.review.submit | target_kind, target_id, rating (optional 1-5), text_length, playtime_s, community_submit (bool), contains_spoiler_opt_in (bool) | Review quality and submission patterns across modes/mods/campaigns |
feedback.review.helpful_mark | resource_id, review_id, actor_role (author/moderator), outcome (marked/unmarked/rejected), reward_granted (bool), reward_type (badge/title/acknowledgement/reputation/points/none) | Creator triage behavior and helpful-review recognition usage |
feedback.review.reward_grant | review_id, resource_id, reward_type, recipient_scope (local_profile/community_profile), revocable (bool), points_amount (optional) | How often profile-only rewards are granted and what types are used |
feedback.review.reward_redeem | reward_catalog_id, cost_points, recipient_scope, outcome (success/rejected/revoked/refunded), reason | Cosmetic/profile reward redemption usage and abuse/policy tuning (if enabled) |
Privacy / reward boundary (normative):
- These are product/community UX analytics events, not ranked, matchmaking, or anti-cheat signals.
- helpful_mark and reward events must never imply gameplay advantages (no credits, ranking bonuses, unlock power, or competitive matchmaking weight).
- Review text itself remains under Workshop/community review storage rules (D049/D037). D031 records event metadata for UX/ops tuning, not a second copy of user text by default.
Campaign Progress Events (D021, Local-First)
Campaign telemetry supports local campaign dashboards, branching progress summaries, and (if the player opts in) community benchmark aggregates. These events are social/analytics-facing, not ranked or anti-cheat signals.
| Event | JSON data Fields | What It Reveals |
|---|---|---|
campaign.run.start | campaign_id, campaign_version, game_module, difficulty, balance_preset, save_slot, continued | Which campaigns are being played and under what ruleset |
campaign.node.complete | campaign_id, mission_id, outcome, path_depth, time_s, units_lost, score, branch_revealed_count | Mission outcomes, pacing, branching progress, friction points |
campaign.progress_snapshot | campaign_id, campaign_version, unique_completed, total_missions, current_path_depth, best_path_depth, endings_unlocked, time_played_s | Branching-safe progress metrics for campaign browser/profile/dashboard UIs |
campaign.run.end | campaign_id, reason (completed/abandoned/defeat_branch/pause_for_later), best_path_depth, unique_completed, ending_id (optional), session_time_s | Campaign completion/abandonment rates and session outcomes |
Privacy / sharing boundary (normative):
- These events are always available for local dashboards (campaign browser, profile campaign card, career stats).
- Upload/export for community benchmark comparisons is opt-in and should default to aggregated summaries (campaign.progress_snapshot) rather than full mission-by-mission histories.
- Community comparisons must be normalized by campaign version + difficulty + balance preset and presented with spoiler-safe UI defaults (D021/D053).
Session & Lifecycle Events
| Event | JSON data Fields | What It Reveals |
|---|---|---|
session.start | engine_version, os, display_resolution, game_module, mod_profile_fingerprint, session_number (incrementing per install) | Environment context — OS distribution, screen sizes, how many times they’ve launched |
session.mod_manifest | game_module, mod_profile_fingerprint, unit_types[], building_types[], weapon_types[], resource_types[], faction_names[], mod_sources[] | Self-describing type vocabulary — makes exported telemetry interpretable without the mod’s YAML files |
session.profile_switch | old_fingerprint, new_fingerprint, old_game_module, new_game_module, profile_name | Mid-session mod profile changes — boundary marker for analytics segmentation |
session.end | duration_s, reason (quit/crash/update/system_sleep), screens_visited[], matches_played, features_used[] | Session shape — how long, what did they do, clean exit or crash? |
session.idle | screen_id, duration_s | Idle detection — was the player AFK on the main menu for 20 minutes? |
session.mod_manifest rationale: When telemetry records unit_type: "HARV" or weapon: "Vulcan", these strings are meaningful only if you know the mod’s type catalog. Without context, exported telemetry.db files require the original mod’s YAML files to interpret event payloads. The session.mod_manifest event, emitted once per session (and again on session.profile_switch), captures the active mod’s full type vocabulary — every unit, building, weapon, resource, and faction name defined in the loaded YAML rules. This makes exported telemetry self-describing: an analyst receiving a community-submitted telemetry.db can identify what "HARV" means without installing the mod. The manifest is typically 2–10 KB of JSON — negligible overhead for one event per session.
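As a sketch, the manifest could map onto a serde-serializable struct like the following. Field names mirror the table above; the type itself and the example values are illustrative, not the project’s actual API.

```rust
use serde::Serialize;

// Hypothetical shape of the session.mod_manifest payload.
#[derive(Serialize)]
struct ModManifest {
    game_module: String,             // active game module identifier
    mod_profile_fingerprint: String, // fingerprint of the active mod profile
    unit_types: Vec<String>,         // every unit ID defined in the loaded YAML rules
    building_types: Vec<String>,
    weapon_types: Vec<String>,
    resource_types: Vec<String>,
    faction_names: Vec<String>,
    mod_sources: Vec<String>,        // where each mod came from (workshop, local, ...)
}

fn main() {
    let manifest = ModManifest {
        game_module: "example_module".into(),
        mod_profile_fingerprint: "abc123".into(),
        unit_types: vec!["HARV".into()],      // the "HARV" from the rationale above
        building_types: vec![],
        weapon_types: vec!["Vulcan".into()],
        resource_types: vec![],
        faction_names: vec![],
        mod_sources: vec![],
    };
    println!("{}", serde_json::to_string(&manifest).unwrap());
}
```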
Settings & Configuration Events
| Event | JSON data Fields | What It Reveals |
|---|---|---|
settings.changed | setting_path, old_value, new_value, screen | Which defaults are wrong? What do players immediately change? |
settings.preset | preset_type (balance/theme/qol/render/experience), preset_name | Preset popularity — Classic vs. Remastered vs. Modern |
settings.mod_profile | action (activate/create/delete/import/export), profile_name, mod_count | Mod profile adoption and management patterns |
settings.keybind | action, old_key, new_key | Which keybinds do players remap? (ergonomics insight) |
Onboarding Events
| Event | JSON data Fields | What It Reveals |
|---|---|---|
onboarding.step | step_id, step_name, action (completed/skipped/abandoned), time_on_step_s | Where do new players drop off? Is the flow too long? |
onboarding.tutorial | tutorial_id, progress_pct, completed, time_spent_s, deaths | Tutorial completion and difficulty |
onboarding.first_use | feature_id, session_number, time_since_install_s | Feature discovery timeline — when do players first find the console? Career stats? Workshop? |
onboarding.recovery_phrase | action (shown/written_confirmed/skipped), time_on_screen_s | Recovery phrase adoption — critical for D061 backup design |
Error & Diagnostic Events
| Event | JSON data Fields | What It Reveals |
|---|---|---|
error.crash | panic_message_hash, backtrace_hash, context (screen/system/tick) | Crash frequency, clustering by context |
error.mod_load | mod_id, error_type, file_path_hash | Which mods break? Which errors? |
error.asset | asset_path_hash, format, error_type | Asset loading failures in the wild |
error.desync | tick, expected_hash, actual_hash, divergent_system_hint | Client-side desync evidence (correlates with the relay-side relay.desync event)
error.network | error_type, context (connect/relay/workshop/tracking) | Network failures by category |
error.ui | widget_id, error_type, screen | UI rendering/interaction bugs |
Performance Sampling Events
Emitted periodically (not every frame — sampled to avoid overhead). These answer: “Are players hitting performance problems we don’t see in development?” A sketch of the sampling pattern follows the table.
| Event | JSON data Fields | Sampling Rate | What It Reveals |
|---|---|---|---|
perf.frame | p50_ms, p95_ms, p99_ms, max_ms, entity_count, draw_calls, gpu_time_ms | Every 10s | Frame time distribution — who’s struggling? |
perf.sim | p50_us, p95_us, p99_us, per-system {system: us} breakdown | Every 30s | Sim tick budget — which systems are expensive for which players? |
perf.load | what (map/mod/assets/game_launch/screen), duration_ms, size_bytes | On event | Load times — how long does game startup take on real hardware? |
perf.memory | heap_bytes, component_storage_bytes, scratch_buffer_bytes, asset_cache_bytes | Every 60s | Memory pressure on real machines |
perf.pathfinding | requests, cache_hits, cache_hit_rate, p95_compute_us | Every 30s | Pathfinding load in real matches |
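A minimal sketch of the perf.frame sampling pattern: per-frame timings are buffered in memory and reduced to one summary event per 10-second window. The type names and the naive percentile math are illustrative, not the engine’s actual implementation.

```rust
// Buffer per-frame times; emit one summarized perf.frame event per window.
struct FrameSampler {
    samples_ms: Vec<f32>,
}

impl FrameSampler {
    fn record(&mut self, frame_ms: f32) {
        self.samples_ms.push(frame_ms); // called once per frame, no I/O
    }

    /// Called every 10 s: reduce a few hundred samples to one event.
    fn flush(&mut self) -> Option<(f32, f32, f32, f32)> {
        if self.samples_ms.is_empty() {
            return None;
        }
        self.samples_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let pct = |p: f64| self.samples_ms[((self.samples_ms.len() - 1) as f64 * p) as usize];
        let out = (pct(0.50), pct(0.95), pct(0.99), *self.samples_ms.last().unwrap());
        self.samples_ms.clear();
        Some(out) // (p50_ms, p95_ms, p99_ms, max_ms)
    }
}

fn main() {
    let mut sampler = FrameSampler { samples_ms: vec![] };
    for ms in [16.0, 17.2, 16.4, 33.9, 16.1] {
        sampler.record(ms);
    }
    if let Some((p50, p95, p99, max)) = sampler.flush() {
        println!("perf.frame p50={p50} p95={p95} p99={p99} max={max}");
    }
}
```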
Analytical Power: What Questions the Data Answers
The telemetry design above is intentionally structured for SQL queryability. Here are representative queries against the unified telemetry_events table that demonstrate the kind of insights this data enables — these queries work identically on client exports, server telemetry.db files, or aggregated community datasets:
GUI & UX Insights:
-- Which screens do players never visit?
SELECT json_extract(data, '$.screen_id') AS screen, COUNT(*) AS visits
FROM telemetry_events WHERE event = 'gui.screen.open'
GROUP BY screen ORDER BY visits ASC LIMIT 20;
-- How do players issue orders: right-click, hotkey, or sidebar?
SELECT json_extract(data, '$.method') AS method, COUNT(*) AS orders
FROM telemetry_events WHERE event = 'input.order'
GROUP BY method ORDER BY orders DESC;
-- Which settings do players change within the first session?
SELECT json_extract(data, '$.setting_path') AS setting,
json_extract(data, '$.old_value') AS default_val,
json_extract(data, '$.new_value') AS changed_to,
COUNT(*) AS changes
FROM telemetry_events e
JOIN (SELECT DISTINCT session_id FROM telemetry_events
WHERE event = 'session.start'
AND json_extract(data, '$.session_number') = 1) first
ON e.session_id = first.session_id
WHERE e.event = 'settings.changed'
GROUP BY setting ORDER BY changes DESC;
-- Control group adoption: what share of sessions with matches use ctrl groups?
SELECT
COUNT(DISTINCT CASE WHEN event = 'input.ctrl_group' THEN session_id END) * 100.0 /
COUNT(DISTINCT CASE WHEN event = 'match.start' THEN session_id END) AS pct_sessions_with_ctrl_groups
FROM telemetry_events WHERE event IN ('input.ctrl_group', 'match.start');
Gameplay Pattern Insights:
-- Average match duration by mode and map
SELECT json_extract(data, '$.mode') AS mode,
json_extract(data, '$.map') AS map,
AVG(json_extract(data, '$.duration_s')) AS avg_duration_s,
COUNT(*) AS matches
FROM telemetry_events WHERE event = 'match.end'
GROUP BY mode, map ORDER BY matches DESC;
-- Build order openings: what do players build first?
SELECT json_extract(data, '$.structure_type') AS first_building,
COUNT(*) AS frequency,
AVG(json_extract(data, '$.time_s')) AS avg_time_s
FROM telemetry_events WHERE event = 'match.first_build'
GROUP BY first_building ORDER BY frequency DESC;
-- APM distribution across the player base
SELECT
CASE WHEN apm < 30 THEN 'casual (<30)'
WHEN apm < 80 THEN 'intermediate (30-80)'
WHEN apm < 150 THEN 'advanced (80-150)'
ELSE 'expert (150+)' END AS skill_bucket,
COUNT(*) AS snapshots
FROM (SELECT CAST(json_extract(data, '$.apm') AS INTEGER) AS apm
FROM telemetry_events WHERE event = 'match.pace')
GROUP BY skill_bucket;
-- At what deficit do players surrender?
SELECT AVG(json_extract(data, '$.army_value_ratio')) AS avg_army_ratio,
AVG(json_extract(data, '$.credits_diff')) AS avg_credit_diff,
COUNT(*) AS surrenders
FROM telemetry_events WHERE event = 'match.surrender_point';
Troubleshooting Insights:
-- Crash frequency by context (which screen/system crashes most?)
SELECT json_extract(data, '$.context') AS context,
json_extract(data, '$.backtrace_hash') AS stack,
COUNT(*) AS occurrences
FROM telemetry_events WHERE event = 'error.crash'
GROUP BY context, stack ORDER BY occurrences DESC LIMIT 20;
-- Desync correlation: which maps/mods trigger desyncs?
-- (run across aggregated relay + client exports)
SELECT json_extract(data, '$.map') AS map,
COUNT(CASE WHEN event = 'relay.desync' THEN 1 END) AS desyncs,
COUNT(CASE WHEN event = 'relay.game.end' THEN 1 END) AS total_games,
ROUND(COUNT(CASE WHEN event = 'relay.desync' THEN 1 END) * 100.0 /
NULLIF(COUNT(CASE WHEN event = 'relay.game.end' THEN 1 END), 0), 1) AS desync_pct
FROM telemetry_events
WHERE event IN ('relay.desync', 'relay.game.end')
GROUP BY map ORDER BY desync_pct DESC;
-- Performance: which players have sustained frame drops?
SELECT session_id,
AVG(json_extract(data, '$.p95_ms')) AS avg_p95_frame_ms,
MAX(json_extract(data, '$.entity_count')) AS peak_entities
FROM telemetry_events WHERE event = 'perf.frame'
GROUP BY session_id
HAVING avg_p95_frame_ms > 33.3 -- below 30 FPS sustained
ORDER BY avg_p95_frame_ms DESC;
Aggregation happens in the open, not in a backend. If the project team wants to analyze telemetry across many players (e.g., for a usability study, balance patch, or release retrospective), they ask the community to voluntarily submit exports — the same model as open-source projects collecting crash dumps on GitHub. Community members run /analytics export, review the file, and attach it. Aggregation scripts live in the repository and run locally — anyone can reproduce the analysis.
Console commands (D058) — identical on client and server:
| Command | Action |
|---|---|
/analytics status | Show recording status, event count, telemetry.db size, retention settings |
/analytics inspect [category] [--last N] | Display recent events, optionally filtered by category |
/analytics export [--from DATE] [--to DATE] [--category CAT] | Export to JSON/SQLite in <data_dir>/exports/ with optional date/category filter |
/analytics clear [--before DATE] | Delete events, optionally only before a date |
/analytics on/off | Toggle local recording (telemetry.product_analytics cvar) |
/analytics query SQL | Run ad-hoc SQL against telemetry.db (dev console only, DEV_ONLY flag) |
Architecture: Where Telemetry Lives
Primary path (always-on): local SQLite. Every component writes to its own telemetry.db. This is the ground truth. No network, no infrastructure, no dependencies.
┌─────────────────────────────────────────────────────────────────┐
│ Every component (client, relay, tracking, workshop) │
│ │
│ Instrumentation ──► telemetry.db (local SQLite) │
│ (tracing + events) ├── always written │
│ ├── /analytics inspect │
│ ├── /analytics export ──► .json file │
│ │ (voluntary: bug report, feedback) │
│ └── retention: max size / max age │
└─────────────────────────────────────────────────────────────────┘
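For illustration, a minimal rusqlite sketch of the always-on local sink. The telemetry_events column set here is an assumption inferred from the example queries above, not the project’s confirmed schema.

```rust
use rusqlite::Connection;

// Open (or create) the local sink with WAL mode, as assumed throughout.
fn open_telemetry_db(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.pragma_update(None, "journal_mode", "WAL")?; // batched commits, cheap readers
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS telemetry_events (
             id         INTEGER PRIMARY KEY,
             ts         INTEGER NOT NULL,  -- unix epoch millis
             session_id TEXT NOT NULL,
             event      TEXT NOT NULL,     -- e.g. 'gui.click', 'match.end'
             data       TEXT NOT NULL      -- JSON payload, read via json_extract()
         );
         CREATE INDEX IF NOT EXISTS idx_events_by_name ON telemetry_events(event);",
    )?;
    Ok(conn)
}

fn main() -> rusqlite::Result<()> {
    let conn = open_telemetry_db("telemetry.db")?;
    conn.execute(
        "INSERT INTO telemetry_events (ts, session_id, event, data)
         VALUES (?1, ?2, ?3, ?4)",
        (1735689600000_i64, "session-abc", "gui.click", r#"{"widget_id":"play"}"#),
    )?;
    Ok(())
}
```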
Optional path (server operators only): OTEL export. Server operators who want real-time dashboards can enable OTEL export alongside the SQLite sink. This is a deployment choice for sophisticated operators — never a requirement.
Servers with OTEL enabled:
telemetry.db ◄── Instrumentation ──► OTEL Collector (optional)
(always) (tracing + events) │
┌──────┴──────────────────┐
│ │ │
┌──────▼──┐ ┌────▼────┐ ┌───────▼───┐
│Prometheus│ │ Jaeger │ │ Loki │
│(metrics) │ │(traces) │ │(logs) │
└──────────┘ └─────────┘ └─────┬─────┘
│
┌──────▼──────┐
│ AI Training │
│ (Parquet→ML) │
└─────────────┘
The dual-write approach means:
- Every deployment gets full telemetry in SQLite — zero setup required
- Sophisticated deployments can additionally route to Grafana/Prometheus/Jaeger for real-time dashboards
- Self-hosters can route OTEL to whatever they want (Grafana Cloud, Datadog, or just stdout)
- If the OTEL collector goes down, telemetry continues in SQLite uninterrupted — no data loss
Implementation Approach
Rust ecosystem:
- tracing crate — Bevy already uses this; add structured fields and span instrumentation
- opentelemetry + opentelemetry-otlp crates — OTEL SDK for Rust
- tracing-opentelemetry — bridges tracing spans to OTEL traces
- metrics crate — lightweight counters/histograms, exported via OTEL
Zero-cost engine instrumentation when disabled: The telemetry feature flag gates engine-level instrumentation (per-system tick timing, GameplayEvent stream, OTEL export) behind #[cfg(feature = "telemetry")]. When disabled, all engine telemetry calls compile to no-ops. No runtime cost, no allocations, no branches. This respects invariant #5 (efficiency-first performance).
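A sketch of that gating pattern, assuming a project-internal macro (the macro and function names are illustrative, not the engine’s actual API):

```rust
// When the "telemetry" feature is on, the macro forwards to the recorder.
#[cfg(feature = "telemetry")]
macro_rules! engine_telemetry {
    ($event:expr, $payload:expr) => {
        telemetry_record($event, $payload)
    };
}

// When the feature is off, every call site expands to nothing:
// no branches, no allocations, no runtime cost.
#[cfg(not(feature = "telemetry"))]
macro_rules! engine_telemetry {
    ($event:expr, $payload:expr) => {};
}

#[cfg(feature = "telemetry")]
fn telemetry_record(event: &str, payload: &str) {
    // ... hand off to the SQLite / OTEL sinks ...
    let _ = (event, payload);
}

fn main() {
    engine_telemetry!("sim.tick", "{\"duration_us\":1200}");
}
```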
Product analytics (GUI interaction, session, settings, onboarding, errors, perf sampling) always record to SQLite — they are lightweight structured event inserts, not per-tick instrumentation. The overhead is negligible (one SQLite INSERT per user action, batched in WAL mode). Players who want to disable even this can set telemetry.product_analytics false.
Transaction batching: All SQLite INSERTs — both telemetry events and gameplay events — are explicitly batched in transactions to avoid per-INSERT fsync overhead:
| Event source | Batch strategy |
|---|---|
| Product analytics | Buffered in memory; flushed in a single BEGIN/COMMIT every 1 second or 50 events, whichever first |
| Gameplay events | Buffered per tick; flushed in a single BEGIN/COMMIT at end of tick (typically 1-20 events per tick) |
| Server telemetry | Ring buffer; flushed in a single BEGIN/COMMIT every 100 ms or 200 events, whichever first |
All writes happen on a dedicated I/O thread (or spawn_blocking task) — never on the game loop thread. The game loop thread only appends to a lock-free ring buffer; the I/O thread drains and commits. This guarantees that SQLite contention (including busy_timeout waits and WAL checkpoints) cannot cause frame drops.
Ring buffer sizing: The ring buffer must absorb all events generated during the worst-case I/O thread stall (WAL checkpoint on HDD: 200–500 ms). At peak event rates (~600 events/s during intense combat — gameplay events + telemetry + product analytics combined), a 500 ms stall generates ~300 events. Minimum ring buffer capacity: 1024 entries (3.4× headroom over worst-case). Each entry is a lightweight enum (~64–128 bytes), so the buffer occupies ~64–128 KB — negligible. If the buffer fills despite this sizing, events are dropped with a counter increment (same pattern as the replay writer’s frames_lost tracking in V45). The I/O thread logs a warning on drain if drops occurred. This is a last-resort safety net, not an expected operating condition.
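A condensed sketch of the buffer-and-drain pattern under these assumptions: a bounded std channel stands in for the lock-free ring buffer, rusqlite provides the transactions, and the table matches the earlier schema sketch. Names and the flush cadence are illustrative.

```rust
use rusqlite::Connection;
use std::sync::mpsc::{sync_channel, TrySendError};
use std::time::Duration;

// One buffered analytics event; the real entry is a small enum (~64-128 B).
struct Event {
    name: &'static str,
    data: String, // JSON payload
}

fn main() {
    // Bounded channel standing in for the 1024-entry ring buffer.
    let (tx, rx) = sync_channel::<Event>(1024);

    // Dedicated I/O thread: drain the buffer, commit one transaction per batch.
    let io = std::thread::spawn(move || {
        let mut conn = Connection::open("telemetry.db").expect("open telemetry.db");
        conn.execute_batch(
            "CREATE TABLE IF NOT EXISTS telemetry_events \
             (ts INTEGER, session_id TEXT, event TEXT, data TEXT)",
        )
        .expect("schema");
        while let Ok(first) = rx.recv() {
            // Collect whatever else is already buffered, up to the 50-event cap.
            let batch: Vec<Event> =
                std::iter::once(first).chain(rx.try_iter().take(49)).collect();
            let txn = conn.transaction().expect("begin");
            for ev in &batch {
                txn.execute(
                    "INSERT INTO telemetry_events (ts, session_id, event, data)
                     VALUES (strftime('%s','now'), 'local', ?1, ?2)",
                    (ev.name, &ev.data),
                )
                .expect("insert");
            }
            txn.commit().expect("commit"); // one fsync for the whole batch
            std::thread::sleep(Duration::from_secs(1)); // ~1 s flush cadence
        }
    });

    // Game-loop side: non-blocking append; on a full buffer, drop and count.
    let mut dropped: u64 = 0;
    let ev = Event { name: "gui.click", data: r#"{"widget_id":"play"}"#.into() };
    if let Err(TrySendError::Full(_)) = tx.try_send(ev) {
        dropped += 1; // same pattern as the replay writer's frames_lost counter
    }

    drop(tx); // closing the channel lets the I/O thread drain and exit
    io.join().unwrap();
    let _ = dropped;
}
```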
Build configurations:
| Build | Engine Telemetry | Product Analytics (SQLite) | OTEL Export | Use case |
|---|---|---|---|---|
release | Off | On (local SQLite) | Off | Player-facing builds — minimal overhead |
release-telemetry | On | On (local SQLite) | Optional | Tournament servers, AI training, debugging |
debug | On | On (local SQLite) | Optional | Development — full instrumentation |
Self-Hosting Observability
Community server operators get observability for free. The docker-compose.yaml (already designed in 03-NETCODE.md) can optionally include a Grafana + Prometheus + Loki stack:
# docker-compose.observability.yaml (optional overlay)
services:
  otel-collector:
    image: otel/opentelemetry-collector:latest
    ports:
      - "4317:4317" # OTLP gRPC
  prometheus:
    image: prom/prometheus:latest
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000" # dashboards
  loki:
    image: grafana/loki:latest
Pre-built Grafana dashboards ship with the project:
- Relay Dashboard: active games, player RTT, orders/sec, desync events, suspicion scores
- Tracking Dashboard: listings, heartbeats, query rates
- Workshop Dashboard: downloads, publishes, dependency resolution times
- Engine Dashboard: tick times, entity counts, system breakdown, pathfinding stats
Alternatives considered:
- Custom metrics format (less work initially, but no ecosystem — no Grafana, no alerting, no community tooling)
- StatsD (simpler but metrics-only — no traces, no structured logs, no distributed correlation)
- No telemetry (leaves operators blind and AI training without data)
- Always-on telemetry (violates performance invariant — must be zero-cost when disabled)
Phasing:
- Phase 2: unified telemetry_events SQLite schema + /analytics console commands (shared across all components from day one); engine telemetry (per-system timing, GameplayEvent stream) with the sim.
- Phase 3: product analytics (GUI interaction, session, settings, onboarding, errors, performance sampling), alongside UI chrome.
- Phase 5 (multiplayer): server-side SQLite telemetry recording (relay, tracking, workshop); optional OTEL export layer for server operators; pre-built Grafana dashboards.
- Phase 7 (LLM): AI training pipeline.
AI Training Data Export
The telemetry and gameplay event data captured above is a secondary data source for AI model training — enriching the primary source (replay files) with contextual signals not present in the replay order stream.
What telemetry adds to training data:
| Telemetry Event | Training Data Value |
|---|---|
match.pace (every 60s) | Economic time-series — credits, power, army value, tech tier over time |
input.ctrl_group | Micro skill indicator — control group adoption and reassignment frequency |
input.order (method breakdown) | Input habit fingerprint — right-click vs. hotkey vs. sidebar ordering |
match.surrender_point | Reward shaping signal — at what deficit do players perceive the game as lost |
input.camera | Attention model — what the player was looking at when making decisions |
match.first_build, match.first_combat | Build order and aggression timing markers |
Export for ML:
ic analytics export-training \
--categories match,input,campaign \
--from 2026-01-01 \
--format parquet \
--output ./telemetry_training/
Exports selected telemetry event categories as Parquet files, joined to replay metadata where available. This supplements the replay-derived training pairs (see research/ml-training-pipeline-design.md) with player behavior signals that aren’t captured in the deterministic order stream.
Privacy boundary: Training exports apply the same anonymization as /analytics export — hashed session IDs, no PII, no hardware identifiers. Players who disable product analytics (telemetry.product_analytics false) produce no telemetry-enriched training data; replay-derived training pairs remain unaffected.