P2P Distribution (BitTorrent/WebTorrent)
Wire protocol specification: For the byte-level BT wire protocol, piece picker algorithm, choking strategy, authenticated announce, WebRTC signaling, `.icpkg` binary header, and DHT design, see research/p2p-engine-protocol-design.md.
P2P piece mapping: The complete BitTorrent piece mapping for .icpkg packages — piece size, chunking, manifest-only fetch, CAS/BT interaction — is specified in research/p2p-engine-protocol-design.md § 10.
The cost problem: A popular 500MB mod downloaded 10,000 times generates 5TB of egress. At CDN rates ($0.01–0.09/GB), that’s $50–450/month — per mod. For a community project sustained by donations, centralized hosting is financially unsustainable at scale. A BitTorrent tracker VPS costs $5–20/month regardless of popularity.
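The arithmetic above can be sanity-checked in a few lines (decimal GB; the rates are the illustrative CDN list prices quoted in the text):

```rust
// Back-of-envelope egress cost for a package at a given CDN rate.
// Uses decimal units (1 GB = 1000 MB), matching the 5TB figure above.
fn monthly_egress_cost_usd(package_mb: u64, downloads: u64, rate_per_gb: f64) -> f64 {
    let egress_gb = (package_mb * downloads) as f64 / 1000.0; // MB -> GB
    egress_gb * rate_per_gb
}
```

At 500 MB × 10,000 downloads, this yields 5,000 GB of egress: $50/month at $0.01/GB and $450/month at $0.09/GB.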
The solution: Workshop distribution uses the BitTorrent protocol for large packages, with HTTP as both a concurrent transport (via BEP 17/19 web seeding) and a last-resort fallback. When web seed URLs are present in torrent metadata, HTTP mirrors participate simultaneously alongside BT peers in the piece scheduler — downloads aggregate bandwidth from both transports. The Workshop server acts as both metadata registry (SQLite, lightweight) and BitTorrent tracker (peer coordination, lightweight). See D049-web-seeding.md for the full web seeding design.
How it works:
┌─────────────┐ 1. Search/browse ┌──────────────────┐
│ ic CLI / │ ───────────────────────► │ Workshop Server │
│ In-Game │ ◄─────────────────────── │ (metadata + │
│ Browser │ 2. manifest.yaml + │ tracker) │
│ │ torrent info │ │
│ │ └──────────────────┘
│ │ 3. P2P download
│ │ ◄──────────────────────► Other players (peers/seeds)
│ │ (BitTorrent protocol)
│ │
│ │ 4. Web seeding (BEP 17/19 concurrent HTTP) + fallback
│ │ ◄─────────────────────── Workshop server / mirrors / seed box
└─────────────┘ 5. Verify SHA-256
- Publish: `ic mod publish` uploads .icpkg to Workshop server. Server computes SHA-256, generates torrent metadata (info hash), starts seeding the package alongside any initial seed infrastructure.
- Browse/Search: Workshop server handles all metadata queries (search, dependency resolution, ratings) via the existing SQLite + FTS5 design. Lightweight.
- Install: `ic mod install` fetches the manifest from the server, then downloads the .icpkg via BitTorrent + HTTP concurrently (when web seed URLs are present in torrent metadata). If no BT peers are available and no web seeds exist, falls back to HTTP direct download as a last resort.
- Seed: Players who have downloaded a package automatically seed it to others (opt-out in settings). The more popular a resource, the faster it downloads — the opposite of CDN economics, where popularity means higher cost.
- Verify: SHA-256 checksum validation on the complete package, regardless of download method. BitTorrent’s built-in piece-level hashing provides additional integrity during transfer.
WebTorrent for browser builds (WASM): Standard BitTorrent uses TCP/UDP, which browsers can’t access. WebTorrent extends the BitTorrent protocol over WebRTC, enabling browser-to-browser P2P. The Workshop server includes a WebTorrent tracker endpoint. Desktop clients and browser clients can interoperate — desktop seeds serve browser peers and vice versa through hybrid WebSocket/WebRTC bridges. HTTP fallback is mandatory: if WebTorrent signaling fails (signaling server down, WebRTC blocked), the client must fall back to direct HTTP download without user intervention. Multiple signaling servers are maintained for redundancy. Signaling servers only facilitate WebRTC negotiation — they never see package content, so even a compromised signaling server cannot serve tampered data (SHA-256 verification catches that).
Tracker authentication & token rotation: P2P tracker access uses per-session tokens tied to client authentication (Workshop credentials or anonymous session token), not static URL secrets. Tokens rotate every release cycle. Even unauthorized peers joining a swarm cannot serve corrupt data (SHA-256 + piece hashing), but token rotation limits unauthorized swarm observation and bandwidth waste. See 06-SECURITY.md for the broader security model.
Transport strategy by package size:
| Package Size | Strategy | Rationale |
|---|---|---|
| < 5MB | HTTP direct only | P2P overhead exceeds benefit for small files. Maps, balance presets, palettes. |
| 5–50MB | P2P + HTTP concurrent (web seeding); HTTP-only fallback | Sprite packs, sound packs, script libraries. Web seeds supplement BT swarm. |
| > 50MB | P2P + HTTP concurrent (web seeding); P2P strongly preferred | HD resource packs, cutscene packs, full mods. HTTP seeds provide baseline bandwidth. |
Thresholds are configurable in settings.toml. Players on connections where BitTorrent is throttled or blocked can force HTTP-only mode.
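The size-based strategy table can be sketched as a selection function. This is a minimal sketch, assuming the table's default thresholds; the type and function names are illustrative, not the real crate API:

```rust
/// Transport choice per the size table above. Thresholds are the table
/// defaults; in practice they would come from settings.toml.
#[derive(Debug, PartialEq)]
enum Transport {
    HttpOnly,           // < 5 MB: P2P overhead exceeds benefit
    Concurrent,         // 5-50 MB: BT swarm + HTTP web seeds together
    ConcurrentP2pFirst, // > 50 MB: P2P strongly preferred
}

fn pick_transport(package_mb: u64, force_http_only: bool) -> Transport {
    if force_http_only {
        // User override for throttled/blocked BitTorrent connections.
        return Transport::HttpOnly;
    }
    match package_mb {
        0..=4 => Transport::HttpOnly,
        5..=50 => Transport::Concurrent,
        _ => Transport::ConcurrentP2pFirst,
    }
}
```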
D069 setup/maintenance wizard transport policy: The installation/setup wizard (D069) and its maintenance flows reuse the same transport stack with stricter UX-oriented defaults:
- Initial setup downloads use `user-requested` priority (not `background`) and surface source indicators (P2P/HTTP) in progress UI.
- Small setup assets/config packages (including `player-config` profiles, small language packs, and tiny metadata-driven fixes) should default to HTTP direct per the size strategy above to avoid P2P startup overhead.
- Large optional media packs (cutscenes, HD assets) use BT + HTTP concurrent download (when web seed URLs exist in torrent metadata), with HTTP-only as last resort. The wizard must explain this transparently (“faster from peers when available”).
- Offline-first behavior: if no network is available, the setup wizard completes local-only steps and defers downloadable packs instead of failing the entire flow.
D069 repair/verify mapping: The maintenance wizard’s Repair & Verify actions map directly to D049 primitives:
- Verify installed packages → re-check .icpkg/blob hashes against manifests and registry metadata
- Repair package content → re-fetch missing/corrupt blobs/packages (HTTP or P2P based on size/policy)
- Rebuild indexes/metadata → rebuild local package/cache indexes from installed manifests + blob store
- Reclaim space → run GC over unreferenced blobs/package references (same CAS cleanup model)
Repair/verify is an IC-side content/setup operation. Store-platform binary verification (Steam/GOG) remains a separate platform responsibility and is only linked/guided from the wizard.
Auto-download on lobby join (D030 interaction): When joining a lobby with missing resources, the client downloads via BT + HTTP concurrently (lobby peers are high-priority BT sources since they already have the content). If web seeds exist, HTTP mirrors contribute bandwidth immediately alongside lobby peers. If no BT peers or web seeds are available, the client uses HTTP direct download as a last resort. The lobby UI shows download progress with source indicators (P2P/HTTP). See D052 § “In-Lobby P2P Resource Sharing” for the detailed lobby protocol, including host-as-tracker, verification against Workshop index, and security constraints.
Gaming industry precedent:
- Blizzard (WoW, StarCraft 2, Diablo 3): Used a custom P2P downloader (“Blizzard Downloader”, later integrated into Battle.net) for game patches and updates from 2004–2016. Saved millions in CDN costs for multi-GB patches distributed to millions of players.
- Wargaming (World of Tanks): Used P2P distribution for game updates.
- Linux distributions: Ubuntu, Fedora, Arch all offer torrent downloads for ISOs — the standard solution for distributing large files from community infrastructure.
- Steam Workshop: Steam subsidizes centralized hosting from game sales revenue. We don’t have that luxury — P2P is the community-sustainable alternative.
Competitive landscape — game mod platforms:
IC’s Workshop exists in a space with several established modding platforms. None offer the combination of P2P distribution, federation, self-hosting, and in-engine integration that IC targets.
| Platform | Model | Scale | In-game integration | P2P | Federation / Self-host | Dependencies | Open source |
|---|---|---|---|---|---|---|---|
| Nexus Mods | Centralized web portal + Vortex mod manager. CDN distribution, throttled for free users. Revenue: premium membership + ads. | 70.7M users, 4,297 games, 21B downloads. Largest modding platform. | None — external app (Vortex). | ❌ | ❌ | ❌ | Vortex client (GPL-3.0). Backend proprietary. |
| mod.io | UGC middleware — embeddable SDKs (Unreal/Unity/C++), REST API, white-label UI. Revenue: B2B SaaS (free tier + enterprise). | 2.5B downloads, 38M MAU, 332 live games. Backed by Tencent ($26M Series A). | Yes — SDK provides in-game browsing, download, moderation. Console-certified (PS/Xbox/Switch). | ❌ | ❌ | partial | SDKs open (MIT/Apache). Backend/service proprietary. |
| Modrinth | Open-source mod registry. Centralized CDN. Revenue: ads + donations. | ~100K projects, millions of monthly downloads. Growing fast. | Through third-party launchers (Prism, etc). | ❌ | ❌ | ✅ | Server (AGPL), API open. |
| CurseForge (Overwolf) | Centralized mod registry + CurseForge app. Revenue: Overwolf overlay ads. | Dominant for Minecraft, WoW, other Blizzard games. | CurseForge app, some launcher integrations. | ❌ | ❌ | ✅ | ❌ |
| Thunderstore | Open-source mod registry. Centralized CDN. | Popular for Risk of Rain 2, Lethal Company, Valheim. | Through r2modman manager. | ❌ | ❌ | ✅ | Server (AGPL-3.0). |
| Steam Workshop | Integrated into Steam. Free hosting (subsidized by game sales revenue). | Thousands of games, billions of downloads. | Deep Steam integration. | ❌ | ❌ | ❌ | ❌ |
| ModDB / GameBanana | Web portals — manual upload/download, community features, editorial content. Legacy platforms (2001–2002). | ModDB: 12.5K+ mods, 108M+ downloads. GameBanana: strong in Source Engine games. | None. | ❌ | ❌ | ❌ | ❌ |
Competitive landscape — P2P + Registry infrastructure:
The game mod platforms above are all centralized. A separate set of projects tackle P2P distribution at the infrastructure level, but none target game modding specifically. See research/p2p-federated-registry-analysis.md for a comprehensive standalone analysis of this space and its applicability beyond IC.
| Project | Architecture | Domain | How it relates to IC Workshop |
|---|---|---|---|
| Uber Kraken (6.6k★) | P2P Docker registry — custom BitTorrent-like protocol, Agent/Origin/Tracker/Build-Index. Pluggable storage (S3/GCS/HDFS). | Container images (datacenter) | Closest architectural match. Kraken’s Agent/Origin/Tracker/Build-Index maps to IC’s Peer/Seed-box/Tracker/Workshop-Index. IC’s P2P protocol design (peer selection policy, piece request strategy, connection state machine, announce cycle, bandwidth limiting) is directly informed by Kraken’s production experience — see protocol details above and research/p2p-federated-registry-analysis.md § “Uber Kraken — Deep Dive” for the full analysis. Key difference: Kraken is intra-datacenter (3s announce, 10Gbps links), IC is internet-scale (30s announce, residential connections). |
| Dragonfly (3k★, CNCF Graduated) | P2P content distribution — Manager/Scheduler/Seed-Peer/Peer. Centralized evaluator-based scheduling with 4-dimensional peer scoring (LoadQuality×0.6 + IDCAffinity×0.2 + LocationAffinity×0.1 + HostType×0.1). DAG-based peer graph, back-to-source fallback. Persistent cache with replica management. Client rewritten in Rust (v2). Trail of Bits audited (2023). | Container images, AI models, artifacts | Same P2P-with-fallback pattern. Dragonfly’s hierarchical location affinity (country|province|city|zone), statistical bad-peer detection (three-sigma rule), capacity-aware scoring, persistent replica count, and download priority tiers are all patterns IC adapts. Key differences: Dragonfly uses centralized scheduling (IC uses BitTorrent swarm — simpler, more resilient to churn), Dragonfly is single-cluster with no cross-cluster P2P (IC is federated), Dragonfly requires K8s+Redis+MySQL (IC requires only SQLite). Dragonfly’s own RFC #3713 acknowledges piece-level selection is FCFS — BitTorrent’s rarest-first is already better. See research/p2p-federated-registry-analysis.md § “Dragonfly — CNCF P2P Distribution (Deep Dive)” for full analysis. |
| JFrog Artifactory P2P (proprietary) | Enterprise P2P distribution — mesh of nodes sharing cached binary artifacts within corporate networks. | Enterprise build artifacts | The direct inspiration for IC’s repository model. JFrog added P2P because CDN costs for large binaries at scale are unsustainable — same motivation as IC. |
| Blizzard NGDP/Agent (proprietary) | Custom P2P game patching — BitTorrent-based, CDN+P2P hybrid, integrated into Battle.net launcher. | Game patches (WoW, SC2, Diablo) | Closest gaming precedent. Proved P2P game content distribution works at massive scale. Proprietary, not a registry (no search/ratings/deps), not federated. |
| Homebrew / crates.io-index | Git-backed package indexes. CDN for actual downloads. | Software packages | IC’s Phase 0–3 git-index is directly inspired by these. No P2P distribution. |
| IPFS | Content-addressed P2P storage — any content gets a CID, any node can pin and serve it. DHT-based discovery. Bitswap protocol for block exchange with Decision Engine and Score Ledger. | General-purpose decentralized storage | Rejected as primary distribution protocol (too general, slow cold-content discovery, complex setup, poor game-quality UX). However, IPFS’s Bitswap protocol contributes significant patterns IC adopts: EWMA peer scoring with time-decaying reputation (Score Ledger), per-peer fairness caps (MaxOutstandingBytesPerPeer), want-have/want-block two-phase discovery, broadcast control (target proven-useful peers), dual WAN/LAN discovery (validates IC’s LAN party mode), delegated HTTP routing (validates IC’s registry-as-router), server/client mode separation, and batch provider announcements (Sweep Provider). IPFS’s 9-year-unresolved bandwidth limiting issue (#3065, 73 👍) proves bandwidth caps must ship day one. See research/p2p-federated-registry-analysis.md § “IPFS — Content-Addressed P2P Storage (Deep Dive)” for full analysis. |
| Microsoft Delivery Optimization | Windows Update P2P — peers on the same network share update packages. | OS updates | Proves P2P works for verified package distribution at billions-of-devices scale. Proprietary, no registry model. |
What’s novel about IC’s combination: No existing system — modding platform or infrastructure — combines (1) federated registry with repository types, (2) P2P distribution via BitTorrent/WebTorrent, (3) zero-infrastructure git-hosted bootstrap, (4) browser-compatible P2P via WebTorrent, (5) in-engine integration with lobby auto-download, and (6) fully open-source with self-hosting as a first-class use case. The closest architectural comparison is mod.io (embeddable SDK approach, in-game integration) but mod.io is a proprietary centralized SaaS — no P2P, no federation, no self-hosting. The closest distribution comparison is Uber Kraken (P2P registry) but it has no modding features. Each piece has strong precedent; the combination is new. The Workshop architecture is game-agnostic and could serve as a standalone platform — see the research analysis for exploration of this possibility.
Seeding infrastructure:
The Workshop doesn’t rely solely on player altruism for seeding:
- Workshop seed server: A dedicated seed box (modest: a VPS with good upload bandwidth) that permanently seeds all Workshop content. This ensures new/unpopular packages are always downloadable even with zero player peers. Cost: ~$20-50/month for a VPS with 1TB+ storage and unmetered bandwidth.
- Community seed volunteers: Players who opt in to extended seeding (beyond just while the game is running). Similar to how Linux mirror operators volunteer bandwidth. Could be incentivized with Workshop badges/reputation (D036/D037).
- Mirror servers (federation): Community-hosted Workshop servers (D030 federation) also seed the content they host. Regional community servers naturally become regional seeds.
- Lobby-optimized seeding: When a lobby host has required mods, the game client prioritizes seeding to joining players who are downloading. The “auto-download on lobby join” flow aggregates bandwidth from lobby peers (highest priority) + wider swarm + HTTP web seeds concurrently, with HTTP-only as last resort.
Privacy and security:
- IP visibility: Standard BitTorrent exposes peer IP addresses. This is the same exposure as any multiplayer game (players already see each other’s IPs or relay IPs). For privacy-sensitive users, HTTP-only mode avoids P2P IP exposure.
- Content integrity: SHA-256 verification on complete packages catches any tampering. BitTorrent’s piece-level hashing catches corruption during transfer. Double-verified.
- No metadata leakage: The tracker only knows which peers have which packages (by info hash). It doesn’t inspect content. Package contents are just game assets — sprites, audio, maps.
- ISP throttling mitigation: BitTorrent traffic can be throttled by ISPs. Mitigations: protocol encryption (standard in modern BT clients), WebSocket transport (looks like web traffic), and HTTP fallback as ultimate escape. Settings allow forcing HTTP-only mode.
- Resource exhaustion: Rate-limited seeding (configurable upload cap in settings). Players control how much bandwidth they donate. Default: 1MB/s upload, adjustable to 0 (leech-only, no seeding — discouraged but available).
P2P protocol design details:
The Workshop’s P2P engine is informed by production experience from Uber Kraken (Apache 2.0, 6.6k★) and Dragonfly (Apache 2.0, CNCF Graduated). Kraken distributes 1M+ container images/day across 15K+ hosts using a custom BitTorrent-inspired protocol; Dragonfly uses centralized evaluator-based scheduling at Alibaba scale. IC adapts Kraken’s connection management and Dragonfly’s scoring insights for internet-scale game mod distribution. See research/p2p-federated-registry-analysis.md for full architectural analyses of both systems.
Cross-pollination with IC netcode and community infrastructure: The Workshop P2P engine and IC’s netcode infrastructure (relay server, tracking server — 03-NETCODE.md) share deep structural parallels: federation, heartbeat/TTL, rate control, connection state machines, observability, deployment model. Patterns flow both directions — netcode’s three-layer rate control and token-based liveness improve Workshop; Workshop’s EWMA scoring and multi-dimensional peer evaluation improve relay server quality tracking. A full cross-pollination analysis (including shared infrastructure opportunities: unified server binary, federation library, auth/identity layer) is in research/p2p-federated-registry-analysis.md § “Netcode ↔ Workshop Cross-Pollination.” Additional cross-pollination with D052/D053 (community servers, player profiles, trust-based filtering) is catalogued in D052 § “Cross-Pollination” — highlights include: two-key architecture for index signing and publisher identity, trust-based source filtering, server-side validation as a shared invariant, and trust-verified peer selection scoring.
Peer selection policy (tracker-side): The tracker returns a sorted peer list on each announce response. The sorting policy is pluggable — inspired by Kraken’s assignmentPolicy interface pattern. IC’s default policy prioritizes:
- Seeders (completed packages — highest priority, like Kraken’s `completeness` policy)
- Lobby peers (peers in the same multiplayer lobby — guaranteed to have the content, lowest latency)
- Geographically close peers (same region/ASN — reduces cross-continent transfers)
- High-completion peers (more pieces available — better utilization of each connection)
- Random (fallback for ties — prevents herding)
Peer handout limit: 30 peers per announce response (Kraken uses 50, but IC has fewer total peers per package). Community-hosted trackers can implement custom policies via the server config.
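The default tiered policy above amounts to a stable sort by tier. A minimal sketch, assuming illustrative field names (the real announce-response format is defined in the protocol doc) and a hypothetical 0.5 "high-completion" cutoff; production would shuffle within tiers, which is omitted here for determinism:

```rust
/// One candidate peer from the tracker's table (fields are illustrative).
#[derive(Debug, Clone)]
struct PeerInfo {
    is_seeder: bool,
    in_lobby: bool,    // same multiplayer lobby as the requester
    same_region: bool, // geographic/ASN proximity
    completion: f32,   // fraction of pieces held, 0.0..=1.0
}

/// Priority tier per the default policy (lower = handed out first).
fn tier(p: &PeerInfo) -> u8 {
    if p.is_seeder { 0 }
    else if p.in_lobby { 1 }
    else if p.same_region { 2 }
    else if p.completion >= 0.5 { 3 } // "high-completion" cutoff is an assumption
    else { 4 }
}

/// Sort candidates and apply the per-announce handout limit (default 30).
fn sort_for_announce(peers: &mut Vec<PeerInfo>, limit: usize) {
    peers.sort_by_key(tier); // stable sort preserves order within a tier
    peers.truncate(limit);
}
```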
Planned evolution — weighted multi-dimensional scoring (Phase 5+): Dragonfly’s evaluator demonstrates that combining capacity, locality, and node type into a weighted score produces better peer selection than linear priority tiers. IC’s Phase 5+ peer selection evolves to a weighted scoring model informed by Dragonfly’s approach:
PeerScore = 0.4 × Capacity + 0.3 × Locality + 0.2 × SeedStatus + 0.1 × LobbyContext
- Capacity (weight 0.4): Spare bandwidth reported in announce (`1 - upload_bw_used / upload_bw_max`). Peers with more headroom score higher. Inspired by Dragonfly’s `LoadQuality` metric (which sub-decomposes into peak bandwidth, sustained load, and concurrency). IC uses a single utilization ratio — simpler, captures the same core insight.
- Locality (weight 0.3): Hierarchical location matching. Clients self-report location as `continent|country|region|city` (4-level, pipe-delimited — adapted from Dragonfly’s 5-level `country|province|city|zone|cluster`). Score = `matched_prefix_elements / 4`. Two peers in the same city match all four elements and score 1.0; same region: 0.75; same country but different region: 0.5; same continent only: 0.25.
- SeedStatus (weight 0.2): Seed box = 1.0, completed seeder = 0.7, uploading leecher = 0.3. Inspired by Dragonfly’s `HostType` score (seed peers = 1.0, normal = 0.5).
- LobbyContext (weight 0.1): Same lobby = 1.0, same game session = 0.5, no context = 0. IC-specific — Dragonfly has no equivalent (no lobby concept).
The initial 5-tier priority system (above) ships first and is adequate for community scale. Weighted scoring is additive — the same pluggable policy interface supports both approaches. Community servers can configure their own weights or contribute custom scoring policies.
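The weighted formula translates directly into code. A minimal sketch, assuming illustrative function names; the real policy interface is pluggable as described above:

```rust
/// Spare-bandwidth ratio from the announce payload, clamped to [0, 1].
fn capacity(upload_bw_used: f64, upload_bw_max: f64) -> f64 {
    (1.0 - upload_bw_used / upload_bw_max).clamp(0.0, 1.0)
}

/// Hierarchical locality: fraction of matching prefix elements in a
/// "continent|country|region|city" location string.
fn locality(a: &str, b: &str) -> f64 {
    let matched = a.split('|')
        .zip(b.split('|'))
        .take_while(|(x, y)| x == y)
        .count();
    matched as f64 / 4.0
}

/// Weighted peer score: 0.4 capacity + 0.3 locality + 0.2 seed status
/// + 0.1 lobby context, each component already normalized to [0, 1].
fn peer_score(cap: f64, loc: f64, seed_status: f64, lobby: f64) -> f64 {
    0.4 * cap + 0.3 * loc + 0.2 * seed_status + 0.1 * lobby
}
```

For example, a completed seeder (0.7) in the same lobby (1.0), same country but different region (0.5), with 75% spare upload, scores 0.4×0.75 + 0.3×0.5 + 0.2×0.7 + 0.1×1.0 = 0.69.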
Bucket-based scheduling (Phase 5+): The weighted scoring formula above is applied within pre-sorted bucket leaves, not flat across all peers. The embedded tracker organizes its peer table into a `PeerBucketTree` indexed by `RegionKey × SeedStatus × TransportType`. On each announce response, the tracker walks buckets from closest-matching outward, applying weighted scoring within each leaf to produce the final peer list. This reduces per-announce work from O(n) to O(k), where k is the bucket size, and naturally produces locality-optimized peer sets without requiring the scoring formula to carry the full locality signal. Connection pool bucketing (per-transport guaranteed slot minimums) prevents cross-transport starvation when TCP, uTP, and WebRTC peers coexist. Content popularity bucketing (Hot/Warm/Cold/Frozen tiers via EWMA) steers seed box bandwidth toward under-served swarms. Full design: research/p2p-distribute-crate-design.md § 2.8.
Piece request strategy (client-side): The engine uses rarest-first piece selection by default — a priority queue sorted by fewest peers having each piece. This is standard BitTorrent behavior, well-validated for internet conditions. Kraken also implements this as rarestFirstPolicy.
- Pipeline limit: 3 concurrent piece requests per peer (matches Kraken’s default). Prevents overwhelming slow peers.
- Piece request timeout: 8s base + 6s per MB of piece size (more generous than Kraken’s 4s+4s/MB, compensating for residential internet variance).
- Endgame mode: When remaining pieces ≤ 5, the engine sends duplicate piece requests to multiple peers. This prevents the “last piece stall” — a well-known BitTorrent problem where the final piece’s sole holder is slow. Kraken implements this as `EndgameThreshold` — it’s essential.
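The client-side constants and selection rule above can be sketched as follows; function names are illustrative, and only the numeric parameters come from the text:

```rust
const PIPELINE_LIMIT: usize = 3;    // concurrent piece requests per peer
const ENDGAME_THRESHOLD: usize = 5; // duplicate-request when this few pieces remain

/// Piece request timeout: 8s base + 6s per MB of piece size.
fn piece_timeout_secs(piece_len_bytes: u64) -> u64 {
    8 + 6 * piece_len_bytes / (1024 * 1024)
}

/// Rarest-first: among pieces we still need, request the one held by
/// the fewest peers in the swarm.
fn rarest_first(availability: &[u32], have: &[bool]) -> Option<usize> {
    availability.iter()
        .enumerate()
        .filter(|(i, _)| !have[*i])
        .min_by_key(|(_, n)| **n)
        .map(|(i, _)| i)
}

fn in_endgame(pieces_remaining: usize) -> bool {
    pieces_remaining <= ENDGAME_THRESHOLD
}
```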
Connection state machine (client-side):
pending ──connect──► active ──timeout/error──► blacklisted
▲ │ │
│ │ │
└──────────── cooldown (5min) ◄─────────────────┘
- `MaxConnectionsPerPackage: 8` (lower than Kraken’s 10 — residential connections have less bandwidth to share)
- Blacklisting: peers that produce zero useful throughput over 30 seconds are temporarily blacklisted (5-minute cooldown). Catches both dead peers and ISP-throttled connections.
- Sybil resistance: Maximum 3 peers per /24 subnet in a single swarm. Prefer peers from diverse autonomous systems (ASNs) when possible. Sybil attacks can waste bandwidth but cannot serve corrupt data (SHA-256 integrity), so the risk ceiling is low.
- Statistical degradation detection (Phase 5+): Inspired by Dragonfly’s `IsBadParent` algorithm — track per-peer piece transfer times. Peers whose last transfer exceeds `max(3 × mean, 2 × p95)` of observed transfer times are demoted in scoring (not hard-blacklisted — they may recover). For sparse data (< 50 samples per peer), fall back to the simpler “20× mean” ratio check. Hard blacklist remains only for zero-throughput (complete failure). This catches degrading peers before they fail completely.
- Connection TTL: idle connections are closed after 60 seconds to free resources.
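The degradation rule can be sketched directly. A minimal sketch, assuming a nearest-rank p95 estimate (the text does not fix a percentile method) and illustrative naming:

```rust
/// Demote (don't blacklist) a peer whose last piece transfer time exceeds
/// max(3 x mean, 2 x p95) of its observed samples; below 50 samples, use
/// the simpler 20x-mean ratio check from the text.
fn is_degraded(samples: &[f64], last_secs: f64) -> bool {
    let n = samples.len();
    if n == 0 {
        return false; // no history yet, nothing to judge against
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    if n < 50 {
        return last_secs > 20.0 * mean;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank p95 (percentile method is an assumption).
    let p95 = sorted[((n as f64 * 0.95).ceil() as usize).saturating_sub(1)];
    last_secs > (3.0 * mean).max(2.0 * p95)
}
```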
Announce cycle (client → tracker): Clients announce to the tracker every 30 seconds (Kraken uses 3s for datacenter — far too aggressive for internet). The tracker can dynamically adjust: faster intervals (10s) during active downloads, slower (60s) when seeding idle content. Max interval cap (120s) prevents unbounded growth. Announce payload includes: PeerID, package info hash, bitfield (what pieces the client has), upload/download speed.
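The dynamic interval policy maps to a small lookup. A sketch with illustrative names; only the 10s/30s/60s/120s values come from the text:

```rust
/// What the client is currently doing in a given swarm.
enum SwarmActivity {
    Downloading, // active download: announce faster
    SeedingIdle, // seeding idle content: back off
    Default,     // baseline internet-scale cycle
}

const ANNOUNCE_CAP_SECS: u64 = 120; // hard cap prevents unbounded growth

fn announce_interval_secs(activity: SwarmActivity) -> u64 {
    let secs = match activity {
        SwarmActivity::Downloading => 10,
        SwarmActivity::SeedingIdle => 60,
        SwarmActivity::Default => 30,
    };
    secs.min(ANNOUNCE_CAP_SECS)
}
```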
Size-based piece length: Different package sizes use different piece lengths to balance metadata overhead against download granularity (inspired by Kraken’s PieceLengths config):
| Package Size | Piece Length | Rationale |