Engagement Proposal — Confidential
KoalaSense AI · Engagement Proposal

Universal AI Game Master: Foundry VTT Pivot

A multi-modal AI Dungeon Master running locally on a table Mini PC, combining voice, vision, RAG-powered rules, and AR map projection through Foundry VTT. Architecture pivot from Discord is sound. RAG backend is stable. The blocker is implementation — the Foundry module does not exist yet.

Agentic Readiness · Implementation Support · Law 25 Gap · RAG Stable
KoalaSense orchestrates specialized AI agents — like a constellation guiding your operations.
Project context
What this system is

A single Gemini Live agent acting as Dungeon Master — locally deployed, voice-first, table-aware.

Target Architecture
Layer | Technology | Purpose | Status
AI DM | Gemini Live (Vertex AI) | Single agent: voice + vision | Not started
Voice/Video | LiveKit (WebRTC) | Player audio streams | Not started
Platform | Foundry VTT Module (JS) | Campaign mgmt, speaker ID, AR display | Not started
Rules | Qdrant + Python RAG | PDF rulebook retrieval on-demand | Stable ✓
AR Display | Projector + table camera | Physical miniature detection, map projection | Scoping needed
Identity | Foundry user → character | Speaker identification | Not started
CTO Advisor · Technical Assessment
Technical Findings

The core technical bet is Gemini Live as a single agent handling voice + vision simultaneously. Stream A (JS client) is the hard critical path blocker.

● Critical
Stream A is a hard blocker for all other streams
gemini-live-client.js does not exist yet. Streams B, C, and E cannot start until it does.
● High
6 "parallel" streams have one true critical path
Only Streams A and D can start independently. Real sequence: A → B+E → C → F.
● High
Concurrent voice + vision in one agent is unvalidated in JS
No test yet shows that Gemini Live handles a simultaneous audio stream and camera frames. A spike is required before M6.
● Medium
AR projection scope undefined
Passive camera feed vs. active Foundry scene token manipulation are fundamentally different efforts.
● Medium
Python → JS port has no discipline document
No strategy for preserving known-good Python behaviour. Silent regressions are high risk.
● Low
RAG backend is stable — protect it
193/193 tests passing. Do not touch this during the JS pivot.
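The critical-path finding above can be checked mechanically. The sketch below groups the streams into waves that can genuinely run in parallel, using Python's standard `graphlib`. The dependency map is an assumption read off the stated sequence (A → B+E → C → F, with D independent); the real edges may differ.

```python
from graphlib import TopologicalSorter

# ASSUMPTION: edges inferred from "A -> B+E -> C -> F"; D has no prerequisites.
# Each key maps a stream to the set of streams it must wait for.
DEPENDS_ON = {
    "A": set(),
    "D": set(),
    "B": {"A"},
    "E": {"A"},
    "C": {"B", "E"},
    "F": {"C"},
}


def execution_waves(depends_on):
    """Return lists of streams; every stream in a wave can start together."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished to unlock the next
    return waves


for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Under these assumed edges, only the first wave contains two streams with no prerequisites, which matches the finding that six "parallel" streams collapse into one sequential chain.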
CISO Advisor · Security & Privacy
Security Findings

The primary risk is the demo-readiness gap: this system captures voice and video of real people with no consent mechanism.

STRIDE Threat Assessment (abbreviated)
Threat | Vector | Finding | Severity
Info Disclosure | Camera | Captures players' physical environment including faces; no consent | High
Privacy Violation | Gemini API | Voice + video sent to Google; no data processing agreement | High
Info Disclosure | Module settings | Gemini API key stored in Foundry, unencrypted at rest | Medium
Spoofing | Local network | Foundry VTT port exposed on LAN without auth layer | Medium
Law 25 (Québec) — Pre-Demo Checklist
Requirement | Status | Gap
Consent for voice recording | ❌ Missing | Required before any session with real participants
Consent for camera/image capture | ❌ Missing | Physical environment + faces = personal information
Data retention policy | ❌ Missing | How long are transcripts and session logs kept?
Gemini as data processor | ⚠ Partial | Data processing agreement with Google required
GCP region (northamerica-northeast1) | ✓ Confirmed |
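Once a retention period is decided, the retention gap in the checklist above could be enforced with a small purge routine. The following is a minimal sketch only: the 30-day period is a placeholder (the real value is an open decision), and `SessionRecord` and `split_by_retention` are hypothetical names, not part of the existing codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# ASSUMPTION: placeholder value; the actual retention period is still undecided.
RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    recorded_at: datetime  # timezone-aware UTC timestamp
    kind: str  # e.g. "transcript" or "session_log"


def split_by_retention(records, now, retention=RETENTION):
    """Return (keep, purge); a record is purged once it is older than `retention`."""
    cutoff = now - retention
    keep = [r for r in records if r.recorded_at >= cutoff]
    purge = [r for r in records if r.recorded_at < cutoff]
    return keep, purge
```

Passing `now` explicitly keeps the rule testable and makes the cutoff auditable, which helps when documenting the policy.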
COO Advisor · Operational Assessment
Operational Findings

The primary operational risk is founder dependency — every decision lives in one person's head, and the system cannot be demoed or operated without them.

● High
No runbook — system cannot be operated by anyone else
Blocks client demos and any delegation. Setup must be documented as a 10-step checklist.
● High
AI coding agent streams assume parallelism the dependency graph does not support
Without an explicit handoff protocol, agents will block on each other and waste cycles.
● Medium
Python → JS port discipline undocumented
High-value methodology pattern for client migrations. Should be captured in Agentic-Integration playbook.
● Medium
Failure mode taxonomy not extracted
Quota exhaustion, audio drop, spawn failure patterns belong in the integration platform as a pre-mortem checklist.
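One possible starting point for the Python → JS port discipline flagged above is a set of golden (characterization) cases: record what the known-good Python code returns, commit the result as JSON, and replay it against the port. The sketch below assumes this approach; `record_golden`, `check_port`, and `normalize_speaker` are hypothetical names used only for illustration.

```python
import json


def record_golden(func, inputs):
    """Run the known-good Python function and capture input/expected pairs."""
    return [{"input": value, "expected": func(value)} for value in inputs]


def check_port(golden_cases, port_outputs):
    """Compare outputs from the port with the golden cases.

    Returns a list of mismatches; an empty list means behaviour was preserved.
    """
    if len(golden_cases) != len(port_outputs):
        raise ValueError("port produced a different number of outputs")
    mismatches = []
    for case, actual in zip(golden_cases, port_outputs):
        if case["expected"] != actual:
            mismatches.append(
                {"input": case["input"], "expected": case["expected"], "actual": actual}
            )
    return mismatches


# Hypothetical stand-in for a piece of known-good Python behaviour.
def normalize_speaker(name):
    return name.strip().lower()


golden = record_golden(normalize_speaker, ["  Alice ", "BOB"])
golden_json = json.dumps(golden, indent=2)  # would be committed and replayed by the JS tests
```

Because the golden file is plain JSON, the JS test suite can load the same cases directly, so a silent regression shows up as a concrete input/expected/actual triple.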
Engagement Proposal · 6-week plan
Recommended Workplan

Sequenced by actual dependency chain, not the 6-stream framing. Gate decisions are explicit. Consent is non-negotiable before any demo.

Weeks 1–2 · Stream A
Gemini Live JS Client + Smoke Test
Build gemini-live-client.js with a minimal WebSocket connection to Gemini Live. Confirm audio round-trip works in a browser context. This is the gate for everything else — go/no-go decision point before any further stream work commits.
Deliverable: Working JS client with documented smoke test. Go/no-go gate.
Weeks 2–3 · Stream B + Law 25
Audio Bridge + Consent Gate
LiveKit ↔ Gemini audio routing. Simultaneously implement a pre-session consent screen — a single modal before camera and audio start. This satisfies the Law 25 minimum for any demo involving real participants. Not optional.
Deliverable: Audio pipeline functional. Consent mechanism live.
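The logic behind the consent gate can be sketched independently of the modal itself: capture must stay off until every participant at the table has agreed, and each grant should be timestamped so it can be shown later. This is an illustrative sketch, not the Foundry implementation; `ConsentGate` and `ConsentRequired` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class ConsentRequired(Exception):
    """Raised when capture is requested before every participant has consented."""


@dataclass
class ConsentGate:
    participants: set
    granted: dict = field(default_factory=dict)  # participant -> UTC time of consent

    def grant(self, participant, now=None):
        if participant not in self.participants:
            raise KeyError(f"unknown participant: {participant}")
        self.granted[participant] = now or datetime.now(timezone.utc)

    def missing(self):
        """Participants who have not consented yet, in a stable order."""
        return sorted(self.participants - set(self.granted))

    def start_capture(self):
        """Allow camera and audio to start only when nobody is missing."""
        pending = self.missing()
        if pending:
            raise ConsentRequired(f"consent missing for: {', '.join(pending)}")
        return True
```

Failing closed (raising rather than returning `False`) means a forgotten check cannot quietly start the camera or microphone.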
Weeks 3–4 · Stream D + C
RAG API Endpoint + Session Logging
Connect the stable Python RAG backend to the Foundry module via a local API. Add transcript logging to the Foundry Journal. Both are independent of vision, so they unblock the core DM experience without waiting for camera work.
Deliverable: AI DM answers rules questions in session. Sessions are logged to Journal.
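A local API in front of the RAG backend can be very small. The sketch below uses only the Python standard library and binds to the loopback interface. The `/rules` path, the JSON shape, and the `retrieve` callable are assumptions made for illustration; the actual interface of the existing RAG code is not described in this proposal.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_rules_server(retrieve, host="127.0.0.1", port=0):
    """Build a loopback-only HTTP server exposing POST /rules.

    `retrieve` is a hypothetical callable (query: str) -> list of passage
    strings that stands in for the existing RAG backend.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/rules":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                query = json.loads(self.rfile.read(length))["query"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected JSON body with a 'query' field")
                return
            body = json.dumps({"query": query, "passages": retrieve(query)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet; real code would log requests

    # port=0 asks the OS for a free port; read it from server.server_address
    return ThreadingHTTPServer((host, port), Handler)
```

Calling `make_rules_server(my_retriever, port=8765).serve_forever()` would start it on a fixed port. A browser-based Foundry module would also need CORS headers, which this sketch omits.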
Weeks 4–5 · Stream E
Vision Scope Decision + Multi-modal Spike
Explicitly scope vision to passive camera capture fed as context (v1) vs active token manipulation (defer). Run the multi-modal spike — audio + camera frame simultaneously — before committing to M6 scope. This spike determines whether the vision timeline is 1 week or 4.
Deliverable: Vision scope documented. Spike result determines M6 plan.
Weeks 5–6
Demo Runbook + First Full Session
10-step setup runbook. Full end-to-end demo session with all integrated components. Recording captured as a sales artifact. Remaining gaps documented as v2 scope.
Deliverable: Runbook complete. Full demo session recorded and usable as a sales asset.
Investment
Engagement Cost

Pilot engagement at founder rate. Standard SME equivalent scope is $4,800–$6,000 CAD.

Item | Detail | Est. Cost
Agentic Readiness Assessment | Completed: multi-advisor review (CTO, CISO, COO) | $0 (pilot)
Implementation Advisory | 6 weeks: weekly advisory + async support | $2,400 CAD
Agent-Assisted Dev Support | Claude Code API hours on Streams A–D | ~$300 CAD
Total | Internal pilot rate | ~$2,700 CAD
Expected Benefits
Before vs. After
Area | Without KoalaSense | With KoalaSense
Time to first demo | Unknown: no validated JS path, timeline undefined | 2 weeks to smoke test, 6 weeks to full working demo
Demo repeatability | Founder-only, no runbook, cannot delegate | 10-minute setup checklist, operable by anyone
Law 25 compliance | Blocked: no consent mechanism, not demo-safe | Minimum viable consent gate in place for live demos
Sales readiness | Not demonstrable: module does not exist | Live demo usable as flagship sales artifact
Methodology capture | Learnings stay in DnD-DM, never extracted | Port discipline + failure taxonomy → Agentic-Integration playbook
Open Questions
Decisions needed before implementation
  • What is the exact vision scope — passive camera feed to Gemini context, or active Foundry scene token manipulation from camera detection?
  • Is a multi-modal spike (concurrent audio + video) planned before M6 scope is committed?
  • What is the intended data retention period for session transcripts and Foundry Journal entries?
  • Is Gemini Vertex AI data processing confirmed to stay in northamerica-northeast1 for Law 25?
Explicitly out of scope (v2)
  • Voice biometric speaker ID
  • Active AR token manipulation from camera
  • Multi-game system support beyond D&D 5e
  • Cloud deployment
  • Multi-player remote sessions
Platform Notes
KoalaSense Platform Gaps Identified

This engagement also served as a validation run for the KoalaSense platform itself (issue #40). Two skill gaps were surfaced.

Gap 1 — No Law 25 content in CISO skill

The ciso-advisor SKILL.md covers GDPR, CCPA, and HIPAA but has no Québec-specific Law 25 section. For a platform targeting Québec SMEs, this is a material gap. A PR is needed to add a Law 25 section covering consent mechanisms, data residency, retention policy, and confidentiality-incident notification requirements (Law 25 requires prompt notification, not the fixed 72-hour window familiar from GDPR).

Gap 2 — No engagement proposal synthesis skill

The Dify assessment DAG (issue #38) has a "Synthesis node" listed as producing a structured proposal document — but no engagement-proposal-writer skill backs it. The DAG has a phantom node. This proposal page is its reference implementation.