KoalaSense AI · Engagement Proposal
Universal AI Game Master: Foundry VTT Pivot
A multi-modal AI Dungeon Master running locally on a table Mini PC, combining voice, vision, RAG-powered rules, and AR map projection through Foundry VTT. The architecture pivot from Discord is sound and the RAG backend is stable. The blocker is implementation: the Foundry module does not exist yet.
Agentic Readiness
Implementation Support
Law 25 Gap
RAG Stable
KoalaSense orchestrates specialized AI agents — like a constellation guiding your operations.
Project context
What this system is
A single Gemini Live agent acting as Dungeon Master — locally deployed, voice-first, table-aware.
Target Architecture
| Layer | Technology | Purpose | Status |
| --- | --- | --- | --- |
| AI DM | Gemini Live (Vertex AI) | Single agent — voice + vision | Not started |
| Voice/Video | LiveKit (WebRTC) | Player audio streams | Not started |
| Platform | Foundry VTT Module (JS) | Campaign mgmt, speaker ID, AR display | Not started |
| Rules | Qdrant + Python RAG | PDF rulebook retrieval on-demand | Stable ✓ |
| AR Display | Projector + table camera | Physical miniature detection, map projection | Scoping needed |
| Identity | Foundry user → character | Speaker identification | Not started |
CTO Advisor · Technical Assessment
Technical Findings
The core technical bet is Gemini Live as a single agent handling voice and vision simultaneously. Stream A (the JS client) is the hard blocker on the critical path.
● Critical · Stream A is a hard blocker for every stream except D
gemini-live-client.js does not exist yet. Streams B, C, and E cannot start until it does.
● High · The six "parallel" streams have one true critical path
Only Streams A and D can start independently. The real sequence is A → B+E → C → F.
● High · Concurrent voice + vision in one agent is unvalidated in JS
No test yet shows Gemini Live handling a simultaneous audio stream and camera frames. A spike is required before M6.
● Medium · AR projection scope is undefined
Passive camera feed vs. active Foundry scene token manipulation are fundamentally different efforts.
● Medium · The Python → JS port has no discipline document
There is no strategy for preserving known-good Python behaviour, so silent regressions are a high risk.
● Low · The RAG backend is stable; protect it
193/193 tests passing. Do not touch this during the JS pivot.
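The critical-path claim can be checked mechanically. The Python sketch below is illustrative only: it assumes the edges implied by the stated sequence (B and E wait on A, C waits on B and E, F waits on C, D depends on nothing), and `STREAM_DEPS` and `execution_waves` are hypothetical names. It uses the standard library's `graphlib.TopologicalSorter` to group streams into waves whose members can genuinely run in parallel.

```python
from graphlib import TopologicalSorter

# Stream -> set of streams it must wait for.
# ASSUMED edges, derived from the stated sequence A -> B+E -> C -> F, with D independent.
STREAM_DEPS: dict[str, set[str]] = {
    "A": set(),
    "B": {"A"},
    "C": {"B", "E"},
    "D": set(),
    "E": {"A"},
    "F": {"C"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group streams into waves; every stream within a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # streams whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before computing the next
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(STREAM_DEPS), start=1):
        print(f"Wave {i}: {' + '.join(wave)}")
```

Under those assumed edges the sketch yields four waves (A + D, B + E, C, F), so no more than two streams are ever truly parallel.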
CISO Advisor · Security & Privacy
Security Findings
The primary risk is the demo-readiness gap: this system captures voice and video of real people with no consent mechanism.
STRIDE Threat Assessment (abbreviated)
| Threat | Vector | Finding | Severity |
| --- | --- | --- | --- |
| Info Disclosure | Camera | Captures players' physical environment including faces — no consent | High |
| Privacy Violation | Gemini API | Voice + video sent to Google; no data processing agreement | High |
| Info Disclosure | Module settings | Gemini API key stored in Foundry — unencrypted at rest | Medium |
| Spoofing | Local network | Foundry VTT port exposed on LAN without auth layer | Medium |
Law 25 (Québec) — Pre-Demo Checklist
| Requirement | Status | Gap |
| --- | --- | --- |
| Consent for voice recording | ❌ Missing | Required before any session with real participants |
| Consent for camera/image capture | ❌ Missing | Physical environment + faces = personal information |
| Data retention policy | ❌ Missing | How long are transcripts and session logs kept? |
| Gemini as data processor | ⚠ Partial | Data processing agreement with Google required |
| GCP region (northamerica-northeast1) | ✓ Confirmed | — |
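Once a retention period is decided, enforcing it is mechanical. A minimal Python sketch, assuming transcripts and session logs are stored as records with a timezone-aware UTC creation timestamp; `SessionRecord`, `partition_by_retention`, and the 30-day `RETENTION_DAYS` value are hypothetical placeholders, not a recommended period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# PLACEHOLDER: the real retention period is still an open question for the client.
RETENTION_DAYS = 30


@dataclass(frozen=True)
class SessionRecord:
    """A stored transcript or session log entry (hypothetical shape)."""
    record_id: str
    created_at: datetime  # timezone-aware, UTC


def partition_by_retention(
    records: list[SessionRecord],
    now: datetime,
    retention_days: int = RETENTION_DAYS,
) -> tuple[list[SessionRecord], list[SessionRecord]]:
    """Split records into (keep, purge) according to the retention window."""
    cutoff = now - timedelta(days=retention_days)
    keep = [r for r in records if r.created_at >= cutoff]
    purge = [r for r in records if r.created_at < cutoff]
    return keep, purge


if __name__ == "__main__":
    now = datetime(2025, 6, 30, tzinfo=timezone.utc)
    records = [
        SessionRecord("s1", now - timedelta(days=45)),
        SessionRecord("s2", now - timedelta(days=10)),
    ]
    keep, purge = partition_by_retention(records, now)
    print("keep:", [r.record_id for r in keep])    # keep: ['s2']
    print("purge:", [r.record_id for r in purge])  # purge: ['s1']
```

Splitting rather than deleting in place lets the purge list be logged or reviewed before anything is removed from the Foundry Journal.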
COO Advisor · Operational Assessment
Operational Findings
The primary operational risk is founder dependency — every decision lives in one person's head, and the system cannot be demoed or operated without them.
● High · No runbook: the system cannot be operated by anyone else
This blocks client demos and any delegation. Setup must be documented as a 10-step checklist.
● High · AI coding agent streams assume parallelism that the dependency graph does not support
Without an explicit handoff protocol, agents will block on each other and waste cycles.
● Medium · Python → JS port discipline is undocumented
This is a high-value methodology pattern for client migrations and should be captured in the Agentic-Integration playbook.
● Medium · The failure mode taxonomy has not been extracted
Quota exhaustion, audio drop, and spawn failure patterns belong in the integration platform as a pre-mortem checklist.
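A taxonomy becomes useful once raw symptoms map onto it. The Python sketch below covers only the three named failure modes and is entirely hypothetical: the signals (HTTP 429 for quota exhaustion, a non-zero exit code from a spawned process, a gap since the last received audio frame) and the 5-second threshold are assumptions to be replaced with whatever the real client actually reports.

```python
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    QUOTA_EXHAUSTION = "quota_exhaustion"
    AUDIO_DROP = "audio_drop"
    SPAWN_FAILURE = "spawn_failure"
    UNKNOWN = "unknown"


# HYPOTHETICAL threshold: seconds since the last audio frame arrived before we call it a drop.
AUDIO_GAP_LIMIT_S = 5.0


def classify_failure(
    http_status: Optional[int] = None,
    seconds_since_audio: Optional[float] = None,
    process_exit_code: Optional[int] = None,
) -> FailureMode:
    """Map raw symptoms to one failure mode from the pre-mortem checklist."""
    # HTTP 429 "Too Many Requests" is the usual rate-limit / quota signal.
    if http_status == 429:
        return FailureMode.QUOTA_EXHAUSTION
    # ASSUMPTION: "spawn failure" means a spawned worker process exited non-zero.
    if process_exit_code is not None and process_exit_code != 0:
        return FailureMode.SPAWN_FAILURE
    if seconds_since_audio is not None and seconds_since_audio > AUDIO_GAP_LIMIT_S:
        return FailureMode.AUDIO_DROP
    return FailureMode.UNKNOWN
```

Anything that lands in `UNKNOWN` during real sessions is a candidate for a new checklist entry.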
Engagement Proposal · 6-week plan
Recommended Workplan
Sequenced by the actual dependency chain, not the six-stream framing. Gate decisions are explicit. Consent is non-negotiable before any demo.
Weeks 1–2 · Stream A
Gemini Live JS Client + Smoke Test
Build gemini-live-client.js with a minimal WebSocket connection to Gemini Live and confirm that an audio round-trip works in a browser context. This is the gate for everything else: a go/no-go decision point before any further stream work is committed.
Working JS client with documented smoke test. Go/no-go gate.
Weeks 2–3 · Stream B + Law 25
Audio Bridge + Consent Gate
Route audio between LiveKit and Gemini. In parallel, implement a pre-session consent screen: a single modal shown before camera and audio capture start. This satisfies the Law 25 minimum for any demo involving real participants and is not optional.
Audio pipeline functional. Consent mechanism live.
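The gate logic behind that modal is small. The production version would live in the Foundry module's JS; the Python sketch below shows only the decision rule, assuming one consent record per expected participant and assuming capture may start only when every participant has accepted both voice and camera. `ConsentRecord` and `ConsentGate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """One participant's answers from the pre-session consent modal (hypothetical shape)."""
    participant: str
    voice: bool = False
    camera: bool = False


@dataclass
class ConsentGate:
    expected_participants: set[str]
    records: dict[str, ConsentRecord] = field(default_factory=dict)

    def record(self, consent: ConsentRecord) -> None:
        # A later answer from the same participant replaces the earlier one.
        self.records[consent.participant] = consent

    def missing(self) -> set[str]:
        """Participants who have not yet consented to both voice and camera."""
        return {
            p
            for p in self.expected_participants
            if not (p in self.records and self.records[p].voice and self.records[p].camera)
        }

    def may_start_capture(self) -> bool:
        # Fail closed: no capture until every expected participant has consented to both.
        return not self.missing()
```

Failing closed means a late-joining or undecided player blocks capture rather than being recorded by default.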
Weeks 3–4 · Stream D + C
RAG API Endpoint + Session Logging
Connect the stable Python RAG backend to the Foundry module through a local API, and add transcript logging to the Foundry Journal. Both are independent of vision, so they unblock the core DM experience without waiting for camera work.
AI DM answers rules questions in session. Sessions are logged to Journal.
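One way to expose the existing backend without touching it is a thin local HTTP wrapper. A minimal sketch using only the Python standard library, assuming a `POST /rules` endpoint with a JSON `{"question": ...}` body; the route, port 8765, and `retrieve_rules` (a stand-in for the real Qdrant-backed retrieval call) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def retrieve_rules(question: str) -> list[str]:
    """HYPOTHETICAL stand-in for the existing Qdrant-backed retrieval function."""
    return [f"(rules passage matching: {question})"]


class RulesHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/rules":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            question = json.loads(self.rfile.read(length))["question"]
        except (ValueError, KeyError, TypeError):  # bad JSON, missing key, or wrong shape
            self.send_error(400, "expected JSON body with a 'question' field")
            return
        body = json.dumps({"passages": retrieve_rules(question)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8765) -> ThreadingHTTPServer:
    # Bind to loopback only, so the rules API is not exposed on the LAN.
    return ThreadingHTTPServer(("127.0.0.1", port), RulesHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

The wrapper leaves the tested RAG code untouched. A browser-side caller in the Foundry module would additionally need CORS headers (including an `OPTIONS` preflight handler), which this sketch leaves out.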
Weeks 4–5 · Stream E
Vision Scope Decision + Multi-modal Spike
Explicitly scope vision as either passive camera capture fed to Gemini as context (v1) or active token manipulation (deferred). Run the multi-modal spike, sending audio and camera frames simultaneously, before committing to M6 scope. The spike determines whether the vision timeline is 1 week or 4.
Vision scope documented. Spike result determines M6 plan.
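The spike's question is whether one session copes with both inputs at once, so its harness has to push both streams concurrently over the same connection. The Python sketch below fixes only that shape; `StubSession` is a hypothetical stand-in for the real client, and the 20 ms audio chunk and 2 frames-per-second camera cadence are placeholder rates. Whether Gemini Live actually responds coherently to both can only be observed against the real service.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SpikeResult:
    audio_chunks_sent: int
    frames_sent: int


class StubSession:
    """HYPOTHETICAL stand-in for a live multimodal session; the real spike swaps in the actual client."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, kind: str, payload: bytes) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as a real network send would
        self.sent.append(kind)


async def pump(session: StubSession, kind: str, interval_s: float, count: int) -> None:
    """Send `count` dummy payloads of one kind at a fixed cadence."""
    for _ in range(count):
        await session.send(kind, b"\x00" * 16)
        await asyncio.sleep(interval_s)


async def run_spike(
    session: StubSession,
    duration_s: float = 1.0,
    audio_interval_s: float = 0.02,  # PLACEHOLDER: 20 ms audio chunks
    frame_interval_s: float = 0.5,   # PLACEHOLDER: 2 camera frames per second
) -> SpikeResult:
    # Audio and video share one session and run concurrently, as in the single-agent design.
    await asyncio.gather(
        pump(session, "audio", audio_interval_s, round(duration_s / audio_interval_s)),
        pump(session, "frame", frame_interval_s, round(duration_s / frame_interval_s)),
    )
    return SpikeResult(session.sent.count("audio"), session.sent.count("frame"))


if __name__ == "__main__":
    print(asyncio.run(run_spike(StubSession())))
```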
Weeks 5–6
Demo Runbook + First Full Session
Write the 10-step setup runbook and run a full end-to-end demo session with all integrated components. Record the session as a sales artifact and document remaining gaps as v2 scope.
Runbook complete. Full demo session recorded and usable as a sales asset.
Investment
Engagement Cost
Pilot engagement at the founder rate. The equivalent scope at the standard SME rate is $4,800–$6,000 CAD.
| Item | Detail | Est. Cost |
| --- | --- | --- |
| Agentic Readiness Assessment | Completed — multi-advisor review (CTO, CISO, COO) | $0 — Pilot |
| Implementation Advisory | 6 weeks — weekly advisory + async support | $2,400 CAD |
| Agent-Assisted Dev Support | Claude Code API hours on Streams A–D | ~$300 CAD |
| Total | Internal pilot rate | ~$2,700 CAD |
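For reference, the pilot total, the weekly advisory rate (assuming the $2,400 advisory fee is spread evenly over the six weeks), and the pilot's share of the standard SME range work out as follows:

```latex
\begin{aligned}
\text{Total} &\approx 2{,}400 + 300 = 2{,}700 \ \text{CAD} \\
\text{Advisory per week} &= 2{,}400 / 6 = 400 \ \text{CAD} \\
\text{Pilot vs. standard SME scope} &= \tfrac{2{,}700}{6{,}000} = 45\% \ \text{to} \ \tfrac{2{,}700}{4{,}800} \approx 56\%
\end{aligned}
```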
Expected Benefits
Before vs. After
| Area | Without KoalaSense | With KoalaSense |
| --- | --- | --- |
| Time to first demo | Unknown: no validated JS path, timeline undefined | 2 weeks to smoke test, 6 weeks to full working demo |
| Demo repeatability | Founder-only, no runbook, cannot delegate | 10-minute setup checklist, operable by anyone |
| Law 25 compliance | Blocked: no consent mechanism, not demo-safe | Minimum viable consent gate in place for live demos |
| Sales readiness | Not demonstrable: module does not exist | Live demo usable as flagship sales artifact |
| Methodology capture | Learnings stay in DnD-DM, never extracted | Port discipline + failure taxonomy → Agentic-Integration playbook |
Open Questions
Decisions needed before implementation
- What is the exact vision scope — passive camera feed to Gemini context, or active Foundry scene token manipulation from camera detection?
- Is a multi-modal spike (concurrent audio + video) planned before M6 scope is committed?
- What is the intended data retention period for session transcripts and Foundry Journal entries?
- Is Gemini Vertex AI data processing confirmed to stay in northamerica-northeast1 for Law 25?
Explicitly out of scope (v2)
- Voice biometric speaker ID
- Active AR token manipulation from camera
- Multi-game system support beyond D&D 5e
- Cloud deployment
- Multi-player remote sessions
Platform Notes
KoalaSense Platform Gaps Identified
This engagement also served as a validation run for the KoalaSense platform itself (issue #40). Two skill gaps were surfaced.
Gap 1 — No Law 25 content in CISO skill
The ciso-advisor SKILL.md covers GDPR, CCPA, and HIPAA but has no Québec-specific Law 25 section. For a platform targeting Québec SMEs, this is a material gap. A PR is needed to add a Law 25 section covering the consent mechanism, data residency, retention policy, and breach notification requirements (Law 25 requires prompt notification of confidentiality incidents rather than GDPR's fixed 72-hour window).
Gap 2 — No engagement proposal synthesis skill
The Dify assessment DAG (issue #38) lists a "Synthesis node" that produces a structured proposal document, but no engagement-proposal-writer skill backs it, so the DAG has a phantom node. This proposal page serves as its reference implementation.