KoalaSense AI · Engagement Proposal

Universal AI Game Master: Foundry VTT Pivot

A multi-modal AI Dungeon Master running locally on a table Mini PC, combining voice, vision, RAG-powered rules, and AR map projection through Foundry VTT. The architecture pivot from Discord is sound and the RAG backend is stable. The blocker is implementation: the Foundry module does not exist yet.

Agentic Readiness · Implementation Support · Law 25 Gap · RAG Stable

KoalaSense orchestrates specialized AI agents — like a constellation guiding your operations.

Project context

What this system is

A single Gemini Live agent acting as Dungeon Master — locally deployed, voice-first, table-aware.

Target Architecture

Layer	Technology	Purpose	Status
AI DM	Gemini Live (Vertex AI)	Single agent — voice + vision	Not started
Voice/Video	LiveKit (WebRTC)	Player audio streams	Not started
Platform	Foundry VTT Module (JS)	Campaign mgmt, speaker ID, AR display	Not started
Rules	Qdrant + Python RAG	PDF rulebook retrieval on-demand	Stable ✓
AR Display	Projector + table camera	Physical miniature detection, map projection	Scoping needed
Identity	Foundry user → character	Speaker identification	Not started

CTO Advisor · Technical Assessment

Technical Findings

The core technical bet is Gemini Live as a single agent handling voice + vision simultaneously. Stream A (JS client) is the hard blocker on the critical path.

● Critical

Stream A is a hard blocker for all other streams

gemini-live-client.js does not exist yet. Streams B, C, and E cannot start until it does.

● High

6 "parallel" streams have one true critical path

Only Streams A and D can start independently. Real sequence: A → B+E → C → F.
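
The sequencing claim can be checked mechanically. The sketch below is a minimal illustration, assuming the edges implied by these findings: B, C, and E wait on A; C also waits on B and E; F waits on C; D has no prerequisites. The `deps` map and the `executionWaves` helper are hypothetical names, not part of the project.

```javascript
// Hypothetical dependency map, read from the findings above:
// key = stream, value = streams that must finish first.
const deps = {
  A: [],
  D: [],
  B: ["A"],
  E: ["A"],
  C: ["A", "B", "E"],
  F: ["C"],
};

// Group streams into "waves": every stream in a wave has all of its
// prerequisites satisfied by earlier waves (Kahn-style layering).
function executionWaves(graph) {
  const done = new Set();
  const remaining = new Set(Object.keys(graph));
  const waves = [];
  while (remaining.size > 0) {
    const ready = [...remaining]
      .filter((s) => graph[s].every((d) => done.has(d)))
      .sort();
    if (ready.length === 0) {
      throw new Error("Cycle or unknown prerequisite in stream dependencies");
    }
    for (const s of ready) {
      remaining.delete(s);
      done.add(s);
    }
    waves.push(ready);
  }
  return waves;
}

const waves = executionWaves(deps);
console.log(waves.map((w) => w.join("+")).join(" -> "));
// prints "A+D -> B+E -> C -> F"
```

Under these assumed edges only the first wave is genuinely parallel, which is why handing six streams to coding agents at once would leave most of them waiting.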

● High

Concurrent voice + vision in one agent is unvalidated in JS

No test yet shows that Gemini Live can handle a simultaneous audio stream and camera frames from a JS client. A spike is required before M6.

● Medium

AR projection scope undefined

Passive camera feed vs. active Foundry scene token manipulation are fundamentally different efforts.

● Medium

Python → JS port has no discipline document

No strategy for preserving known-good Python behaviour. Silent regressions are high risk.
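
One possible discipline is characterization testing: record input/output pairs from the known-good Python code once, then replay them against the JS port on every change. The following is a minimal sketch under that assumption. `findRegressions`, the golden-case shape, and the example `normalizeQuery` function are all hypothetical and stand in for whatever behaviour is actually ported.

```javascript
// Each golden case is captured once from the Python implementation:
// { name, input, expected }. The shape is an assumption for illustration.
function findRegressions(goldenCases, portedFn) {
  const failures = [];
  for (const { name, input, expected } of goldenCases) {
    let actual;
    try {
      actual = portedFn(input);
    } catch (err) {
      failures.push({ name, expected, error: String(err) });
      continue;
    }
    // Structural comparison via JSON; adequate for plain data, but
    // sensitive to object key order.
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      failures.push({ name, expected, actual });
    }
  }
  return failures;
}

// Hypothetical ported function and golden data.
const normalizeQuery = (q) => q.trim().toLowerCase();
const golden = [
  { name: "trims", input: "  Grapple ", expected: "grapple" },
  { name: "lowercases", input: "Opportunity Attack", expected: "opportunity attack" },
];
console.log(findRegressions(golden, normalizeQuery)); // [] when the port matches
```

An empty result means the port reproduces every recorded behaviour; any entry in the list is a regression with enough detail to debug it.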

● Low

RAG backend is stable — protect it

193/193 tests passing. Do not touch this during the JS pivot.

CISO Advisor · Security & Privacy

Security Findings

The primary risk is the demo-readiness gap: this system captures voice and video of real people with no consent mechanism.

STRIDE Threat Assessment (abbreviated)

Threat	Vector	Finding	Severity
Info Disclosure	Camera	Captures players' physical environment including faces — no consent	High
Info Disclosure	Gemini API	Voice + video sent to Google; no data processing agreement	High
Info Disclosure	Module settings	Gemini API key stored in Foundry — unencrypted at rest	Medium
Spoofing	Local network	Foundry VTT port exposed on LAN without auth layer	Medium

Law 25 (Québec) — Pre-Demo Checklist

Requirement	Status	Gap
Consent for voice recording	❌ Missing	Required before any session with real participants
Consent for camera/image capture	❌ Missing	Physical environment + faces = personal information
Data retention policy	❌ Missing	How long are transcripts and session logs kept?
Gemini as data processor	⚠ Partial	Data processing agreement with Google required
GCP region (northamerica-northeast1)	✓ Confirmed	—
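
Once a retention period is decided, enforcing it can be a small, testable routine. The sketch below is illustrative only: the record shape (`id`, `createdAt`), the `partitionByRetention` helper, and the 30-day figure in the example are assumptions, not a recommended policy.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Split records into those still inside the retention window and those
// due for deletion. `createdAt` is an ISO 8601 timestamp (assumed shape).
function partitionByRetention(records, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  const keep = [];
  const purge = [];
  for (const record of records) {
    const created = Date.parse(record.createdAt);
    // Design choice: unparseable timestamps are purged rather than
    // silently kept forever.
    if (Number.isNaN(created) || created < cutoff) {
      purge.push(record);
    } else {
      keep.push(record);
    }
  }
  return { keep, purge };
}

// Example with a placeholder 30-day window.
const { keep, purge } = partitionByRetention(
  [
    { id: "session-12", createdAt: "2025-02-20T19:00:00Z" },
    { id: "session-03", createdAt: "2025-01-01T19:00:00Z" },
  ],
  30,
  new Date("2025-03-01T00:00:00Z")
);
console.log(keep.map((r) => r.id), purge.map((r) => r.id));
```

The function only decides what should be deleted; the actual deletion of transcripts and logs would be wired in separately.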

COO Advisor · Operational Assessment

Operational Findings

The primary operational risk is founder dependency — every decision lives in one person's head, and the system cannot be demoed or operated without them.

● High

No runbook — system cannot be operated by anyone else

Blocks client demos and any delegation. Setup must be documented as a 10-step checklist.

● High

AI coding agent streams assume parallelism the dependency graph does not support

Without explicit handoff protocol, agents will block on each other and waste cycles.

● Medium

Python → JS port discipline undocumented

High-value methodology pattern for client migrations. Should be captured in Agentic-Integration playbook.

● Medium

Failure mode taxonomy not extracted

Quota exhaustion, audio drop, spawn failure patterns belong in the integration platform as a pre-mortem checklist.
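
A taxonomy becomes useful once incidents are counted against it. The sketch below is a minimal illustration: the three categories come from the finding above, while the `unknown` bucket, the incident shape, and the `tallyFailures` helper are assumptions.

```javascript
// "unknown" is added so unclassified incidents stay visible instead of
// being dropped from the count.
const FAILURE_MODES = Object.freeze([
  "quota_exhaustion",
  "audio_drop",
  "spawn_failure",
  "unknown",
]);

// Count incidents per failure mode. Each incident is assumed to carry a
// `mode` string assigned when it was logged.
function tallyFailures(incidents) {
  const counts = Object.fromEntries(FAILURE_MODES.map((m) => [m, 0]));
  for (const incident of incidents) {
    const mode = FAILURE_MODES.includes(incident.mode) ? incident.mode : "unknown";
    counts[mode] += 1;
  }
  return counts;
}

console.log(
  tallyFailures([
    { mode: "audio_drop" },
    { mode: "audio_drop" },
    { mode: "quota_exhaustion" },
    { mode: "something_else" },
  ])
);
```

A growing `unknown` count is itself a signal that the taxonomy needs another category.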

Engagement Proposal · 6-week plan

Recommended Workplan

Sequenced by actual dependency chain, not the 6-stream framing. Gate decisions are explicit. Consent is non-negotiable before any demo.

Weeks 1–2 · Stream A

Gemini Live JS Client + Smoke Test

Build gemini-live-client.js with a minimal WebSocket connection to Gemini Live. Confirm audio round-trip works in a browser context. This is the gate for everything else — go/no-go decision point before any further stream work commits.

Working JS client with documented smoke test. Go/no-go gate.
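
The go/no-go gate is easier to apply if the pass criteria are written down before the test runs. The sketch below shows one way to do that. The result fields, the `evaluateSmokeTest` helper, and the 2000 ms threshold are hypothetical placeholders. It does not talk to Gemini Live; it only evaluates a recorded test run.

```javascript
// `result` is a hypothetical record produced by a manual or scripted run.
// `criteria` holds thresholds agreed before the test.
function evaluateSmokeTest(result, criteria) {
  const reasons = [];
  if (!result.connected) reasons.push("WebSocket connection was not established");
  if (result.audioChunksSent === 0) reasons.push("no audio was sent");
  if (result.audioChunksReceived === 0) reasons.push("no audio came back");
  if (
    result.firstResponseMs === null ||
    result.firstResponseMs > criteria.maxFirstResponseMs
  ) {
    reasons.push(`no first response within ${criteria.maxFirstResponseMs} ms`);
  }
  return { decision: reasons.length === 0 ? "go" : "no-go", reasons };
}

const verdict = evaluateSmokeTest(
  { connected: true, audioChunksSent: 40, audioChunksReceived: 35, firstResponseMs: 900 },
  { maxFirstResponseMs: 2000 } // placeholder threshold
);
console.log(verdict.decision); // prints "go"
```

Returning the list of reasons alongside the decision keeps a no-go result actionable rather than a bare failure.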

Weeks 2–3 · Stream B + Law 25

Audio Bridge + Consent Gate

LiveKit ↔ Gemini audio routing. Simultaneously implement a pre-session consent screen — a single modal before camera and audio start. This satisfies the Law 25 minimum for any demo involving real participants. Not optional.

Audio pipeline functional. Consent mechanism live.
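
The logic behind the consent gate can be kept separate from the modal that collects it. A minimal sketch, assuming consent is recorded per participant and per scope: the `consentGate` helper and the record shape are hypothetical, and the Foundry dialog itself is not shown.

```javascript
// Capture may start only when every participant has accepted both
// voice and camera capture. An empty table never passes the gate.
function consentGate(participants, consents) {
  const missing = [];
  for (const id of participants) {
    const c = consents[id] ?? {};
    if (c.voice !== true) missing.push({ id, scope: "voice" });
    if (c.camera !== true) missing.push({ id, scope: "camera" });
  }
  return {
    allowed: participants.length > 0 && missing.length === 0,
    missing,
  };
}

const gate = consentGate(["gm", "player1"], {
  gm: { voice: true, camera: true },
  player1: { voice: true }, // camera consent not given yet
});
console.log(gate.allowed, gate.missing);
// prints: false [ { id: 'player1', scope: 'camera' } ]
```

Audio and camera start-up would then be conditional on `allowed`, so that a missing answer blocks capture by default.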

Weeks 3–4 · Stream D + C

RAG API Endpoint + Session Logging

Connect the stable Python RAG backend to the Foundry module via local API. Add transcript logging to Foundry Journal. These are independent of vision — unblocks the core DM experience without waiting for camera work.

AI DM answers rules questions in session. Sessions are logged to Journal.
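
Session logging depends on turning speaker-tagged utterances into readable text before they are written to the Journal. The sketch below covers only that formatting step. The entry shape, the `formatTranscript` helper, and the "Unknown speaker" fallback are assumptions, and the Foundry-side write is deliberately left out.

```javascript
// Map Foundry user IDs to character names and render one line per
// utterance. Users without a mapped character get a visible fallback.
function formatTranscript(entries, userToCharacter) {
  return entries
    .map(({ userId, text, at }) => {
      const speaker = userToCharacter[userId] ?? "Unknown speaker";
      return `[${at}] ${speaker}: ${text}`;
    })
    .join("\n");
}

const transcript = formatTranscript(
  [
    { userId: "u1", at: "20:14:05", text: "I search the room." },
    { userId: "u9", at: "20:14:20", text: "Can I help?" },
  ],
  { u1: "Thorin" } // hypothetical user-to-character mapping
);
console.log(transcript);
```

Keeping the mapping as plain data means the same lookup can serve speaker identification during the session and attribution in the log.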

Weeks 4–5 · Stream E

Vision Scope Decision + Multi-modal Spike

Explicitly scope vision to passive camera capture fed as context (v1) vs active token manipulation (defer). Run the multi-modal spike — audio + camera frame simultaneously — before committing to M6 scope. This spike determines whether the vision timeline is 1 week or 4.

Vision scope documented. Spike result determines M6 plan.

Weeks 5–6

Demo Runbook + First Full Session

10-step setup runbook. Full end-to-end demo session with all integrated components. Recording captured as a sales artifact. Remaining gaps documented as v2 scope.

Runbook complete. Full demo session recorded and usable as a sales asset.

Investment

Engagement Cost

Pilot engagement at founder rate. The equivalent scope at standard SME rates is $4,800–$6,000 CAD.

Item	Detail	Est. Cost
Agentic Readiness Assessment	Completed — multi-advisor review (CTO, CISO, COO)	$0 — Pilot
Implementation Advisory	6 weeks — weekly advisory + async support	$2,400 CAD
Agent-Assisted Dev Support	Claude Code API hours on Streams A–D	~$300 CAD
Total	Internal pilot rate	~$2,700 CAD

Expected Benefits

Before vs. After

Criterion	Without KoalaSense	With KoalaSense
Time to first demo	Unknown: no validated JS path, timeline undefined	2 weeks to smoke test, 6 weeks to full working demo
Demo repeatability	Founder-only, no runbook, cannot delegate	10-minute setup checklist, operable by anyone
Law 25 compliance	Blocked: no consent mechanism, not demo-safe	Minimum viable consent gate in place for live demos
Sales readiness	Not demonstrable: module does not exist	Live demo usable as flagship sales artifact
Methodology capture	Learnings stay in DnD-DM, never extracted	Port discipline + failure taxonomy → Agentic-Integration playbook

Open Questions

Decisions needed before implementation

What is the exact vision scope — passive camera feed to Gemini context, or active Foundry scene token manipulation from camera detection?
Is a multi-modal spike (concurrent audio + video) planned before M6 scope is committed?
What is the intended data retention period for session transcripts and Foundry Journal entries?
Is Gemini Vertex AI data processing confirmed to stay in northamerica-northeast1 for Law 25?

Explicitly out of scope (v2)

Voice biometric speaker ID
Active AR token manipulation from camera
Multi-game system support beyond D&D 5e
Cloud deployment
Multi-player remote sessions

Platform Notes

KoalaSense Platform Gaps Identified

This engagement also served as a validation run for the KoalaSense platform itself (issue #40). Two skill gaps were surfaced.

Gap 1 — No Law 25 content in CISO skill

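The ciso-advisor SKILL.md covers GDPR, CCPA, and HIPAA but has no Québec-specific Law 25 section. For a platform targeting Québec SMEs, this is a material gap. A PR is needed to add a Law 25 section covering consent mechanisms, data residency, retention policy, and confidentiality-incident notification. Note that Law 25 requires prompt notice to the Commission d'accès à l'information and to affected individuals when an incident poses a risk of serious injury; the fixed 72-hour deadline is a GDPR requirement.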

Gap 2 — No engagement proposal synthesis skill

The Dify assessment DAG (issue #38) has a "Synthesis node" listed as producing a structured proposal document — but no engagement-proposal-writer skill backs it. The DAG has a phantom node. This proposal page is its reference implementation.
