FAF Voice Crews — Mixed-Model Architecture

3 Providers • 7 Concurrent Connections • 0 Frameworks
1,605 lines • 1 HTML file • 164 tests • 5 voice WS • 3 providers • 4 modes
Layer 1: Browser (Single Tab — The Director)
Ara • Warm & friendly • WebSocket #1
Rex • Authority • WebSocket #2
Sal • Mediator • WebSocket #3
Eve • Wildcard • WebSocket #4
Leo • Commander • WebSocket #5
~50ms poll
CrewEngine (Browser JS)
Round-robin turn management • turn_detection: null on all 5 WS
Context injection to ALL voices after each turn ([Name said]: ...)
Round boundary detection triggers text analysts
Shared AudioContext (24kHz) • Per-voice GainNodes • AudioWorklet PTT
Modes: discussion / debate / interview / brainstorm • Speeds: quickfire (0ms) / natural (800ms) / ponder (2500ms)
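The round-robin turn management above can be sketched as a few plain functions. This is an illustrative sketch, not the actual CrewEngine source; all names here are assumptions:

```javascript
// Illustrative sketch of CrewEngine's turn bookkeeping; function and
// variable names are assumptions, not the real implementation.
const VOICES = ['Ara', 'Rex', 'Sal', 'Eve', 'Leo'];

// Round-robin: turn N maps to voice N mod voiceCount.
function voiceForTurn(turn, voices = VOICES) {
  return voices[turn % voices.length];
}

// roundCount = floor(turn / voiceCount); a round boundary is reached
// each time every voice has spoken once.
function isRoundBoundary(turn, voiceCount = VOICES.length) {
  return turn > 0 && turn % voiceCount === 0;
}

// Context injection: after each turn, every OTHER voice receives the
// finished transcript as a "[Name said]: ..." conversation item.
function contextInjection(speaker, transcript) {
  return `[${speaker} said]: ${transcript}`;
}
```

With `turn_detection: null` on all five WebSockets, nothing speaks until the engine explicitly sends `response.create`, which is what makes this deterministic scheduling possible.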
Claude • Analyst, 1 sentence per round • HTTP POST /api/claude-chat.js (Anthropic)
Gemini • Fact-Checker, 1 sentence per round • HTTP POST /api/gemini-chat.js (Google)
5 voice WebSockets • Claude HTTP • Gemini HTTP
Layer 2: Vercel Edge Functions (Stateless)
crew-tokens.js (xAI)
Fetches ephemeral tokens for N voices; one xAI API call per voice, in parallel.
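The parallel fan-out can be sketched like this. The token fetcher is injected as a parameter because the real xAI ephemeral-token endpoint and response shape are not shown here; `fetchCrewTokens` and `fetchToken` are illustrative names:

```javascript
// Sketch of crew-tokens.js's fan-out: one token request per voice,
// all in flight at once. `fetchToken` stands in for the real xAI
// ephemeral-token call (endpoint/shape assumed, not confirmed).
async function fetchCrewTokens(voiceNames, fetchToken) {
  const tokens = await Promise.all(voiceNames.map((name) => fetchToken(name)));
  // Pair each voice with its ephemeral token for the browser to use.
  return Object.fromEntries(voiceNames.map((name, i) => [name, tokens[i]]));
}
```

In production the fetcher would POST to xAI with the server-side API key, so the long-lived key never reaches the browser; the browser only ever sees the short-lived per-voice tokens.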
claude-chat.js (Anthropic)
Anthropic Messages API (Haiku); system prompt = analyst role + crew soul.
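A minimal sketch of the request this function builds. The request shape follows the public Anthropic Messages API; the prompt wording and `max_tokens` value are assumptions, and the model id is as named in the diagram (the live API may require a dated variant):

```javascript
// Sketch of claude-chat.js: build an Anthropic Messages API request body.
// Prompt wording and token budget are illustrative assumptions.
function buildClaudeRequest(crewSoul, roundTranscript) {
  return {
    model: 'claude-3-5-haiku', // as named in the diagram
    max_tokens: 100, // analyst replies are one sentence
    system: `You are the crew analyst. Reply in one sentence.\n${crewSoul}`,
    messages: [{ role: 'user', content: roundTranscript }],
  };
}

// The Edge Function would POST this with the server-side key:
// fetch('https://api.anthropic.com/v1/messages', {
//   method: 'POST',
//   headers: {
//     'x-api-key': process.env.ANTHROPIC_API_KEY,
//     'anthropic-version': '2023-06-01',
//     'content-type': 'application/json',
//   },
//   body: JSON.stringify(buildClaudeRequest(soul, transcript)),
// });
```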
gemini-chat.js (Google)
Gemini 2.5 Flash via @google/generative-ai; system prompt = fact-checker role + crew soul.
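The Gemini side is analogous; only the prompt assembly is shown runnable here, with the library calls sketched in comments. The prompt wording is an assumption; the `@google/generative-ai` calls follow that library's public API:

```javascript
// Sketch of gemini-chat.js's prompt assembly. Wording is an
// illustrative assumption, not the actual fact-checker prompt.
function buildFactCheckPrompt(crewSoul, roundTranscript) {
  return (
    `You are the crew fact-checker. Reply in one sentence.\n` +
    `${crewSoul}\n\nRound transcript:\n${roundTranscript}`
  );
}

// In the Edge Function, via @google/generative-ai:
// const { GoogleGenerativeAI } = require('@google/generative-ai');
// const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
// const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
// const result = await model.generateContent(buildFactCheckPrompt(soul, text));
// const sentence = result.response.text();
```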
xAI Realtime WS (wss://api.x.ai/v1/realtime) • Anthropic API • Gemini API
Layer 3: External APIs (3 Providers)
xAI Grok Realtime API
Model: grok-4-1-fast-non-reasoning
5 simultaneous WebSocket sessions
Per-voice audio stream (24kHz PCM16)
Tools: get_soul, write_soul (MCPaaS)
Anthropic Messages
Model: claude-3-5-haiku
Text only, async per round
Google Gemini
Model: gemini-2.5-flash
Text only, async per round
↓ ↑ All 3 providers read/write shared memory
Layer 4: Shared Memory (Cross-Provider)
MCPaaS • soul="crew-42"
Cloudflare KV persistent storage • Token: wolfe-68-orange
Voices read/write via xAI tool calls • Analysts receive soul context via Edge Functions
Beacon soul for live spectator broadcasting (crew-[timestamp])
Rich tags: voice:{name}, round:{n}, mode:{mode}, speed:{speed} • Entry types: voice-turn, fact, rule, memory
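Under these conventions, a single stored entry might look like the following (field names are illustrative; only the tag and type vocabulary comes from the scheme above):

```json
{
  "soul": "crew-42",
  "type": "voice-turn",
  "tags": ["voice:Ara", "round:3", "mode:debate", "speed:natural"],
  "text": "Ara's transcript for this turn..."
}
```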
Data Flow: One Complete Round
1. CrewEngine triggers Voice N → response.create via WS
2. Voice N speaks (audio stream via WS) + transcript accumulates
3. response.done fires → waitForAudioDone() drains audio queue
4. onVoiceDone: add to transcript, publish to beacon, inject [Name said] to ALL other voices
5. currentTurn++ → roundCount = floor(turn / voiceCount)
6. If round boundary: callAnalyst('claude') + callAnalyst('gemini') — async, non-blocking
7. After pauseMs delay, trigger next voice → loop to step 1
8. Claude response arrives (1-2s) → addMessage + beaconPublish
9. Gemini response arrives (1-2s) → addMessage + beaconPublish
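Steps 1-7 form the engine's main loop, with steps 8-9 resolving asynchronously. A condensed sketch, with the WebSocket/HTTP plumbing injected as callbacks (all names illustrative):

```javascript
// Condensed sketch of one crew session (steps 1-7 above); `speakTurn`,
// `injectContext`, and `callAnalyst` stand in for the real plumbing.
async function runRounds({ voices, rounds, pauseMs,
                           speakTurn, injectContext, callAnalyst,
                           sleep = () => {} }) {
  const transcript = [];
  for (let turn = 0; turn < rounds * voices.length; turn++) {
    const speaker = voices[turn % voices.length];   // step 1: round-robin pick
    const text = await speakTurn(speaker);          // steps 2-3: speak, drain audio
    transcript.push({ speaker, text });             // step 4: record the turn
    for (const other of voices) {                   // step 4: context injection
      if (other !== speaker) injectContext(other, `[${speaker} said]: ${text}`);
    }
    const nextTurn = turn + 1;                      // step 5: advance counters
    if (nextTurn % voices.length === 0) {           // step 6: round boundary
      callAnalyst('claude', transcript);            // fire-and-forget; replies
      callAnalyst('gemini', transcript);            // arrive async (steps 8-9)
    }
    await sleep(pauseMs);                           // step 7: pacing delay
  }
  return transcript;
}
```

Because the analyst calls are not awaited, the next voice can start speaking while Claude and Gemini are still thinking, which is what keeps the quickfire (0ms) speed usable.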
Legend: Engine / FAF • Voice (xAI Grok) • Claude (Anthropic) • Gemini (Google) • MCPaaS Memory • Edge Functions