FAF Voice Crews — Mixed-Model Architecture

3 Providers • 7 Concurrent Connections • 0 Frameworks
1,605 lines • 1 HTML file • 164 tests • 5 voice WS • 3 providers • 4 modes
Layer 1: Browser (Single Tab — The Director)
Ara • Warm & friendly • WebSocket #1
Rex • Authority • WebSocket #2
Sal • Mediator • WebSocket #3
Eve • Wildcard • WebSocket #4
Leo • Commander • WebSocket #5
~50ms poll
CrewEngine (Browser JS)
Round-robin turn management • turn_detection: null on all 5 WS
Context injection to ALL voices after each turn ([Name said]: ...)
Round boundary detection triggers text analysts
Shared AudioContext (24kHz) • Per-voice GainNodes • AudioWorklet PTT
Modes: discussion / debate / interview / brainstorm • Speeds: quickfire (0ms) / natural (800ms) / ponder (2500ms)
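The round-robin turn management above can be sketched as a few plain functions. This is an illustrative sketch, not the actual CrewEngine source; all names here are assumptions:

```javascript
// Illustrative sketch of CrewEngine's turn bookkeeping; function and
// variable names are assumptions, not the real implementation.
const VOICES = ['Ara', 'Rex', 'Sal', 'Eve', 'Leo'];

// Round-robin: turn N maps to voice N mod voiceCount.
function voiceForTurn(turn, voices = VOICES) {
  return voices[turn % voices.length];
}

// roundCount = floor(turn / voiceCount); a round boundary is reached
// each time every voice has spoken once.
function isRoundBoundary(turn, voiceCount = VOICES.length) {
  return turn > 0 && turn % voiceCount === 0;
}

// Context injection: after each turn, every OTHER voice receives the
// finished transcript as a "[Name said]: ..." conversation item.
function contextInjection(speaker, transcript) {
  return `[${speaker} said]: ${transcript}`;
}
```

With `turn_detection: null` on all five WebSockets, nothing speaks until the engine explicitly sends `response.create`, which is what makes this deterministic scheduling possible.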
Claude • Analyst, 1 sentence per round • HTTP POST /api/claude-chat.js (Anthropic)
Gemini • Fact-Checker, 1 sentence per round • HTTP POST /api/gemini-chat.js (Google)
5 voice WebSockets • Claude HTTP • Gemini HTTP
Layer 2: Vercel Edge Functions (Stateless)
crew-tokens.js (xAI)
Fetches ephemeral tokens for N voices; one xAI API call per voice, in parallel.
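The parallel fan-out can be sketched like this. The token fetcher is injected as a parameter because the real xAI ephemeral-token endpoint and response shape are not shown here; `fetchCrewTokens` and `fetchToken` are illustrative names:

```javascript
// Sketch of crew-tokens.js's fan-out: one token request per voice,
// all in flight at once. `fetchToken` stands in for the real xAI
// ephemeral-token call (endpoint/shape assumed, not confirmed).
async function fetchCrewTokens(voiceNames, fetchToken) {
  const tokens = await Promise.all(voiceNames.map((name) => fetchToken(name)));
  // Pair each voice with its ephemeral token for the browser to use.
  return Object.fromEntries(voiceNames.map((name, i) => [name, tokens[i]]));
}
```

In production the fetcher would POST to xAI with the server-side API key, so the long-lived key never reaches the browser; the browser only ever sees the short-lived per-voice tokens.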
claude-chat.js (Anthropic)
Anthropic Messages API (Haiku); system prompt = analyst role + crew soul.
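A minimal sketch of the request this function builds. The request shape follows the public Anthropic Messages API; the prompt wording and `max_tokens` value are assumptions, and the model id is as named in the diagram (the live API may require a dated variant):

```javascript
// Sketch of claude-chat.js: build an Anthropic Messages API request body.
// Prompt wording and token budget are illustrative assumptions.
function buildClaudeRequest(crewSoul, roundTranscript) {
  return {
    model: 'claude-3-5-haiku', // as named in the diagram
    max_tokens: 100, // analyst replies are one sentence
    system: `You are the crew analyst. Reply in one sentence.\n${crewSoul}`,
    messages: [{ role: 'user', content: roundTranscript }],
  };
}

// The Edge Function would POST this with the server-side key:
// fetch('https://api.anthropic.com/v1/messages', {
//   method: 'POST',
//   headers: {
//     'x-api-key': process.env.ANTHROPIC_API_KEY,
//     'anthropic-version': '2023-06-01',
//     'content-type': 'application/json',
//   },
//   body: JSON.stringify(buildClaudeRequest(soul, transcript)),
// });
```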
gemini-chat.js (Google)
Gemini 2.5 Flash via @google/generative-ai; system prompt = fact-checker role + crew soul.
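The Gemini side is analogous; only the prompt assembly is shown runnable here, with the library calls sketched in comments. The prompt wording is an assumption; the `@google/generative-ai` calls follow that library's public API:

```javascript
// Sketch of gemini-chat.js's prompt assembly. Wording is an
// illustrative assumption, not the actual fact-checker prompt.
function buildFactCheckPrompt(crewSoul, roundTranscript) {
  return (
    `You are the crew fact-checker. Reply in one sentence.\n` +
    `${crewSoul}\n\nRound transcript:\n${roundTranscript}`
  );
}

// In the Edge Function, via @google/generative-ai:
// const { GoogleGenerativeAI } = require('@google/generative-ai');
// const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
// const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
// const result = await model.generateContent(buildFactCheckPrompt(soul, text));
// const sentence = result.response.text();
```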
xAI Realtime WS (wss://api.x.ai/v1/realtime) • Anthropic API • Gemini API
Layer 3: External APIs (3 Providers)
xAI Grok Realtime API
Model: grok-4-1-fast-non-reasoning
5 simultaneous WebSocket sessions
Per-voice audio stream (24kHz PCM16)
Tools: get_soul, write_soul (MCPaaS)
Anthropic Messages
Model: claude-3-5-haiku
Text only, async per round
Google Gemini
Model: gemini-2.5-flash
Text only, async per round
↓ ↑ All 3 providers read/write shared memory
Layer 4: Shared Memory (Cross-Provider)
MCPaaS • soul="crew-42"
Cloudflare KV persistent storage • Token: wolfe-68-orange
Voices read/write via xAI tool calls • Analysts receive soul context via Edge Functions
Beacon soul for live spectator broadcasting (crew-[timestamp])
Rich tags: voice:{name}, round:{n}, mode:{mode}, speed:{speed} • Entry types: voice-turn, fact, rule, memory
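Under these conventions, a single stored entry might look like the following (field names are illustrative; only the tag and type vocabulary comes from the scheme above):

```json
{
  "soul": "crew-42",
  "type": "voice-turn",
  "tags": ["voice:Ara", "round:3", "mode:debate", "speed:natural"],
  "text": "Ara's transcript for this turn..."
}
```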
Data Flow: One Complete Round
1. CrewEngine triggers Voice N → response.create via WS
2. Voice N speaks (audio stream via WS) + transcript accumulates
3. response.done fires → waitForAudioDone() drains audio queue
4. onVoiceDone: add to transcript, publish to beacon, inject [Name said] to ALL other voices
5. currentTurn++ → roundCount = floor(turn / voiceCount)
6. If round boundary: callAnalyst('claude') + callAnalyst('gemini') — async, non-blocking
7. After pauseMs delay, trigger next voice → loop to step 1
8. Claude response arrives (1-2s) → addMessage + beaconPublish
9. Gemini response arrives (1-2s) → addMessage + beaconPublish
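Steps 1-7 form the engine's main loop, with steps 8-9 resolving asynchronously. A condensed sketch, with the WebSocket/HTTP plumbing injected as callbacks (all names illustrative):

```javascript
// Condensed sketch of one crew session (steps 1-7 above); `speakTurn`,
// `injectContext`, and `callAnalyst` stand in for the real plumbing.
async function runRounds({ voices, rounds, pauseMs,
                           speakTurn, injectContext, callAnalyst,
                           sleep = () => {} }) {
  const transcript = [];
  for (let turn = 0; turn < rounds * voices.length; turn++) {
    const speaker = voices[turn % voices.length];   // step 1: round-robin pick
    const text = await speakTurn(speaker);          // steps 2-3: speak, drain audio
    transcript.push({ speaker, text });             // step 4: record the turn
    for (const other of voices) {                   // step 4: context injection
      if (other !== speaker) injectContext(other, `[${speaker} said]: ${text}`);
    }
    const nextTurn = turn + 1;                      // step 5: advance counters
    if (nextTurn % voices.length === 0) {           // step 6: round boundary
      callAnalyst('claude', transcript);            // fire-and-forget; replies
      callAnalyst('gemini', transcript);            // arrive async (steps 8-9)
    }
    await sleep(pauseMs);                           // step 7: pacing delay
  }
  return transcript;
}
```

Because the analyst calls are not awaited, the next voice can start speaking while Claude and Gemini are still thinking, which is what keeps the quickfire (0ms) speed usable.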
Legend: Engine / FAF • Voice (xAI Grok) • Claude (Anthropic) • Gemini (Google) • MCPaaS Memory • Edge Functions