- 1,605 lines
- 1 file
- 164 tests
- 0 frameworks
- 3 AI providers
- N voices
xAI Grok Realtime API — Designed vs Exapted

| xAI Designed | We Built | The Exaptation |
| --- | --- | --- |
| 1 WebSocket per user | 5 simultaneous WebSockets, one browser (N-voice architecture; 5 = xAI's full roster) | Superlinear: connection density Nx |
| 1 voice per session | 5 distinct personas today; architecture is N-voice | Multi-voice from a single-voice API |
| Server-side VAD manages turns | `turn_detection: null` — browser is the director | First-principles: client-side orchestration, zero server state |
| Human talks to AI | AI talks to AI, human optional | Exaptation: reversed the entire paradigm |
| Single audio output stream | Shared AudioContext, per-voice GainNodes | Audio multiplexing in the browser |
| Ephemeral token → 1 session | N tokens fetched in parallel → N sessions | Token parallelism Nx |
| Tool calls for user tasks | Tool calls write to team memory (MCPaaS) | Cross-session persistence via existing tools |
| Stateless sessions, no memory | Persistent team memory — return tomorrow and the team remembers everything: ask about previous chats, recall facts, build on history | Eternal memory: zero drift, zero re-explaining ∞ |
| Single-provider ecosystem | xAI voice + Claude text + Gemini text in one tab | Overdriving: 3 providers, 1 browser, 3x |
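The token-parallelism and `turn_detection: null` rows above can be sketched together: fetch N ephemeral tokens concurrently, open one realtime socket per voice, and disable server-side turn detection on each. This is an illustrative sketch, not the project's actual code — `fetchToken`, `openSocket`, and the `session.update` message shape are assumptions injected as parameters so the fan-out logic stands on its own.

```javascript
// Sketch: N-session fan-out in one browser tab.
// `fetchToken` and `openSocket` are assumed dependencies (injected for clarity);
// the session.update payload shape is an assumption, not xAI's documented schema.
async function openVoiceCrew(voices, { fetchToken, openSocket }) {
  // Token parallelism: N ephemeral tokens fetched concurrently, not serially.
  const tokens = await Promise.all(voices.map((v) => fetchToken(v)));
  // One realtime session per voice, all alive in the same tab.
  return voices.map((voice, i) => {
    const ws = openSocket(tokens[i]);
    // Disable server-side VAD: the browser, not the server, decides turns.
    ws.send(JSON.stringify({
      type: "session.update",
      session: { turn_detection: null, voice },
    }));
    return { voice, ws };
  });
}
```

Because the dependencies are injected, the same function drives 5 sessions in production and 5 stubs in a test.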
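The "shared AudioContext, per-voice GainNodes" row can likewise be sketched as a small mixer: every voice gets its own GainNode, and all of them connect to one destination. The GainNode/AudioContext calls are standard Web Audio API; the routing itself is an illustrative assumption about the project's mixer, not its real implementation.

```javascript
// Sketch: audio multiplexing in the browser.
// One shared AudioContext, one GainNode per voice, all mixed to one output.
function buildMixer(ctx, voices) {
  const channels = new Map();
  for (const voice of voices) {
    const gain = ctx.createGain();
    gain.gain.value = 1.0;         // independent per-voice volume
    gain.connect(ctx.destination); // every voice multiplexed into one stream
    channels.set(voice, gain);
  }
  return channels; // each voice's decoded audio is later routed through its gain
}
```

Ducking or muting one persona is then just `channels.get(voice).gain.value = 0.2` — no server involvement.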
7 Firsts — Competitive Landscape

| Capability | Closest Competitor | Their Limit | Voice Crews |
| --- | --- | --- | --- |
| Multi-voice AI conversation | Character.AI Rooms | Text only, no voice | 5 real-time voices, live audio |
| Multi-provider collaboration | Multiple.chat | Side-by-side, not collaborative | 3 providers, 1 conversation |
| Parallel voice WebSockets | Deepgram | Scaling users, not personas | 5 persona streams, 1 browser |
| Browser-as-director | Pipecat / LiveKit | Server-side orchestration | Client-side turns, no server |
| Eternal team memory | Sesame (Maya) | 1 provider, 2-week window, then gone | 3 providers, persistent forever; pick up where you left off |
| Live spectator mode | Character.AI | Text logs after the fact | Real-time streaming via beacon |
| Mixed-modality crews | OpenAI supervisor | Text model is a hidden tool | Named text + voice participants |
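The browser-as-director row above ("client-side turns, no server") comes down to a scheduling decision the page makes itself. A minimal sketch: a pure function that picks the next speaker, here using a hypothetical round-robin-with-mention policy — this policy and the function name `nextSpeaker` are illustrative assumptions, not the project's actual director.

```javascript
// Sketch: with server-side VAD disabled, the browser decides who speaks next.
// Round-robin with a "mention" override is an assumed policy for illustration.
function nextSpeaker(voices, lastSpeaker, transcript) {
  // If the last utterance names another crew member, hand them the floor.
  const mentioned = voices.find(
    (v) => v !== lastSpeaker && transcript.toLowerCase().includes(v.toLowerCase())
  );
  if (mentioned) return mentioned;
  // Otherwise rotate. Zero server state: the entire "director" is this function.
  const i = voices.indexOf(lastSpeaker);
  return voices[(i + 1) % voices.length];
}
```

Because it is pure, the director can be unit-tested without any WebSocket or audio at all.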
The Grok Realtime API was built for 1 human talking to 1 AI.
We used it for 5 AIs talking to each other, with the human optional.
That's not a bug in the API. That's proof of great infrastructure.
"When your API enables things you didn't design for, you built infrastructure, not just a product."