Grok's realtime voice is now on AI Gateway. Build with AI SDK 7:
• xai/grok-voice-think-fast-1.0 (useRealtime)
• xai/grok-tts (generateSpeech)
• xai/grok-stt (transcribe)
QVAC SDK 0.14.0 is live.
This release makes the on-device stack faster on mobile, ships a supported path for coding agents, and takes local text-to-speech to 31 languages.
Main highlights:
- OpenCode and OpenClaw. The first official OpenCode plugin, plus a maintained OpenClaw compatibility path, both built on managed mode and qvac serve. Point a coding agent at a local model with far less setup and far fewer surprises.
- Brain-computer interface transcription, now in the SDK. Decode recorded neural signal data into text fully on-device, with no cloud, and stream it in chunks through a simple API. In 0.14 it runs GPU-accelerated on iOS.
- Text-to-speech in 31 languages with our Supertonic3 upgrade.
VOICE AND SPEECH
- Supertonic3 multilingual TTS: from 5 languages to 31.
- Chatterbox and Supertonic now run on the Android GPU. Memory use is lower (especially on iOS), Chatterbox supports a quantized s3gen, and a bug that made Chatterbox occasionally emit random speech is fixed.
- Whisper transcription now runs on the iOS GPU. Parakeet runs on the Android GPU, with steadier real-time streaming.
VISION AND OCR
- VLM multi-tile batching: high-resolution Pan and Scan images are encoded in one pass instead of tile by tile, for faster vision throughput.
- OCR on ggml (EasyOCR and DocTR) now matches the speed of the ONNX path across Metal, OpenCL, and Vulkan.
PLATFORM AND RELIABILITY
- Dynamic compute backends on Linux: one build picks the right backend at runtime, and opens the door to ROCm and CUDA support without per-backend builds.
- Thinking tokens are kept out of the model context, so reasoning no longer fills the KV cache.
SDK 0.14.0 is now leaner and faster to start.
Let’s build.
Grok TTS is already sounding insanely human
In Vapi’s blind-vote Humaneness Index, Grok TTS ranked as the top AI voice model on the chart with a humaneness score of 96, just 4 points below the real human benchmark.
• Top AI voice model shown
• 96/100 humaneness score
• Only 4 points behind the human benchmark
What makes this even more impressive is that Grok TTS combines natural-sounding speech with low latency and aggressive pricing.
The gap between AI-generated speech and real human voices is disappearing faster than most people realize.
Grok is starting to speak like a real person.