Lightning-fast Multilingual TTS that runs entirely on your device!
Supertonic is a lightning-fast, on-device multilingual text-to-speech system designed for local inference with minimal overhead.
The 66M-parameter model runs via ONNX Runtime and generates speech up to 167x faster than real-time on consumer hardware. Complete privacy and zero network dependency: all processing happens locally.
Supports 31 languages including English, Korean, Spanish, Portuguese, French, German, Japanese, Chinese, Arabic, Dutch, and more. Natural text handling without pre-processing. Directly processes numbers, dates, currency, abbreviations, and complex expressions.
Performance on M4 Pro CPU: 1263 characters per second for long text, real-time factor of 0.012. WebGPU mode reaches 2509 characters per second. RTX 4090 hits 12,164 characters per second.
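To put those numbers in perspective, here is a rough sketch of the arithmetic. It assumes the usual definition of real-time factor (synthesis time divided by audio duration), which the post does not spell out; the helper names are made up for illustration. Under that assumption, an RTF of 0.012 works out to roughly 83x faster than real-time, and a 167x speedup would correspond to an RTF of about 0.006.

```python
# Illustrative helpers only; names are hypothetical, not part of Supertonic.

def speedup_from_rtf(rtf: float) -> float:
    """Convert a real-time factor into an 'x times faster than real-time' figure.

    Assumes RTF = synthesis time / duration of the generated audio.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def seconds_to_synthesize(num_chars: int, chars_per_second: float) -> float:
    """Estimate wall-clock synthesis time for a text of num_chars characters."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return num_chars / chars_per_second


# Throughput figures quoted in the post (characters per second).
THROUGHPUT = {
    "M4 Pro CPU": 1263,
    "WebGPU": 2509,
    "RTX 4090": 12164,
}

if __name__ == "__main__":
    print(f"RTF 0.012 -> {speedup_from_rtf(0.012):.1f}x real-time")
    for device, cps in THROUGHPUT.items():
        secs = seconds_to_synthesize(10_000, cps)
        print(f"{device}: ~{secs:.2f} s for 10,000 characters")
```

These are back-of-the-envelope estimates: real throughput depends on text length, hardware, and runtime settings.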
Natural text handling works on financial expressions ("$5.2M" pronounced correctly as "five point two million dollars"), time and dates ("4:45 PM on Wed, Apr 3, 2024"), phone numbers with extensions, and technical units with abbreviations. All without phonetic annotations or text normalization.
Voice Builder lets you turn your voice into a deployable TTS model with permanent ownership and edge-native deployment.
Key capabilities:
• Ultra-lightweight (66M parameters)
• On-device inference with no network latency
• Natural text handling without pre-processing
• 31-language multilingual support
• Cross-platform via ONNX Runtime
• Up to 167x faster than real-time
• Complete privacy - all local processing
• Custom voice creation with Voice Builder
• Expression tags for natural human nuance
It's 100% open source.
I've shared the link in the replies!