🤗 MOSS-TTS-Local Transformer v1.5 is now open source.
Built on a purely autoregressive audio tokenizer + LLM paradigm:
>MOSS-Audio-Tokenizer-v2, 2B params
>Qwen3-4B backbone
>Native 48 kHz stereo audio
>Streaming output with theoretical sub-100 ms TTFT
>Zero-shot voice cloning
>Inline [pause] control
>🇺🇸 🇯🇵 🇰🇷 31-language synthesis
>SGLang-Omni Day-0 support 🎉
@sgl_project @lmsysorg
Designed for voice agents, digital humans, game NPCs, audiobooks, and real-time speech generation.
👇