Simon Willison (@simonw) “Microsoft's MIT licensed VibeVoice speech-to-text model (think Whisper with spea”

Simon Willison

@simonw

Creator @datasetteproj, co-creator Django. PSF board. Hangs out with @natbat. He/Him.

Joined November 2006

5.6K Following 182.2K Followers

Simon Willison @simonw

2026.04.27 23:48

Microsoft's MIT-licensed VibeVoice speech-to-text model (think Whisper with speaker diarization) is really good - my notes on running the 5.71GB 4-bit MLX conversion on an M5 MacBook, using about 60GB of RAM at peak and transcribing 1hr of audio in ~9 mins
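
To put the figures in the post in perspective, here is a minimal sketch of the arithmetic behind them. It only uses the numbers quoted above (1 hr of audio, ~9 mins of processing, 5.71GB of weights, ~60GB peak RAM); the function name `realtime_factor` is a hypothetical helper for illustration, not part of VibeVoice or MLX, and the results are rough because the inputs are approximate.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Return how many minutes of audio are transcribed per minute of compute."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


# Figures quoted in the post (both are approximate).
AUDIO_MINUTES = 60.0        # 1 hr of audio
PROCESSING_MINUTES = 9.0    # "~9 mins"
WEIGHTS_GB = 5.71           # size of the 4-bit MLX conversion
PEAK_RAM_GB = 60.0          # "about 60GB of RAM at peak"

speedup = realtime_factor(AUDIO_MINUTES, PROCESSING_MINUTES)
ram_to_weights = PEAK_RAM_GB / WEIGHTS_GB

print(f"Roughly {speedup:.1f}x faster than real time")
print(f"Peak RAM is roughly {ram_to_weights:.1f}x the size of the weights")
```

Under those assumptions, transcription runs at roughly 6.7x real time, and peak memory is on the order of ten times the size of the quantized weights.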
