We scored 91.9% on the GDPval benchmark with Muse AI (GPT-5.5), a top-tier result.
GDPval is a benchmark designed to evaluate AI systems on real-world professional tasks across 44 occupations and 9 major industries that contribute significantly to GDP.
On this benchmark, Muse consistently outperforms leading systems including Codex CLI, Claude Opus 4.7, and Gemini 3.1 Pro.
We believe a professional AI assistant should not only generate answers but also reliably help users solve real work problems. It should create tangible impact in users’ daily work by moving tasks forward, rather than just responding with information.
Try Muse at