1/ Today we're launching Cua-Bench with @SnorkelAI: a benchmark for computer-use agents on professional software, open for any model to run. The benchmark covers 25 expert-authored KiCad tasks, and the best frontier model we tested cleared only 6 of them.