Visual Reasoning Benchmark
SketchBench
One instance of a model is given a hidden word and draws it as an SVG. A separate instance of the same model is then shown a JPEG rendering of that SVG and tries to guess the hidden word.
- The guesser gets up to 20 tries per drawing.
- The word bank has 100 words, and each word is used exactly once per run.
- Accuracy is measured as the number of drawings the guesser identifies correctly.
- We also measure cost, time, and the average number of guesses.
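The rules above can be sketched as a small scoring loop. This is only an illustrative sketch, not the benchmark's actual harness: `play_word`, `summarize`, and the `draw` and `guess` callables are hypothetical names, and several details are assumptions (the guesser sees its earlier guesses, matching is case-insensitive, unsolved words count their full guess budget toward the average, and accuracy is also reported as a fraction).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MAX_GUESSES = 20  # guess limit stated in the rules


@dataclass
class WordResult:
    word: str
    guesses: List[str]
    solved: bool


def play_word(
    word: str,
    draw: Callable[[str], object],
    guess: Callable[[object, List[str]], str],
    max_guesses: int = MAX_GUESSES,
) -> WordResult:
    """Run one word: draw it once, then let the guesser try up to max_guesses times."""
    # `draw` stands in for "model emits SVG, SVG is rendered to JPEG" (hypothetical).
    image = draw(word)
    guesses: List[str] = []
    for _ in range(max_guesses):
        # Assumption: the guesser is shown its previous guesses.
        attempt = guess(image, list(guesses))
        guesses.append(attempt)
        # Assumption: case-insensitive exact match counts as correct.
        if attempt.strip().lower() == word.lower():
            return WordResult(word, guesses, True)
    return WordResult(word, guesses, False)


def summarize(results: List[WordResult]) -> Dict[str, float]:
    """Aggregate per-word results into run-level metrics."""
    correct = sum(1 for r in results if r.solved)
    total = len(results)
    return {
        "correct": correct,
        "accuracy": correct / total,
        # Assumption: unsolved words contribute all of their guesses to the average.
        "avg_guesses": sum(len(r.guesses) for r in results) / total,
    }
```

A full run would call `play_word` once for each of the 100 words in the word bank and pass the collected results to `summarize`. Cost and time are left out here because they depend on the model provider.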
Examples
Gemini 3 Flash
Rocket Drawing
Guess Sequence
1. rocket