SketchBench

Visual Reasoning Benchmark

One model is given a hidden word and draws it as an SVG. A separate instance of the same model is then shown a JPEG rendering of that SVG and has to guess the hidden word.
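A single round can be sketched as follows. This is a minimal, hypothetical sketch and not SketchBench's actual harness. `draw`, `render`, and `guess` stand in for the drawer model call, an SVG-to-JPEG renderer, and the guesser model call. Two choices are assumptions: the guesser sees its earlier guesses, and answers are compared case-insensitively. `max_tries` defaults to 20 to match the benchmark's guess limit.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signatures: placeholders for model API calls and an
# SVG-to-JPEG renderer; none of these names come from SketchBench itself.
DrawFn = Callable[[str], str]                 # hidden word -> SVG markup
RenderFn = Callable[[str], bytes]             # SVG markup -> JPEG bytes
GuessFn = Callable[[bytes, List[str]], str]   # image + earlier guesses -> next guess


@dataclass
class RoundResult:
    word: str
    guesses: List[str]
    correct: bool


def play_round(word: str, draw: DrawFn, render: RenderFn, guess: GuessFn,
               max_tries: int = 20) -> RoundResult:
    """The drawer sees the word; a separate guesser instance sees only the image."""
    svg = draw(word)        # drawer instance produces SVG markup
    image = render(svg)     # rasterize to JPEG for the guesser
    guesses: List[str] = []
    for _ in range(max_tries):
        # Assumption: the guesser is shown a copy of its previous guesses.
        g = guess(image, list(guesses))
        guesses.append(g)
        # Assumption: case-insensitive exact match against the hidden word.
        if g.strip().lower() == word.lower():
            return RoundResult(word, guesses, True)
    return RoundResult(word, guesses, False)
```

For example, with stub functions where the guesser says "airplane" and then "rocket", `play_round("rocket", ...)` returns a correct result after two guesses.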

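  • The guesser gets up to 20 tries per drawing.
  • The word bank contains 100 words, and each word is used only once per run.
  • Accuracy is the number of drawings whose hidden word the guesser identified correctly within its tries.
  • We also measure cost, time, and the average number of guesses.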
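The sketch below shows one way to turn per-word outcomes into these metrics. The `WordResult` fields and the `summarize` helper are hypothetical. Averaging guesses only over correctly guessed words, rather than over all 100, is also an assumption, since the benchmark does not say which convention it uses.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class WordResult:
    word: str
    correct: bool      # guessed within the try limit
    num_guesses: int   # guesses used for this word
    seconds: float     # wall-clock time for drawing + guessing
    cost_usd: float    # API spend for drawing + guessing


def summarize(results: List[WordResult]) -> Dict[str, float]:
    """Aggregate per-word outcomes into run-level metrics."""
    if not results:
        raise ValueError("no results to summarize")
    solved = [r for r in results if r.correct]
    return {
        "correct": len(solved),                   # drawings guessed right
        "accuracy": len(solved) / len(results),   # same count as a fraction
        # Assumption: average only over solved words; NaN if none were solved.
        "avg_guesses": mean(r.num_guesses for r in solved) if solved else float("nan"),
        "total_time_s": sum(r.seconds for r in results),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }
```

With a 100-word bank, `correct` and `accuracy * 100` are the same number. Reporting both simply makes runs over smaller word subsets comparable.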

Examples

Gemini 3 Flash

Rocket Drawing
Guess Sequence
1. rocket