Same question, two ways. The bare model answers from memory and sounds just as confident when it's wrong. The grounded one cites the exact record — or hits an honest dry hole.
Grounded through Drillable+ verified source
A regular AI on its ownno source
The grounded small model lands the specific fact where the bare frontier model improvises one. This is a small in-house eval — single domain, n≈6 — not a published benchmark, so the box above is the real test. Try to break it.