One question, two answers. A model answers from memory — just as confident when it's wrong. The grounded one cites the record, or hits a dry hole.
Grounded through Drillable+ verified source
A model on its ownno source
The grounded small model lands the fact the bare frontier model improvises. A small in-house eval — one domain, n≈6, not a published benchmark — so the box above is the real test. Try to break it.