How Well Can GPT-5 Solve the Hallucination Problem?

GPT-5 has been advertised as having significantly fewer hallucinations, so I wanted to put that claim to the test.
When asking for factual information (rather than deliberately tricky, fanfic-style questions), one of the easiest ways to expose hallucinations is to ask the model to continue a quote or identify who said it, because a wrong answer is immediately obvious.
Can GPT-5 Do Better Than GPT-4 and Friends at Continuing Quotes?
I believe both quotes are reasonably well known to anyone who has studied English literature, and a quick Google search finds them easily.
My iPad app is still on GPT-4o, while Safari already uses GPT-5, so I was able to test both alongside the other models.
Participants: GPT-5, GPT-4, Grok-4, Grok-3, Claude Sonnet 4, Gemini 2.5 Flash, DeepSeek
🔹 Test 1: Continue this quote
Continue this quote: I count examinations, even for Oxford and Cambridge, as the enemy of education….
- GPT-5 ✅, GPT-4 ❌
- Grok-4 ✅, Grok-3 ❌
- Claude Sonnet 4: said it didn’t know and asked if it should search ➡️ after searching, it answered correctly ✅
- Gemini ❌, DeepSeek ❌
(Images 1–7)
Answer:
"I count examinations, even for Oxford and Cambridge, as the enemy of education.
Which is not to say that I don’t regard education as the enemy of education, too."
Source: The History Boys by Alan Bennett
📌 Summary: Both new flagships (GPT-5 and Grok-4) passed ✅✅
🔹 Test 2: Finish this quote
Finish this quote: I took a deep breath and listened to the old brag….
By whom?
- GPT-5 ❌, GPT-4 ✅ (Yes, really: GPT-4 answers this correctly, but GPT-5 now gets it wrong, and very confidently so.)
- Grok-4 ✅, Grok-3 ✅
- Claude Sonnet 4 ❌
- Gemini ✅, DeepSeek ✅
(Images 8–14)
Answer:
"I took a deep breath and listened to the old brag of my heart: I am, I am, I am."
Author: Sylvia Plath
Source: The Bell Jar (1963)
🔹 Conclusion
GPT-5 has improved in some areas, surprisingly so. When I tested the first quote (Hector's line from The History Boys) in the past, every single model got it wrong; this time, GPT-5 answered correctly ✅. Still, it confidently missed the Sylvia Plath quote that GPT-4 got right.
Translated by GPT-5 — 16 Aug 2025