GPT-5 Thinking, GPT Deep Research, GPT Agent Mode… Which One Should You Use?

My earlier test found that GPT-5 outperformed Grok 4, and a lot of casual use since then (without keeping screenshots) pointed the same way. That left me wondering:
If I’m dropping Grok 4, what other options should I seriously consider?
If I’m bringing in a new AI assistant, I need to put it through a proper “job interview” before letting it join my workflow.
This test focuses on business research quality — gathering company information, analyzing it, and presenting it in a useful format.
🔹 Test Setup
Task: Business research on a company I provided
Same prompt & custom instructions for all models.
Models tested (each wrote a report):
- Grok 4 (I already had a SuperGrok subscription, so it got a turn)
- Claude Opus 4.1
- Gemini 2.5 Pro
- GPT Agent Mode
- GPT Deep Research
- GPT-5 Thinking
Judges:
Grok 3, Claude Sonnet 4, Gemini 2.5 Flash, GPT-5, DeepSeek
(Why smaller/faster models? Each report ran over 10 pages, and the heavier models would have taken far too long to read and score them.)
The reality of testing…
Some models struggled just to review the files:
- Grok 4 froze during review
- Gemini hit its Pro usage quota before finishing
- Claude Opus burned through its quota quickly, so I had to switch to Sonnet 4
- Grok 3 saw only 2 of the 10 pages I gave it, so I had to retry many times
- Gemini sometimes couldn’t see the file until I restarted the chat
In short: just getting all six compared took hours.
🔹 Results
Rank | Model | Score | Strengths | Best For |
---|---|---|---|---|
🥇 1 | GPT-5 Thinking | 9.14/10 🏆 | Strategic + actionable, concise structure, varied & verifiable sources | Executives needing short but complete strategic reports |
🥈 2 | Gemini 2.5 Pro | 8.88/10 | Deep strategic insight, well-organized, clear competitor comparison | Market analysts, strategy consultants |
🥉 3 | GPT Agent Mode | 8.70/10 | Detailed asset breakdowns, professional structure, systematic competitor analysis | Consultants and operational teams needing deep dives |
– | GPT Deep Research | 8.42/10 | Very detailed, covers history + market, good future trend analysis | Research teams (too long for execs) |
– | Claude Opus 4.1 | 8.10/10 | Good storytelling, balanced structure, boardroom-friendly | Exec presentation prep (less quantitative focus) |
– | Grok 4 | 7.02/10 | Solid basics, numbers and awards included | Quick background checks (light on strategy) |
🔹 Takeaways
- Goodbye, SuperGrok: I won’t be renewing next month.
- If your daily workload is light, GPT-5 Thinking and Gemini 2.5 Pro are strong options even on their free tiers.
- Claude is pricey: even with a subscription, I barely got to use Opus 4.1 before hitting its quota. Its strength is descriptive writing rather than heavy quantitative analysis.
📷 Supporting screenshots are at the end of this post.
First Published (in Thai): 10 Aug 2025
Translated to English by GPT-5

Screenshot: Summary by GPT-5
Screenshot: Research capabilities assessment