By veeranuch — 15 Aug 2025

GPT-5 Thinking, GPT Deep Research, GPT Agent Mode… Which One Should You Use?

My earlier test found that GPT-5 outperformed Grok 4, and a lot of casual use (for which I kept no screenshots) pointed the same way. That got me wondering:
If I’m dropping Grok 4, what other options should I seriously consider?
If I’m bringing in a new AI assistant, I need to put it through a proper “job interview” before letting it join my workflow.

This test focuses on business research quality: gathering company information, analyzing it, and presenting it in a useful format.

🔹 Test Setup

Task: Business research on a company I provided
Same prompt & custom instructions for all models.

Models tested (Reports):

Grok 4 (I was already subscribed to SuperGrok, so it got a turn)
Claude Opus 4.1
Gemini 2.5 Pro
GPT Agent Mode
GPT Deep Research
GPT-5 Thinking

Judges:
Grok 3, Claude Sonnet 4, Gemini 2.5 Flash, GPT-5, DeepSeek
(Why smaller/faster models? Each report ran over 10 pages, and the heavier models would take forever to read and score them.)

The reality of testing…

Some models struggled with even reviewing the files:

Grok 4 froze during review
Gemini hit its Pro quota before finishing
Claude Opus burned through its quota quickly, so I had to switch to Sonnet 4
Grok 3 saw only 2 of the 10 pages I gave it, so I had to retry many times
Gemini sometimes failed to see the file until I restarted the chat

In short: just getting all six reports compared took hours.

🔹 Results

Rank	Model	Score	Strengths	Best For
🥇 1	GPT-5 Thinking	9.14/10 🏆	Strategic + actionable, concise structure, varied & verifiable sources	Executives needing short but complete strategic reports
🥈 2	Gemini 2.5 Pro	8.88/10	Deep strategic insight, well-organized, clear competitor comparison	Market analysts, strategy consultants
🥉 3	GPT Agent Mode	8.70/10	Detailed asset breakdowns, professional structure, systematic competitor analysis	Consultants and operational teams needing deep dives
–	GPT Deep Research	8.42/10	Very detailed, covers history + market, good future trend analysis	Research teams (too long for execs)
–	Claude Opus 4.1	8.10/10	Good storytelling, balanced structure, boardroom-friendly	Exec presentation prep (less quantitative focus)
–	Grok 4	7.02/10	Solid basics, numbers and awards included	Quick background checks (light on strategy)
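
This post does not spell out how the judges' individual scores were combined into one number per model. As a rough sketch, assuming a plain arithmetic mean across judges, the scoring and ranking could look like the Python below. The function names `combine` and `rank` and the per-judge example values are made up for illustration; only the final scores are copied from the table above.

```python
from statistics import mean


def combine(judge_scores):
    """Average one report's scores across judges, rounded to 2 decimals.

    Assumption: a simple unweighted mean. The post does not confirm
    that this is how the published scores were produced.
    """
    return round(mean(judge_scores.values()), 2)


def rank(final_scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(final_scores.items(), key=lambda item: item[1], reverse=True)


# Final scores copied from the results table.
published = {
    "Grok 4": 7.02,
    "Claude Opus 4.1": 8.10,
    "Gemini 2.5 Pro": 8.88,
    "GPT Agent Mode": 8.70,
    "GPT Deep Research": 8.42,
    "GPT-5 Thinking": 9.14,
}

for position, (model, score) in enumerate(rank(published), start=1):
    print(f"{position}. {model}: {score:.2f}/10")

# Hypothetical per-judge scores for a single report, NOT real data;
# they exist only to show how combine() would be called.
example_judge_scores = {"judge_a": 9.0, "judge_b": 8.5, "judge_c": 9.5}
example_average = combine(example_judge_scores)
```

If some judges should count more than others, the mean could be swapped for a weighted average, but that would be a different method from the one assumed here.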

🔹 Takeaways

Goodbye, SuperGrok. I won’t be renewing next month.
If your daily workload is light enough to fit within free-tier limits, GPT-5 Thinking and Gemini 2.5 Pro are both strong free options.
Claude is pricey: even with a subscription, I barely got to use Opus 4.1 before running out of quota. Its strength is descriptive writing, not heavy quantitative analysis.

📷 Supporting screenshots are at the end of this post.

First Published (in Thai): 10 Aug 2025
Translated to English by GPT-5