GPT-5 Thinking, GPT Deep Research, GPT Agent Mode… Which One Should You Use?

My earlier test found that GPT-5 outperformed Grok 4, and a lot of casual use (without screenshots to show for it) pointed the same way. So I started wondering:
If I’m dropping Grok 4, what other options should I seriously consider?
If I’m bringing in a new AI assistant, I need to put it through a proper “job interview” before letting it join my workflow.

This test focuses on business research quality — gathering company information, analyzing it, and presenting it in a useful format.


🔹 Test Setup

Task: Business research on a company I provided
Same prompt & custom instructions for all models.

Models tested (Reports):

  1. Grok 4 (already on SuperGrok, so it got a turn)
  2. Claude Opus 4.1
  3. Gemini 2.5 Pro
  4. GPT Agent Mode
  5. GPT Deep Research
  6. GPT-5 Thinking

Judges:
Grok 3, Claude Sonnet 4, Gemini 2.5 Flash, GPT-5, DeepSeek
(Why smaller/faster models? Because each report was over 10 pages — the heavy models would take forever to read and score.)
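
The post does not say how the five judges' scores were combined into a single number per report. A minimal sketch, assuming a plain average of each judge's 0-10 score: the function name `rank_reports` and the sample scores are hypothetical placeholders for illustration, not data from this test.

```python
from statistics import mean


def rank_reports(scores_by_report):
    """Average each report's judge scores and sort best-first.

    scores_by_report maps a report name to a list of 0-10 scores,
    one per judge. Plain averaging is an assumption; the post does
    not state the actual aggregation method.
    """
    averaged = {
        name: round(mean(scores), 2)
        for name, scores in scores_by_report.items()
    }
    # Highest average first.
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for illustration only; NOT the real judge scores.
example = {
    "Report B": [8.0, 7.5, 8.5, 8.0, 8.0],
    "Report A": [9.0, 8.5, 9.5, 9.0, 9.0],
}

for position, (name, score) in enumerate(rank_reports(example), start=1):
    print(f"{position}. {name}: {score}/10")
```

A plain mean treats every judge equally. If one judge only managed to read part of a report, dropping that score or using a median might be a fairer choice.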


The reality of testing…

Some models struggled even to review the files:

  • Grok 4 froze during review
  • Gemini hit its Pro quota before finishing
  • Claude Opus burned through its quota quickly, so I had to switch to Sonnet 4
  • Grok 3 saw only 2 of the 10 pages I gave it, so I had to retry many times
  • Gemini sometimes failed to see the file until I restarted the chat

In short: just getting all six reports compared took hours.


🔹 Results

| Rank | Model | Score | Strengths | Best For |
|------|-------|-------|-----------|----------|
| 🥇 1 | GPT-5 Thinking | 9.14/10 🏆 | Strategic + actionable, concise structure, varied & verifiable sources | Executives needing short but complete strategic reports |
| 🥈 2 | Gemini 2.5 Pro | 8.88/10 | Deep strategic insight, well-organized, clear competitor comparison | Market analysts, strategy consultants |
| 🥉 3 | GPT Agent Mode | 8.70/10 | Detailed asset breakdowns, professional structure, systematic competitor analysis | Consultants and operational teams needing deep dives |
| 4 | GPT Deep Research | 8.42/10 | Very detailed, covers history + market, good future trend analysis | Research teams (too long for execs) |
| 5 | Claude Opus 4.1 | 8.10/10 | Good storytelling, balanced structure, boardroom-friendly | Exec presentation prep (less quantitative focus) |
| 6 | Grok 4 | 7.02/10 | Solid basics, numbers and awards included | Quick background checks (light on strategy) |

🔹 Takeaways

  1. Goodbye, SuperGrok: I won’t be renewing next month.
  2. If your daily workload isn’t huge, GPT-5 Thinking and Gemini 2.5 Pro are both strong free options.
  3. Claude is pricey — even with a subscription, I barely got to use Opus 4.1. Its strength is in descriptive writing, not heavy quantitative analysis.

📷 Supporting screenshots are at the end of this post.


First Published (in Thai): 10 Aug 2025
Translated to English by GPT-5
