Grok 4 vs GPT-5 — Which One Does Better Research?

Today I had the chance to try GPT-5, so I had to test its capabilities.
Since I mostly use LLMs to search for information, compile data, and summarize reports, this comparison focuses on that use case.

Test Method

Test subjects: Vanilla Grok 4 vs Vanilla GPT-5
Evaluators: GPT-5, Grok 4, Gemini, Claude, DeepSeek
Items evaluated: Research 1 (Grok 4) & Research 2 (GPT-5)

Steps

  1. Give both models the same task, with the same prompt and the same custom instructions
    (Task: Search the internet and summarize the oil market)
  2. Have all 5 LLMs evaluate which result is better
  3. Have GPT-5 summarize the evaluation (full summary shown in the accompanying images)
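
The steps above can be sketched as a small harness. This is only an illustration, assuming each model is wrapped in a plain Python callable that takes a prompt and returns text. `Model`, `run_comparison`, and the stub lambdas are hypothetical names made up for this sketch, not any vendor's API, and the stubbed answers are not real model output.

```python
from collections import Counter
from typing import Callable, Dict

# A "model" is any function that takes a prompt and returns text (hypothetical wrapper).
Model = Callable[[str], str]


def run_comparison(task_prompt: str,
                   subjects: Dict[str, Model],
                   evaluators: Dict[str, Model]) -> Counter:
    """Give every subject the same prompt, then let each evaluator pick a winner."""
    # Step 1: same task, same prompt for every test subject.
    reports = {name: model(task_prompt) for name, model in subjects.items()}

    # Present the reports under neutral labels: "Research 1", "Research 2", ...
    labels = {f"Research {i}": name for i, name in enumerate(reports, start=1)}
    review_prompt = "Which research is better? Answer with its label only.\n\n"
    for label, name in labels.items():
        review_prompt += f"{label}:\n{reports[name]}\n\n"

    # Step 2: every evaluator votes; answers that match no label are ignored.
    votes = Counter()
    for evaluator in evaluators.values():
        answer = evaluator(review_prompt).strip()
        if answer in labels:
            votes[labels[answer]] += 1
    return votes


# Stub models so the sketch runs offline; real calls would replace these lambdas.
subjects = {"Grok 4": lambda p: "short report", "GPT-5": lambda p: "detailed report"}
evaluators = {name: (lambda p: "Research 2")
              for name in ["GPT-5", "Grok 4", "Gemini", "Claude", "DeepSeek"]}

votes = run_comparison("Search the internet and summarize the oil market",
                       subjects, evaluators)
print(votes.most_common())
```

Step 3 (the written summary) is left out, since it is just one more prompt to a single model.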

💡 After several hours of use, I found that both models take longer to think than their predecessors,
but all five evaluators rated GPT-5's research above Grok 4's.

Looks like I won’t be renewing SuperGrok next month.


📊 Peer Review Summary: Grok 4 (Research 1) vs GPT-5 (Research 2)

By GPT-5

✅ Points of agreement among all evaluators

  • Research 2 excelled in clear source citations with credibility tiers.
  • Research 2 contained more detailed and up-to-date information, e.g., 2024 monthly figures and H1-2025 trends.
  • Research 2 used a balanced Singapore/Hong Kong comparison structure that made side-by-side review easier.
  • Research 2 was ready for practical use, including benchmarks, supplier lists, and appendices.
  • Research 1 stood out in bias detection and transparency, with Confidence tiers and Reflective pauses.
  • Research 1 was more concise and sharper in its writing style.

📊 Score-based assessment

  • Research 2: Highest scores in citation quality, detail/freshness of information, comparison structure, and practical readiness.
  • Research 1: High scores in methodological transparency and bias control.

Weighted overall score:

  • Research 2 ≈ 4.65/5
  • Research 1 ≈ 3.50/5

Even with heavier weighting on “Bias Control,” Research 2 still led.
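
For readers who want to see how a weighted overall score like the one above can be computed, here is a minimal sketch. It assumes a plain weighted mean over per-criterion scores on a 1 to 5 scale. The criterion names are loosely taken from the assessment above, but every number and weight below is a placeholder chosen for illustration, not the evaluators' actual scores, so the output will not reproduce 4.65 or 3.50.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-criterion scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


# PLACEHOLDER scores for illustration only (not taken from the real evaluation).
research_1 = {"citations": 3.0, "freshness": 3.0, "structure": 3.0,
              "readiness": 3.0, "transparency": 5.0, "bias_control": 5.0}
research_2 = {"citations": 5.0, "freshness": 5.0, "structure": 5.0,
              "readiness": 5.0, "transparency": 4.0, "bias_control": 4.0}

equal = {c: 1.0 for c in research_1}          # every criterion counts the same
bias_heavy = {**equal, "bias_control": 3.0}   # assumed: triple weight on bias control

for name, weights in [("equal", equal), ("bias-heavy", bias_heavy)]:
    r1 = weighted_score(research_1, weights)
    r2 = weighted_score(research_2, weights)
    print(f"{name}: Research 1 = {r1:.2f}, Research 2 = {r2:.2f}")
```

With these placeholder numbers, raising the weight on bias control narrows the gap but does not flip the ranking, which is the same kind of check as the "heavier weighting" remark above.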


🏆 Conclusion

  • Winner: Research 2 (GPT-5) — more detailed, more up-to-date, easier to verify, and ready for practical application.
  • Research 1 (Grok 4) is suitable when transparency and conciseness are the main priorities, but it falls behind in freshness and the volume of recent figures.

This post was put together quickly because it’s launch day — I wanted to test and capture what “day one” looked like.
There may be limitations in the amount and coverage of data, so please treat this as a historical snapshot for future reference.

First Published (in Thai): 8 Aug 2025
Translated to English by GPT-5