
OpenAI's most powerful reasoning model, replacing o3-pro. It uses extended compute for the hardest scientific and mathematical problems, scores 88.4% on GPQA Diamond, and sets a new state of the art on graduate-level science benchmarks.
Released: August 15, 2025
Parameters: Unknown
Context: 400K
Pricing: Enterprise
| Benchmark | Category | Score | Performance |
|---|---|---|---|
| MMLU | knowledge | 93.1% | 93 |
| HumanEval | coding | 95.2% | 95 |
| MATH | reasoning | 96.8% | 97 |
| Chatbot Arena ELO | overall | 1460 | 94 |
Last updated: March 15, 2026
Benchmark scores may vary based on evaluation methodology and conditions.