Stay up to date with the latest from the world of AI -- 16 articles
Anthropic's latest flagship model Claude Opus 4.6 achieves state-of-the-art performance on SWE-bench, GPQA, and MATH benchmarks, cementing its position as the most capable coding assistant available.
OpenAI's o3 reasoning model scores 96.7% on the ARC-AGI benchmark, a test designed to measure general intelligence capabilities, reigniting debate about the path to artificial general intelligence.
Google's Gemini 2.5 Flash introduces a thinking budget feature that lets developers control the trade-off between response speed and reasoning depth, achieving near-Pro quality at a fraction of the latency.
OpenAI releases GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all featuring a massive 1 million token context window and significant improvements in coding and instruction following.
Meta's Llama 4 family brings mixture-of-experts architecture to open-source LLMs, with Maverick featuring 400B total parameters and Scout offering a 10M token context window.
AI-powered search engine Perplexity AI crosses the 100 million monthly active user mark, signaling a fundamental shift in how users discover and consume information online.
Gemini 2.5 Pro debuts at number one on LMArena with a novel hybrid thinking architecture that allows users to control reasoning depth on a per-query basis.
A bipartisan group of senators introduces legislation requiring AI companies to disclose training data sources, model capabilities assessments, and known limitations before deploying models above a compute threshold.
Google releases Gemma 3 in four sizes (1B to 27B parameters) with 128K context and multimodal support, designed to run efficiently on consumer hardware from smartphones to gaming PCs.
Step-by-step tutorial on building production-grade retrieval-augmented generation systems using Claude 3.5 Sonnet, covering chunking strategies, embedding models, and evaluation frameworks.
Mistral's Codestral surpasses GitHub Copilot in VS Code Marketplace installs for the first time, driven by its strong performance on multi-file editing and its permissive licensing for commercial use.
An analysis of enterprise AI adoption trends showing that open-weight models like Llama, Mistral, and DeepSeek are gaining market share due to data sovereignty concerns and total cost of ownership advantages.
Anthropic publishes a major update to its Constitutional AI training methodology, demonstrating improved safety alignment without sacrificing capability on downstream tasks.
Elon Musk's xAI releases Grok 3, trained on the Memphis supercluster with 200,000 GPUs. The model achieves top rankings on Chatbot Arena and introduces a Deep Think reasoning mode.