AI News

Stay up to date with the latest from the world of AI -- 16 articles

Featured

Claude Opus 4.6 Sets New Benchmark Records Across Coding and Reasoning

Anthropic's latest flagship model Claude Opus 4.6 achieves state-of-the-art performance on SWE-bench, GPQA, and MATH benchmarks, cementing its position as the most capable coding assistant available.

Anthropic Blog, May 15, 2025

Research

OpenAI o3 Achieves 96.7% on ARC-AGI Benchmark

OpenAI's o3 reasoning model scores 96.7% on the ARC-AGI benchmark, a test designed to measure general intelligence capabilities, reigniting debate about the path to artificial general intelligence.

Ars Technica, Apr 20, 2025

Model Release

Gemini 2.5 Flash Redefines the Speed-Intelligence Trade-off

Google's Gemini 2.5 Flash introduces a thinking budget feature that lets developers control the trade-off between response speed and reasoning depth, achieving near-Pro quality at a fraction of the latency.

Google Blog, Apr 17, 2025

Model Release

OpenAI Launches GPT-4.1 Family with 1M Token Context Window

OpenAI releases GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all featuring a 1-million-token context window and significant improvements in coding and instruction following.

OpenAI Blog, Apr 14, 2025

Open Source

Meta Releases Llama 4 Maverick and Scout as Open Source

Meta's Llama 4 family brings a mixture-of-experts architecture to open-source LLMs, with Maverick featuring 400B total parameters and Scout offering a 10M-token context window.

Meta AI Blog, Apr 5, 2025

Industry

Perplexity AI Hits 100M Monthly Active Users as AI Search Goes Mainstream

AI-powered search engine Perplexity AI crosses the 100 million monthly active user mark, signaling a fundamental shift in how users discover and consume information online.

Bloomberg, Apr 1, 2025

Model Release

Google DeepMind Unveils Gemini 2.5 Pro with Hybrid Thinking

Gemini 2.5 Pro debuts at number one on LMArena with a novel hybrid thinking architecture that allows users to control reasoning depth on a per-query basis.

Google Blog, Mar 25, 2025

Policy

US Senate Introduces Bipartisan AI Transparency Act

A bipartisan group of senators introduces legislation requiring AI companies to disclose training data sources, model capability assessments, and known limitations before deploying models above a compute threshold.

Reuters, Mar 20, 2025

Open Source

Google Releases Gemma 3 Open Weights for On-Device AI

Google releases Gemma 3 in four sizes (1B to 27B parameters) with a 128K-token context window and multimodal support, designed to run efficiently on consumer hardware from smartphones to gaming PCs.

Google Blog, Mar 12, 2025

Tutorial

A Practical Guide to Building RAG Pipelines with Claude 3.5 Sonnet

Step-by-step tutorial on building production-grade retrieval-augmented generation systems using Claude 3.5 Sonnet, covering chunking strategies, embedding models, and evaluation frameworks.

Anthropic Docs, Mar 10, 2025

Industry

Codestral Becomes the Most Popular Code Completion Model on VS Code Marketplace

Mistral's Codestral surpasses GitHub Copilot in VS Code Marketplace installs for the first time, driven by its strong performance on multi-file editing and its permissive licensing for commercial use.

TechCrunch, Mar 5, 2025

Opinion

Why Open-Weight Models Are Winning the Enterprise AI Race

An analysis of enterprise AI adoption trends showing that open-weight models like Llama, Mistral, and DeepSeek are gaining market share due to data sovereignty concerns and total cost of ownership advantages.

The Economist, Mar 1, 2025

Research

Anthropic Introduces Constitutional AI 2.0 for Safer Model Training

Anthropic publishes a major update to its Constitutional AI training methodology, demonstrating improved safety alignment without sacrificing capability on downstream tasks.

Anthropic Blog, Feb 28, 2025

Model Release

xAI's Grok 3 Tops Chatbot Arena with Massive Compute Investment

Elon Musk's xAI releases Grok 3, trained on the Memphis supercluster with 200,000 GPUs. The model achieves top rankings on Chatbot Arena and introduces a Deep Think reasoning mode.

Wired, Feb 18, 2025