DeepSeek R1 Demonstrates That Reasoning Can Be Learned Through Pure RL
arXiv · Jan 22, 2025
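DeepSeek's R1 work shows that large-scale reinforcement learning applied directly to a base model can produce emergent reasoning capabilities without prior supervised fine-tuning on chain-of-thought data. The pure-RL result is the R1-Zero variant; the released R1 model adds cold-start data and multi-stage training before RL to address R1-Zero's poor readability and language mixing.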