General-Purpose AI Model Cracks Major Open Problem in Mathematics: A Milestone Moment Has Arrived

A general-purpose AI model solves a major open math problem, marking AI's shift from imitator to knowledge explorer.
OpenAI CEO Sam Altman announced that a general-purpose AI model has solved a major open problem in mathematics. Unlike DeepMind's AlphaProof and other specialized systems, this breakthrough came from a general-purpose model, signaling that the gap between general intelligence and specialized expertise is narrowing. The milestone suggests that AI reasoning capabilities are approaching or surpassing human expert levels on specific problems, with potential cascading effects across physics, computer science, and other fields, though the verifiability, interpretability, and generalization of AI proofs still warrant careful scrutiny.
A Historic Inflection Point
OpenAI CEO Sam Altman recently posted a brief but profoundly significant message on social media: A general-purpose model has solved a major open problem in mathematics.

This isn't a specialized system trained specifically for math competitions, nor a purpose-built tool meticulously fine-tuned for a particular problem—it's a general-purpose artificial intelligence model that achieved a breakthrough on a problem that human mathematicians had long failed to crack. Altman himself acknowledged this as a "pretty big milestone" and predicted that "we will say this sentence many times in the coming years."
From AlphaProof to General-Purpose Models: A Qualitative Leap
The Limitations of Specialized Systems
Looking back at AI's trajectory in mathematics, we've already witnessed impressive achievements. DeepMind's AlphaProof and AlphaGeometry demonstrated powerful capabilities on International Mathematical Olympiad (IMO)-level problems, but these are fundamentally specialized systems, carefully designed and trained to handle specific types of mathematical problems.
AlphaProof combines reinforcement learning with a formal proof system (the Lean proof assistant): natural language math problems are translated into formal language, and Monte Carlo tree search and similar algorithms then explore possible proof paths. AlphaGeometry is specifically designed for Euclidean geometry problems, combining neural networks with a symbolic reasoning engine. The core limitation of such systems lies in their "domain-locked" nature: training data, reward functions, and reasoning architectures are all highly tailored to specific mathematical subfields and do not readily transfer to other types of open problems.
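To make "exploring proof paths" concrete, here is a minimal sketch of search over derivation steps. It is not AlphaProof's algorithm: it uses plain best-first search rather than Monte Carlo tree search, the rule set `RULES` is a hypothetical toy, and the `score` function is a trivial stand-in for a learned value network.

```python
import heapq
from itertools import count

# Hypothetical toy inference rules: (premises, conclusion).
RULES = [
    (frozenset({"A"}), "B"),
    (frozenset({"A", "B"}), "C"),
    (frozenset({"C"}), "D"),
    (frozenset({"B"}), "E"),
]

def score(facts):
    """Stand-in for a learned value function: prefer states with more known facts."""
    return -len(facts)

def search_proof(axioms, goal, rules=RULES, max_expansions=1000):
    """Best-first search for a sequence of derived facts reaching `goal`.

    Returns the list of conclusions in derivation order, or None on failure.
    """
    start = frozenset(axioms)
    tie = count()  # tie-breaker so the heap never has to compare sets
    frontier = [(score(start), next(tie), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None  # search space exhausted without reaching the goal
        _, _, facts, path = heapq.heappop(frontier)
        if goal in facts:
            return path
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                nxt = facts | {conclusion}
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier, (score(nxt), next(tie), nxt, path + [conclusion])
                    )
    return None

print(search_proof({"A"}, "D"))  # prints ['B', 'C', 'D']
```

In a real system each step would be a tactic checked by the proof assistant, and the scoring function would be trained rather than hand-written. That tailoring of rules and rewards to one domain is what the "domain-locked" label refers to.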
This breakthrough is fundamentally different. "General-purpose" means the model wasn't built to solve any specific mathematical problem. It possesses broad language understanding, reasoning, and knowledge integration capabilities, yet in mathematics, a field demanding the utmost logical rigor, it solved an open problem that professional mathematicians had not yet cracked.
Why This Matters
An "open problem" in mathematics is a problem that has been formally posed and widely studied by the mathematical community but has been neither proven nor refuted. These problems exist at different tiers of impact and difficulty. At the top are the seven Millennium Prize Problems established by the Clay Mathematics Institute, each carrying a million-dollar reward, of which six (including the Riemann Hypothesis) remain unsolved. Next come the still-unresolved problems among Hilbert's 23, sub-problems within the Langlands program, and so on. There are also numerous open problems widely studied within specific mathematical branches but with lower public visibility, such as the Hadwiger conjecture in combinatorics and the Collatz conjecture in number theory. These problems often require entirely new approaches, creative constructions, or profound insights. When a general-purpose AI model achieves a breakthrough on such a problem, it implies several deep consequences:
- AI's reasoning capabilities are approaching or even surpassing human expert level, at least on specific problems
- The gap between general intelligence and specialized expertise is narrowing, so a dedicated system for each domain may no longer be necessary
- AI's potential as a scientific research tool is being validated, as it moves from assistive tool toward independent discoverer
Altman's "Complicated Feelings" Deserve Reflection
You might not have noticed, but while expressing excitement, Altman also admitted to having "complicated feelings" about the news. This ambivalence reflects a core tension in AI development:
On one hand, AI is enormously expanding humanity's understanding of the world, which is exhilarating. Mathematics is the foundational language of all sciences—if AI can continuously achieve breakthroughs at the mathematical frontier, the cascading effects will ripple through physics, computer science, engineering, and virtually every other field.
On the other hand, when machines begin demonstrating superhuman capabilities in the highest temple of human intellectual activity—pure mathematical research—it inevitably raises profound questions about human uniqueness, the meaning of mathematical research, and the nature of scientific discovery. Mathematician Timothy Gowers once distinguished between "illuminating proofs" and "verifying proofs"—the former reveals why a conclusion holds, while the latter merely confirms that it does. Historically, the computer-assisted proof of the Four Color Theorem (1976) sparked similar controversy: does a proof requiring a computer to exhaustively check 1,936 cases constitute genuine mathematical understanding? Mathematical proof has long been regarded as the pinnacle of human creativity and logical thinking. If AI can independently complete this process, we need to reexamine our definitions of "understanding" and "discovery."
Trend Predictions for the Coming Years
The Acceleration Loop Has Already Begun
Altman's statement that "we will say this sentence many times" is not an empty prediction. From a technological development perspective, several factors are forming a positive feedback loop:
- Continuous improvement in model capabilities: Each generation of large language models shows significant advances in reasoning ability. OpenAI's o1 series introduced a reinforcement learning training paradigm built on chain-of-thought reasoning, allowing models to perform internal "draft reasoning" before providing answers; the subsequent o3 model further scaled test-time compute, allowing models to invest more computational resources in deep exploration of difficult problems. This "slow thinking" mechanism loosely resembles how human mathematicians tackle hard problems through repeated attempts, backtracking, and verification, and is plausibly a key technical foundation enabling general-purpose models to reach the mathematical frontier.
- Data flywheel effect: Every AI success in mathematics generates new training data and methodologies, further enhancing subsequent models' capabilities
- Deepening human-AI collaboration: Mathematicians will increasingly collaborate with AI, and this collaboration itself will catalyze new research paradigms
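The intuition behind test-time compute scaling in the list above can be illustrated with a small sketch. It assumes independent attempts and a perfect, cheap verifier, and it says nothing about how o1 or o3 are actually implemented. The task (guessing a factor of an integer) and all function names are hypothetical stand-ins chosen because the check is exact.

```python
import random

def success_probability(p_single, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts

def propose_factor(n, rng):
    """Stand-in for one sampled reasoning attempt: guess a candidate factor."""
    return rng.randint(2, n - 1)

def verify(n, candidate):
    """Cheap exact check, playing the role of a proof checker."""
    return n % candidate == 0

def solve_with_budget(n, attempts, seed=0):
    """Sample up to `attempts` candidates; return the first verified one, else None."""
    rng = random.Random(seed)
    for _ in range(attempts):
        candidate = propose_factor(n, rng)
        if verify(n, candidate):
            return candidate
    return None

# A 10% per-attempt success rate exceeds 99% after 50 verified attempts.
for budget in (1, 10, 50):
    print(budget, round(success_probability(0.10, budget), 4))
```

The same logic explains why verifiable domains such as mathematics benefit so much from extra inference compute: when a wrong answer can be rejected reliably, additional attempts only ever help.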
Impact on Mathematics and Science
In the short term, we may see the following changes:
- Mathematical journals and academic conferences will need to establish peer review standards for AI-assisted proofs
- More mathematicians will begin using AI as an everyday research tool
- Interdisciplinary research will accelerate, as AI can establish connections between different mathematical branches that humans struggle to perceive
- The focus of mathematics education may shift from computational techniques to problem formulation and intuition development
A Sober Perspective: Issues That Still Require Attention
While celebrating this milestone, we must also maintain prudence:
Verifiability: Mathematics has the advantage that its results can be rigorously verified. Notably, proof assistants such as Lean, Coq, and Isabelle check proofs written in a formal, machine-checkable language, which largely removes the risk of oversights in manual review, provided the formal statement faithfully captures the original problem. The ongoing "formalization of mathematics" movement, exemplified by the Lean community's Mathlib project, which has formalized well over a hundred thousand mathematical theorems, provides reliable verification infrastructure for AI proofs. When AI-generated proofs can be automatically converted to Lean code and pass type-checker verification, their correctness will achieve a higher level of certainty than human peer review alone. Whether AI-produced proofs can withstand such rigorous examination is a question the academic community must address first.
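As a small illustration of what "passing type-checker verification" means, the following Lean 4 sketch states and proves two elementary facts using only the core library (no Mathlib import is assumed). The theorem name `add_comm_example` is made up for this example.

```lean
-- The statement is a type; the proof term must have exactly that type.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete computation checked by definitional unfolding.
example : 2 + 2 = 4 := rfl

-- A false statement cannot be proven this way: uncommenting the next
-- line makes the file fail to type-check.
-- example (a : Nat) : a + 1 = a := rfl
```

The checker accepts a proof only if every step is justified, but it cannot tell whether the formal statement matches the problem mathematicians actually care about. That translation step still needs human review.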
Interpretability: Can AI's proof process provide humans with genuine "understanding"? If AI "discovers" a proof through high-dimensional vector operations that humans cannot follow, even if the result is entirely correct, does it advance human understanding of mathematical structures? Is a correct but incomprehensible proof scientifically equivalent to an elegant human proof? This question is forcing the mathematical and philosophical communities to redefine the nature of "mathematical knowledge."
Generalization ability: Does solving one open problem mean AI possesses systematic mathematical research capability, or is it to some extent a "lucky strike"?
The answers to these questions will gradually become clear over the coming years. But regardless, today marks a turning point—AI is no longer merely an imitator of human intelligence; it is becoming an explorer at the frontiers of knowledge.
Key Takeaways
- OpenAI CEO Sam Altman announced that a general-purpose AI model has solved a major open problem in mathematics, marking a significant milestone in AI capabilities
- Unlike previous specialized mathematical AI systems, this breakthrough came from a general-purpose model, indicating that the gap between general intelligence and specialized expertise is narrowing
- Altman expressed complicated feelings, reflecting the deep questions raised when AI achieves breakthroughs in the highest temple of human intellect
- AI breakthroughs in mathematics could produce cascading effects across physics, computer science, and virtually all scientific fields
- Key issues around verifiability, interpretability, and generalization ability of AI proofs still require careful attention