Gate News, April 24: DeepSeek has published formal mathematical reasoning evaluation results for V4, which scored a perfect 120/120 on Putnam-2025 and tied Axiom for first place.
In the practical regime, which uses LeanExplore and constrained sampling, V4-Flash-Max scored 81.00 on Putnam-200 at Pass@8, well ahead of Seed-2.0-Prover (35.50), Gemini 3 Pro (26.50), and Seed-1.5-Prover (26.50). In the frontier regime, V4 finished ahead of Seed-1.5-Prover (110/120) and Aristotle (100/120).
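For readers unfamiliar with the metric, Pass@8 is commonly read as follows: a problem counts as solved if at least one of 8 sampled attempts yields a verified proof. The Python sketch below illustrates that reading. The function name `pass_at_k`, the input layout, and the percentage scale are assumptions for illustration, not details taken from the published evaluation.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int = 8) -> float:
    """Return the percentage of problems solved within the first k attempts.

    attempts[i][j] is True when attempt j on problem i produced a proof
    that the checker accepted (hypothetical layout, assumed here).
    """
    if not attempts:
        raise ValueError("at least one problem is required")
    # A problem is solved if any of its first k attempts succeeded.
    solved = sum(1 for per_problem in attempts if any(per_problem[:k]))
    return 100.0 * solved / len(attempts)


# Toy data: 4 problems, 8 attempts each; problems 0 and 2 are solved.
toy = [
    [False] * 7 + [True],
    [False] * 8,
    [True] + [False] * 7,
    [False] * 8,
]
toy_score = pass_at_k(toy, k=8)
```

Under this reading, and assuming Putnam-200 contains 200 problems, a score of 81.00 would correspond to 162 problems solved.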
V4 employs a hybrid formal-informal reasoning approach: informal reasoning generates candidate solutions in natural language, self-verification filters those candidates, and a formal agent completes rigorous proofs in Lean. The frontier results relied on large-scale compute scaling, while the practical-regime scores better reflect standard deployment capabilities.
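The three stages described above can be sketched as a simple pipeline. This is a minimal illustration, not DeepSeek's implementation: `hybrid_prove`, the three callables, the score threshold, and the best-first ordering are all hypothetical stand-ins for the model's informal generator, its self-verifier, and the Lean-based formal agent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    text: str     # informal natural-language solution
    score: float  # self-verification confidence, assumed to lie in [0, 1]


def hybrid_prove(
    problem: str,
    generate: Callable[[str], Iterable[str]],        # informal reasoning (assumed)
    self_verify: Callable[[str, str], float],        # self-verification (assumed)
    formalize: Callable[[str, str], Optional[str]],  # formal agent (assumed)
    threshold: float = 0.5,                          # hypothetical cutoff
) -> Optional[str]:
    """Return a formal proof string, or None if no candidate can be formalized."""
    # Stage 1: generate informal candidates and score each one.
    candidates = [Candidate(t, self_verify(problem, t)) for t in generate(problem)]
    # Stage 2: drop low-confidence candidates and try the best ones first.
    kept = sorted(
        (c for c in candidates if c.score >= threshold),
        key=lambda c: c.score,
        reverse=True,
    )
    # Stage 3: hand each surviving candidate to the formal agent, which
    # returns a checked Lean proof or None on failure.
    for candidate in kept:
        proof = formalize(problem, candidate.text)
        if proof is not None:
            return proof
    return None
```

In this sketch the formal step is the only one whose output is trusted: the informal stages only decide which candidates are worth the cost of a formalization attempt.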