Over the past few days, rumors had been circulating about an imminent release of DeepSeek-R2. DeepSeek did have something new in store, but instead of R2, what everyone got was DeepSeek-Prover-V2, which is, of course, open source.
Prover-V2 achieves the best reported performance in formal theorem proving, reaching an 88.9% pass rate on the MiniF2F test, and it also scores well on AIME 24 and 25 problems. On the evening of April 30, technical details of DeepSeek-Prover-V2 were posted on the machine learning platform Hugging Face.
This time, the DeepSeek team released two versions of DeepSeek-Prover-V2, with 7B and 671B parameters. DeepSeek-Prover-V2-671B is trained on top of DeepSeek-V3-Base, while DeepSeek-Prover-V2-7B is built on DeepSeek-Prover-V1.5-Base and supports an extended context length of up to 32K tokens.
- DeepSeek-Prover-V2-7B link: DeepSeek-Prover-V2-7B
- DeepSeek-Prover-V2-671B link: DeepSeek-Prover-V2-671B
To summarize DeepSeek-Prover-V2 in one sentence: it is an open-source large language model specifically designed for formal theorem proving in Lean 4, a programming language and proof assistant for formalizing mathematics.
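For readers unfamiliar with Lean 4, here is a minimal illustrative example (my own toy case, not taken from DeepSeek's materials) of what a machine-checkable statement and proof look like; `sq_nonneg` is a standard Mathlib lemma:

```lean
import Mathlib

-- A toy formal theorem: the square of any real number is nonnegative.
-- The proof cites the Mathlib lemma `sq_nonneg`, and Lean checks it mechanically.
theorem square_nonneg_example (x : ℝ) : 0 ≤ x ^ 2 := by
  exact sq_nonneg x
```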
Its initialization data is collected through a recursive theorem-proving pipeline driven by DeepSeek-V3. For the cold start training phase, DeepSeek-V3 is first prompted to decompose complex problems into a series of solvable sub-goals. The proofs of the solved sub-goals are then synthesized into a chain of thought and, combined with DeepSeek-V3's step-by-step reasoning trajectory, form the initial training data for reinforcement learning.
The brilliance of this strategy is that it integrates informal and formal mathematical reasoning into a unified model, allowing it to think flexibly like a human while reasoning rigorously like a machine, fusing the two modes of mathematical reasoning.
## Technical Overview
### Generating Cold Start Reasoning Data through Recursive Proof Search
To build the cold start dataset, the DeepSeek team designed a simple and efficient recursive theorem-proving pipeline that uses DeepSeek-V3 as a unified tool for both sub-goal decomposition and the formalization of reasoning steps. Concretely, DeepSeek-V3 is prompted to decompose a theorem into a high-level proof sketch while simultaneously formalizing the reasoning steps in Lean 4, producing a series of clearly structured, logically rigorous sub-goals.
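As a rough illustration (again a toy example of my own, not one from the paper), such a decomposed proof sketch in Lean 4 might state each sub-goal as a `have` whose proof is left as `sorry`, to be filled in later by the smaller prover:

```lean
import Mathlib

-- Hypothetical proof sketch: the original goal is split into two named
-- sub-goals (h1, h2), each left as `sorry`; the final `linarith` combines them.
theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ (a - b) ^ 2 := sorry                          -- sub-goal 1
  have h2 : (a - b) ^ 2 = a ^ 2 - 2 * a * b + b ^ 2 := sorry  -- sub-goal 2
  linarith
```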
Figure: Overview of the cold start data collection process used by DeepSeek-Prover-V2.

Reducing computational overhead has always been a strength of the DeepSeek team, and this time is no exception: a smaller 7B model performs the proof search for each sub-goal, keeping the computational burden low. Once every step of the decomposed problem has been solved, the complete step-by-step formal proof is paired with the chain of thought generated by DeepSeek-V3 to form the cold start reasoning data.
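The splicing step can be pictured with a small, self-contained Python sketch (the function `assemble_proof` and the snippet as a whole are illustrative assumptions, not DeepSeek's released code): each `sorry` placeholder in the sketch is replaced, in order, by the proof the 7B model found for that sub-goal.

```python
def assemble_proof(sketch: str, subgoal_proofs: list[str]) -> str:
    """Replace each `sorry` placeholder in a Lean 4 proof sketch, in order,
    with the corresponding sub-goal proof found by the smaller prover."""
    parts = sketch.split("sorry")
    assert len(parts) == len(subgoal_proofs) + 1, "need exactly one proof per placeholder"
    assembled = parts[0]
    for proof, rest in zip(subgoal_proofs, parts[1:]):
        assembled += proof + rest
    return assembled


# Toy usage, reusing the sketch from the Lean example above.
sketch = """theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ (a - b) ^ 2 := sorry
  have h2 : (a - b) ^ 2 = a ^ 2 - 2 * a * b + b ^ 2 := sorry
  linarith"""

print(assemble_proof(sketch, ["sq_nonneg (a - b)", "by ring"]))
```

The assembled formal proof, paired with DeepSeek-V3's chain of thought for the same problem, then constitutes one cold start training example.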
### Reinforcement Learning Based on Synthetic Cold Start Data
The DeepSeek team selected a subset of challenging theorems that the 7B proof model could not solve end-to-end but whose decomposed sub-goals it could all handle; composing the proofs of the sub-goals yields a complete formal proof of the original problem. This formal proof is then attached to the chain of thought generated by DeepSeek-V3, which records the corresponding lemma decomposition, producing training data that tightly couples informal reasoning with the subsequent formal steps.
After fine-tuning the proof model on this synthetic cold start data, the team added a reinforcement learning phase to further strengthen its ability to turn informal reasoning into formal proofs. Following common practice for reasoning models, training uses binary correct/incorrect feedback as the primary reward signal.
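A minimal sketch of such a binary reward is shown below, assuming the `lean` executable is on the PATH and the proof's imports are available in the working environment; how DeepSeek actually verifies proofs at scale is not described here, so treat this only as an illustration of the reward shape.

```python
import subprocess
import tempfile


def binary_reward(lean_source: str) -> float:
    """Return 1.0 if the candidate proof compiles cleanly under Lean 4, else 0.0."""
    # Proofs containing `sorry` elaborate with only a warning, so reject them explicitly.
    if "sorry" in lean_source:
        return 0.0
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False, encoding="utf-8") as f:
        f.write(lean_source)
        path = f.name
    # Exit code 0 means Lean accepted the file; any elaboration error yields a nonzero code.
    result = subprocess.run(["lean", path], capture_output=True, text=True)
    return 1.0 if result.returncode == 0 else 0.0
```

In an RL loop, this reward would be computed for each sampled proof attempt and fed back to the policy update, so the model is rewarded only for proofs the verifier accepts.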
The resulting model, DeepSeek-Prover-V2-671B, achieved state-of-the-art performance in neural theorem proving tasks, reaching a pass rate of 88.9% on the MiniF2F test and successfully solving 49 out of 658 problems in the PutnamBench dataset. All proofs generated by DeepSeek-Prover-V2 on the miniF2F dataset have been compiled into a ZIP file and are available for download.
- Download link: miniF2F solutions