Tonight, the AI community was shaken: Google launched its "strongest reasoning model," Gemini 2.5 Pro, late at night! That's right, this is the Google reasoning model I mentioned in yesterday's article, and it has defeated Claude-3.7-Thinking. The leaked model codenamed "Nebula" had previously been reported to perform exceptionally well, surpassing the likes of o1, o3-mini, and Claude 3.7 Thinking. I didn't expect the new model to arrive so quickly: it leaked on the 24th, and Google officially announced it on the 25th!
Gemini 2.5 Pro ranks first on the LMSYS Arena leaderboard, and by a clear margin: its score is a full 40 points higher than Grok-3 and GPT-4.5. That matters because, until now, the top models on LMSYS were separated by just a few points. Grok had only just announced breaking the 1400 barrier, and Gemini 2.5 Pro has jumped straight to 1443, the largest single leap the leaderboard has seen.
First of all, Gemini 2.5 Pro (model version gemini-2.5-pro-exp-03-25) is a reasoning model that Google claims is its most powerful to date. It not only leads across the board but also shows no weak spots: it ranks first in every evaluation category (overall ability, coding, mathematics, creative writing, etc.), excelling especially in complex prompts with style control (Hard Prompts w/ Style Control) and multi-turn dialogue (Multi-Turn).
Gemini 2.5 Pro is not only Google's strongest reasoning model but also natively multimodal, ranking first on the Vision Arena visual leaderboard. On the WebDev Arena web-development leaderboard it ranks second, just behind Claude-3.7, whose lead in programming remains hard to shake.
Now let's look at the scores on specific benchmarks: Gemini 2.5 Pro achieved the best overall performance, leading especially in science, code generation, visual reasoning (MMMU), and long-context understanding (MRCR). On what is billed as the hardest AI test, "Humanity's Last Exam," Gemini 2.5 Pro is far ahead of OpenAI's o3-mini and the other models.
SWE-bench measures coding ability, while Aider Polyglot measures code-editing proficiency. After going through all the leaderboards, I can only say: terrifying! Gemini 2.5 Pro is already available in Google AI Studio and the Gemini app. Portal: Google AI Studio
Next, let's look at the effects—
First: Mandelbrot Set Demonstration#
The Mandelbrot set is the set of complex numbers c for which repeatedly iterating z → z² + c from zero stays bounded; its boundary forms a fractal in the complex plane that some call the most bizarre and magnificent geometric figure humanity has ever produced, once dubbed "God's fingerprint." Let's take a look at what Gemini 2.5 Pro generates.
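For readers curious what a prompt like this actually asks the model to produce, here is a minimal sketch of the standard escape-time algorithm behind Mandelbrot renderings. This is my own illustration, not Gemini's output; the resolution, iteration cap, and ASCII shading characters are arbitrary choices.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 50) -> int:
    """Count iterations of z -> z*z + c before |z| exceeds 2.
    Returns max_iter if the point appears bounded (i.e., in the set)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter

def render_ascii(width: int = 60, height: int = 24, max_iter: int = 50) -> str:
    """Render the region [-2, 1] x [-1.2i, 1.2i] as ASCII art:
    '#' marks points in the set; lighter characters escape sooner."""
    shades = " .:-=+*"
    rows = []
    for row in range(height):
        im = 1.2 - 2.4 * row / (height - 1)
        line = []
        for col in range(width):
            re = -2.0 + 3.0 * col / (width - 1)
            n = mandelbrot_iterations(complex(re, im), max_iter)
            line.append("#" if n == max_iter else shades[min(n // 8, 6)])
        rows.append("".join(line))
    return "\n".join(rows)

if __name__ == "__main__":
    print(render_ascii())
```

A web version of this demo typically does the same computation per pixel on an HTML canvas and maps the iteration count to a color gradient instead of ASCII characters.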
Second: Web Mini Game#
Remember this familiar dinosaur runner game? The black-and-white version from memory has become a color version, and the generated result is quite impressive.
Gemini 2.5 Pro's biggest advantage is that it retains native multimodal capabilities along with an ultra-long context: it currently supports a 1M-token window, with 2M on the way. API pricing, however, has not yet been announced. Meanwhile, DeepSeek V3-0324 has just been released under the highly permissive MIT license. Will the closed-source giants consolidate their stronghold, or will the open-source camp push for technological equality?