Alibaba’s Qwen3 Outperforms OpenAI’s o1 and o3-mini, on Par With Gemini 2.5 Pro

Chinese giant Alibaba has released the Qwen3 family of open-weight AI models. Apart from its flagship 235-billion-parameter (235B) model, Qwen3 is available in a range of sizes.

It includes dense variants with 0.6B, 1.7B, 4B, 8B, 14B, and 32B parameters, alongside two mixture-of-experts models: a 235B model with 22B activated parameters and a 30B model with 3B activated parameters.

The models can be deployed locally using tools such as Ollama and LM Studio, and can also be accessed through a web browser using Qwen Chat.
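For readers who want to try a small variant locally, a minimal sketch using the ollama Python client could look like the following; the exact model tag (here qwen3:0.6b) is an assumption and should be checked against the tags listed in the Ollama library.

```python
# Minimal sketch: querying a locally pulled Qwen3 model through the
# ollama Python client (pip install ollama). The "qwen3:0.6b" tag is an
# assumption; substitute whichever Qwen3 size you have pulled.
import ollama

response = ollama.chat(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": "Summarise the Qwen3 release in one sentence."}],
)
print(response["message"]["content"])
```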

Users can switch between a “thinking” mode for tasks that require reasoning and a “non-thinking” mode for tasks that demand quick responses. 
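As a rough illustration of how that switch is exposed to developers, the sketch below assumes the Hugging Face transformers library and the enable_thinking flag in Qwen3’s chat template; the 0.6B checkpoint is used only because it is small enough to run on modest hardware.

```python
# Sketch of toggling Qwen3's "thinking" mode via the chat template,
# assuming the Hugging Face transformers library and a Qwen3 checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]

# enable_thinking=True lets the model reason step by step before answering;
# set it to False for the quicker, non-thinking responses mentioned above.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```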

Qwen3’s 235B model outperforms OpenAI’s o1 and o3-mini (medium) reasoning models on benchmarks that evaluate mathematical and programming ability. It is also roughly on par with Google’s Gemini 2.5 Pro on several benchmarks.

Having said that, the model lags behind OpenAI’s newly released o4-mini (high) model. On the LiveCodeBench coding benchmark, Qwen3’s 235B model scored 70.7%, whereas o4-mini (high) scored 80%.

On the AIME 2024 math benchmark, OpenAI’s o4-mini (high) scored 94%, ahead of Qwen3’s 235B model at 85.7%. Benchmark scores of other newly released models can be found on Artificial Analysis.

Moreover, the other Qwen3 variants outperformed their predecessors, and the 30B variant outperformed DeepSeek-V3 and OpenAI’s GPT-4o on benchmarks.

Simon Willison, co-creator of the Django Web Framework, said in a blog post, “The thing that stands out most to me about the Qwen3 release is how well coordinated it was across the LLM ecosystem.” 

While using the models, Willison noted that they “worked directly” with all popular LLM serving frameworks from the day of their release.

“This is an extraordinary level of coordination for a model release! I haven’t seen any other model providers make this level of effort—the usual pattern is to dump a bunch of models on Hugging Face for a single architecture (usually NVIDIA) and then wait for the community to catch up with quantisations and conversions for everything else,” he added. 

Besides, given the spectrum of sizes in which the models have been released, Willison said, “0.6B and 1.7B should run fine on an iPhone, and 32B will fit on my 64GB Mac with room to spare for other applications.”

The Qwen3 family of models is a successor to the Qwen2.5 models. Last month, the company announced QwQ, a 32-billion-parameter model that was said to achieve performance comparable to DeepSeek-R1 despite being much smaller.

The company also launched the QwQ-Max-Preview model last month, built on Qwen2.5-Max. The model was said to specialise in mathematics and coding-based tasks.
