JPMorgan Scientist Unveils Phixtral, Mixture of Mistral with Phi-2
Mistral AI has just released the paper for its Mixtral of Experts model, and new models built on the idea are already appearing. Maxime Labonne, Senior Machine Learning Scientist at JPMorgan, has introduced Phixtral, a Mixture of Experts (MoE) model built from Microsoft Phi-2 models.
Drawing inspiration from Mistral AI’s Mixtral architecture, Labonne’s creation combines two to four fine-tuned models of 2.8 billion parameters each, and the merged model surpasses the performance of its individual experts.
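To make the Mixtral-style design concrete, here is a minimal, hypothetical sketch of sparse top-k expert routing in PyTorch. It is not Labonne’s actual implementation; the layer sizes, gating scheme and class name are illustrative assumptions only.

```python
# Hypothetical sketch of Mixtral-style sparse routing: a gating network scores
# the experts for each token and only the top-k experts' outputs are combined.
# In Phixtral the expert weights reportedly come from separately fine-tuned
# Phi-2 models rather than being trained jointly; this toy layer just shows
# the routing idea.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEBlock(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Each "expert" here is a plain MLP, stand-in for a Phi-2 feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        scores = self.gate(x)                                   # (B, S, num_experts)
        weights, indices = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # renormalise over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = indices[..., slot]                            # expert chosen for this slot
            w = weights[..., slot].unsqueeze(-1)                # its routing weight
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1).to(x.dtype)     # tokens routed to expert e
                out = out + mask * w * expert(x)
        return out


block = SparseMoEBlock(hidden_size=2560)   # 2560 is Phi-2's hidden size
tokens = torch.randn(1, 8, 2560)
print(block(tokens).shape)                 # torch.Size([1, 8, 2560])
```

The loop over experts recomputes every expert on every token for clarity; a production MoE layer would dispatch only the routed tokens to each expert.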
Phixtral can run with 4-bit precision on a free T4 GPU.
Phixtral is presented in two variations: phixtral-2x2_8 and phixtral-4x2_8. The former represents the first MoE made with two microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture, and outperforms individual experts.
The latter, phixtral-4x2_8, is the first MoE built from four microsoft/phi-2 models, and it too outperforms its individual experts.
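As a reference for the 4-bit setup mentioned above, the snippet below is a minimal sketch of how such a model could be loaded on a T4-class GPU with the Hugging Face transformers and bitsandbytes libraries; the repository ID, prompt and generation settings are assumptions for illustration, not the author’s documented workflow.

```python
# Minimal sketch: load the smaller Phixtral variant in 4-bit precision.
# Assumes transformers, accelerate and bitsandbytes are installed; the repo ID
# below is assumed to be the author's published checkpoint, and
# trust_remote_code=True is needed if the model ships custom MoE code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mlabonne/phixtral-2x2_8"  # assumed Hugging Face repo ID

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```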
That a merged model can outperform each of its individual experts marks a notable advancement in MoE design.
On the ‘Yet Another LLM Leaderboard’ (YALL), the model scored above the base phi-2 and just below Zephyr2-7B.
Phixtral’s experts are drawn from fine-tuned Phi-2 models such as dolphin-2_6-phi-2, phi-2-dpo, phi-2-sft-dpo-gpt4_en-ep1, and phi-2-coder, reflecting the collaborative effort of various model authors. Labonne emphasises the significance of these models in the creation of Phixtral, highlighting their capabilities.