Google Unveils RecurrentGemma, Moves Away From Transformer-Based Models
At Google Cloud Next ’24, Google unveiled RecurrentGemma, a family of open-weights language models from Google DeepMind built on the novel Griffin architecture, starting with a 2B-parameter model, RecurrentGemma-2B.
This architecture achieves fast inference when generating long sequences by replacing global attention with a mixture of local attention and linear recurrences.
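To give a sense of what a linear recurrence layer does, here is a minimal JAX sketch of a gated recurrence of the form h_t = a_t * h_{t-1} + b_t * x_t. This is not RecurrentGemma’s actual recurrent block; the gate names, shapes, and initialisation below are illustrative assumptions only.

```python
import jax
import jax.numpy as jnp

def linear_recurrence(x, a, b):
    """Sequentially compute h_t = a_t * h_{t-1} + b_t * x_t over the time axis.

    x, a, b: arrays of shape (seq_len, hidden_dim). The recurrent state h has a
    fixed size (hidden_dim) regardless of sequence length, unlike a transformer's
    KV cache, which grows with every generated token.
    """
    def step(h, inputs):
        a_t, bx_t = inputs
        h = a_t * h + bx_t
        return h, h  # carry the state forward and also emit it as the output

    h0 = jnp.zeros(x.shape[-1])
    _, outputs = jax.lax.scan(step, h0, (a, b * x))
    return outputs

# Toy usage: 8 time steps, hidden size 4, with per-step decay gates in (0, 1).
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 4))
a = jax.nn.sigmoid(jax.random.normal(key, (8, 4)))  # decay gates
b = jnp.ones((8, 4))
print(linear_recurrence(x, a, b).shape)  # (8, 4)
```

Because the state stays a fixed size, the cost per generated token does not grow with the length of the context, which is what makes long-sequence generation fast.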
Google released a pre-trained model with 2B non-embedding parameters as well as an instruction-tuned variant. Both achieve performance comparable to Gemma-2B despite being trained on fewer tokens: RecurrentGemma-2B was pre-trained on 2T tokens, whereas Gemma-2B was pre-trained on 3T tokens.
One of RecurrentGemma’s key strengths is its reduced memory footprint: it maintains a fixed-size state rather than a cache that grows with sequence length. This makes it particularly valuable for generating long samples on devices with constrained memory, such as single GPUs and CPUs.
By reducing memory usage, RecurrentGemma lets users handle longer sequences without hitting memory bottlenecks. The efficiency gains extend to throughput: thanks to its lower memory demands, the model can run inference with larger batch sizes. This translates into significantly more tokens generated per second, especially on lengthy sequences, a boon for tasks requiring rapid and continuous data processing.
Google also released JAX code to evaluate and fine-tune RecurrentGemma, including a specialized Pallas kernel to perform linear recurrence on TPUs. Additionally, the company provided a reference PyTorch implementation.
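The Pallas kernel itself is TPU-specific and not reproduced here, but the property it exploits can be illustrated in plain JAX: a recurrence of the form h_t = a_t * h_{t-1} + b_t composes associatively, so it can be evaluated with a parallel scan instead of a step-by-step loop. The sketch below is an illustration of that idea under assumed shapes, not the released kernel.

```python
import jax
import jax.numpy as jnp

def parallel_linear_recurrence(a, b):
    """Compute h_t = a_t * h_{t-1} + b_t (with h_0 = 0) for all t via a parallel scan.

    The pairs (a, b) compose associatively:
        applying (a1, b1) then (a2, b2)  ->  (a1 * a2, a2 * b1 + b2)
    so the whole sequence reduces in O(log T) parallel steps, the kind of
    structure a specialised accelerator kernel can exploit.
    """
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        return a_l * a_r, a_r * b_l + b_r

    _, h = jax.lax.associative_scan(combine, (a, b), axis=0)
    return h  # h[t] equals the sequential recurrence's state at step t

# Toy usage on the same shapes as before: 8 time steps, hidden size 4.
key = jax.random.PRNGKey(1)
a = jax.nn.sigmoid(jax.random.normal(key, (8, 4)))
b = jax.random.normal(key, (8, 4))
print(parallel_linear_recurrence(a, b).shape)  # (8, 4)
```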