Meta’s LLaMA has been the star of the open-source LLM community since its launch, and it just got a much-needed upgrade. llama.cpp, a port of LLaMA inference to C and C++, has recently added support for CUDA acceleration on NVIDIA GPUs. This speeds up the model considerably and lets it take advantage of GPUs in addition to the CPUs it already ran on.

By adding support for CUDA, llama.cpp can handle longer text generations more easily. Processing has also been sped up significantly, netting up to a 2.19x improvement over running on a CPU alone. CUDA support additionally allows larger batch sizes, which keep the GPU busy and improve the overall throughput of the LLM.
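For readers who want to try it, enabling GPU acceleration in llama.cpp looked roughly like the following at the time of writing; the model path is a placeholder, and the exact build flags and options may differ between versions, so the project’s README is the authoritative reference.

```shell
# Sketch only: flags and paths may vary by llama.cpp version.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build with CUDA (cuBLAS) support instead of the CPU-only default.
make LLAMA_CUBLAS=1

# Run inference, offloading some transformer layers to the GPU.
# models/7B/ggml-model-q4_0.bin is a placeholder for your quantised model;
# raise or lower --n-gpu-layers to fit your card's VRAM.
./main -m models/7B/ggml-model-q4_0.bin -p "Hello" --n-gpu-layers 32
```

The `--n-gpu-layers` knob is what makes the speedup tunable: more layers on the GPU means faster generation, at the cost of more VRAM.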

Developer Johannes Gaessler is the primary contributor behind bringing CUDA support to llama.cpp. A list of upcoming improvements to the implementation is also in the works, including fixing VRAM leaks, benchmarking performance on Windows, and testing performance on lower-end GPUs.

With the addition of CUDA support to llama.cpp, it is clear that developers are nowhere near done with this project. CUDA support not only makes the model faster and easier to run, but also opens the door to scaling up inference for more demanding applications.

However, many open-source enthusiasts rued the fact that llama.cpp added support for CUDA, which is proprietary to NVIDIA, rather than OpenCL. OpenCL is an open standard for programming GPUs directly, though its ecosystem often pales in comparison to the feature set offered by CUDA. Keeping in theme with the open-source nature of LLaMA, some community members wanted to keep the stack fully open by supporting OpenCL instead.

This latest improvement shows how LLaMA has become a rallying point for the open-source LLM community. Ever since its tumultuous launch, when its weights were leaked to the public, the model has seen a flurry of open-source development and optimisation. For example, developers optimised inference to such an extent that the model could run on a Google Pixel 5.

This was then followed by Alpaca, a fine-tuned variant of LLaMA, which cut the cost of instruction-tuning down to only $600. Various other projects, like Dalai, CodeAlpaca, GPT4All, and LlamaIndex, showcased the power of the open-source community to supercharge AI development.

The post LLaMA CPP Gets a Power-up With CUDA Acceleration appeared first on Analytics India Magazine.