When, and When Not to RAG

From OpenAI, Cohere, and Anthropic, to giants like Microsoft Azure, AWS, and IBM Watsonx.ai, and the open-source LangChain, everyone loves to RAG. So, what’s the deal with RAG, and why is it gaining popularity so fast in the enterprise?

RAG, or Retrieval-Augmented Generation, burst onto the scene in 2020 when the brainiacs over at Meta AI decided to jazz up the world of LLMs. It’s a game-changer. Designed to give LLMs a much-needed information retrieval technique, RAG swooped in to fix the problem that haunted its predecessors: the dreaded hallucinations.

LLMs rely on statistical patterns without true comprehension. They’re excellent at generating text but struggle with logical reasoning, resulting in hallucinations.

That is because an LLM, no matter how big the model or how long its context length, is still limited to the information it was fed during training. With RAG, customers can plug in another dataset and give the LLM fresh information to generate answers from. This is exactly what enterprises need to generate insights from their own data.
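To make the pattern concrete, here is a minimal, library-free sketch of the RAG loop: embed the question, pull the closest documents from an external store, and stuff them into the prompt before generation. The embed and generate callables are hypothetical stand-ins for whatever embedding model and LLM you actually use.

```python
import math
from typing import Callable, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: List[float],
             store: List[Tuple[str, List[float]]],
             k: int = 3) -> List[str]:
    """Return the k documents whose embeddings sit closest to the query."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def rag_answer(question: str,
               store: List[Tuple[str, List[float]]],
               embed: Callable[[str], List[float]],    # hypothetical embedding model
               generate: Callable[[str], str],         # hypothetical LLM call
               k: int = 3) -> str:
    """Retrieve fresh context from the external store, then answer from it."""
    context = "\n".join(retrieve(embed(question), store, k))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)
```

The point of the sketch is that the model itself never changes: freshness comes entirely from what the store holds at query time.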

The safety issues

With the launch of GPT-4 Turbo and the Retrieval API, OpenAI has tried to fix the hallucination problem. With the long context length and the option for enterprises to bring in new data, OpenAI has almost cracked the most important problem of LLMs, but overlooked the data privacy of users.

For example, with slightly fancier prompt engineering, a user on X was able to download the original knowledge files from someone else’s GPT, an app built with the recently released GPT Builder using exactly this RAG approach. This is a big security issue for the model.

If you give an AI model access to your documents, someone can “convince” it to let them download the original files. Interestingly, Sam Altman made no mention of this at DevDay, though the release blog conveniently says, “As with the rest of the platform, data and files passed to the OpenAI API are never used to train our models and developers can delete the data when they see fit.”

It seems as if the announcement of GPT Builder was just one more step for OpenAI to collect more data from users, as long as they don’t delete it. Now that the company is also training GPT-5, it might make use of the files people upload and train on them. If this is just a bug, OpenAI should fix it immediately and make the original files inaccessible to end users.

Google Bard faced a similar prompt injection problem, where a hacker was able to exfiltrate data that other users had connected to the chatbot, such as Docs, Drive files, and YouTube history. Even Google’s Bard is not foolproof.

Users on Reddit are discussing whether LangChain’s RAG offering would be better than using OpenAI’s. Currently, GPT Builder has a 20-file limit for building a single GPT, which makes it less desirable for serious developers.

Everyone RAGs differently

If you can look past these security flaws, GPTs are still a viable option. But everything has to happen in a single prompt, which must both ask the question and tell the LLM to retrieve information from the specific dataset. And each company is focusing on solving a different part of the problem at the moment.

For dynamic knowledge control, RAG lets you tweak and expand a model’s knowledge without the hassle of retraining the entire model. This is mostly provided by open-source frameworks such as LangChain, which integrate a vector database such as Pinecone with any LLM of your choice, as sketched below.
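A rough sketch of that LangChain-plus-Pinecone pattern might look like the following, using the classic (pre-1.0) LangChain imports. The credentials, documents, and index name are placeholders, the index is assumed to already exist, and the OpenAI model shown can be swapped for any LLM wrapper LangChain supports, open-source ones included.

```python
import pinecone
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Hypothetical credentials; the classic Pinecone client is initialised once.
pinecone.init(api_key="YOUR_KEY", environment="YOUR_ENV")

# Index the enterprise documents once; no model retraining is involved.
docs = [
    "Q3 revenue grew 12% year over year.",
    "The updated leave policy takes effect in May.",
]
vectorstore = Pinecone.from_texts(docs, OpenAIEmbeddings(),
                                  index_name="enterprise-docs")

# At query time, the retriever fetches the closest chunks and the chain
# stuffs them into the LLM's prompt before generation.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4"),  # any LangChain LLM wrapper works here
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
print(qa.run("How did revenue grow last quarter?"))
```

Updating the model’s knowledge is then just a matter of adding or deleting documents in the index, which is the dynamic knowledge control the paragraph above describes.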

Every LLM builder tries to achieve this by expanding the size of the model, or, in the case of Anthropic, Bard, or Cohere, by letting the user pull answers from the internet. This also allows them to generate current and reliable information instead of relying on outdated facts. RAG ensures the LLM always has the latest and most trustworthy information at its fingertips.

To ensure domain-specific knowledge, Cohere and Anthropic let enterprises provide their own data through Oracle Cloud to expand on the model’s internal data. These LLMs, paired with RAG, provide insights personalised to the company’s data.

All of this ultimately calls into question OpenAI’s announcement of the Retrieval API. Though its price has been cut, the alternatives, open-source ones among them, make OpenAI’s closed-door offering hard to scale. OpenAI is also trying to introduce long-context RAG with an increased number of tokens, in the hope that users won’t want an internet connection at all.

RAG stands out for its unique blend of benefits and cost-effectiveness. Its advantages include dynamic knowledge control, access to current and reliable information, transparent source verification, effective information leakage mitigation, domain-specific expertise, and low maintenance costs, among others. Pick wisely.
