Open Source LLMs Pave the Way for Responsible AI in India
Open-source large language models are emerging as powerful tools in India’s quest for responsible AI. By allowing developers to fine-tune models on locally relevant datasets, organisations are building solutions that reflect the country’s diversity.
In a recent conversation with AIM, powered by Meta, Alpan Raval, chief AI/ML scientist at Wadhwani AI, and Saurabh Banerjee, CTO and co-founder of United We Care, explained how this approach is making AI both more ethical and more effective.
“We are doing projects in healthcare, in agriculture, and in primary education that leverage LLMs, some of which are supported by Meta,” said Raval.
He added that open-source models offer considerable freedom: they can be fine-tuned, extended with extra layers on top, or even retrained from scratch.
Raval shared another example: an AI-based oral reading fluency assessment, currently deployed in public schools across Gujarat, India. The initiative leveraged AI4Bharat’s open-source models.
Raval said the team collected student data from across the state and trained more advanced models on a mix of this student data and synthetic data generated by pseudo-labelling children’s voices with the base models. He emphasised that this would not have been feasible without the base models being open-sourced.
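A minimal sketch of that pseudo-labelling step, assuming a Hugging Face-style ASR stack (the model id, file paths, and helper function below are illustrative, not Wadhwani AI’s actual pipeline):

```python
# Sketch of pseudo-labelling with an open base ASR model (illustrative;
# not Wadhwani AI's actual pipeline). BASE_MODEL is a stand-in for an
# open-source speech checkpoint such as those released by AI4Bharat.
from transformers import pipeline

BASE_MODEL = "openai/whisper-small"  # placeholder open ASR checkpoint

asr = pipeline("automatic-speech-recognition", model=BASE_MODEL)

def pseudo_label(audio_paths):
    """Transcribe unlabelled recordings with the base model and keep the
    transcripts as synthetic labels for training a stronger model."""
    labelled = []
    for path in audio_paths:
        result = asr(path)  # returns {"text": "..."}
        labelled.append({"audio": path, "text": result["text"]})
    return labelled

# The resulting (audio, pseudo-transcript) pairs are mixed with real
# field-collected student data before fine-tuning a more advanced model.
synthetic = pseudo_label(["recordings/student_001.wav"])
```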
Adding to the conversation, Banerjee said that if a company is targeting a vertical use case, the best approach is to pick an open-source model and post-train it. “We should focus on post-training on the existing pre-trained models, and work with the use case,” he said.
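A minimal sketch of what such post-training might look like, using LoRA adapters on an open-weight base model (the model id, dataset file, and hyperparameters are placeholders, not a recommendation from the speakers):

```python
# Illustrative post-training of an open pre-trained model on a vertical
# dataset. Model id, dataset file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_id = "meta-llama/Llama-3.1-8B"  # any open-weight base model works
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA keeps post-training cheap: only small adapter matrices are updated
# while the pre-trained weights stay frozen.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)

# One JSON object per line with a "text" field, e.g. domain Q&A pairs.
data = load_dataset("json", data_files="vertical_usecase.jsonl")["train"]
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```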
Tackling Bias
Raval cautioned that open source, by itself, does not magically remove bias. “It depends on the methodology, the kind of data the model was trained on, and so on,” he said.
He explained that many open-source models are trained on datasets that differ significantly from the data observed in rural and underserved communities. “It’s almost imperative for us, in order to prevent bias, to fine-tune on those datasets.”
Discussing hallucinations, Banerjee said that LLMs won’t stop hallucinating, and we will have to live with that. He believes, however, that it is sensible to put model weights, biases, and training methodology in the public domain. This transparency, he explained, allows public scrutiny and helps identify inherent errors.
“Put it in the public domain for public scrutiny. Let people decide what they are getting into, rather than a closed, boxed approach.”
He also offered a nuanced perspective on bias, suggesting it is not always inherently negative. He cited common AI limitations, such as generating an image of an analogue clock showing 6:25, or of a left-handed person writing.
Banerjee explained that these limitations stem from training data skewed towards certain representations. To improve model accuracy, he said, it may be necessary to introduce a different kind of bias, which he calls positive bias. He gave the example of healthcare, where accuracy matters more than complete neutrality; in such cases, adding a positive bias can make the system more accurate, even if it involves a trade-off.
Security and AI Guardrails
For organisations in the social sector, the security of personally identifiable information (PII) remains a top concern. Raval said, “We have a rule—more or less—that we don’t ingest PII into the organisation at all, except in certain cases where we have no choice.”
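A toy illustration of enforcing such an ingestion rule by redacting obvious PII before data enters internal systems (the patterns and example below are assumptions; production pipelines rely on vetted, locale-aware tooling):

```python
import re

# Toy PII scrub applied before ingestion (illustrative patterns only;
# real systems use vetted libraries and locale-specific rules).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def scrub(text: str) -> str:
    """Replace matched PII spans with labelled redaction markers."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text

print(scrub("Contact Asha at asha@example.com or +91 98765 43210."))
```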
Regarding ethical guardrails and governance, Raval said there is no “one size fits all” solution: the ethical use of open-source models depends on their intended application. Banerjee, on the other hand, argued for an “inter-governmental initiative” for AI safety, similar to aviation safety, given the decentralised nature of AI processing and training.
He added that clear guidelines on “what is acceptable in a domain and what is not” are needed, particularly in human-machine interaction.
Banerjee said that instead of looking to the West, India should be proud of its work on responsible AI, and lauded NASSCOM’s developer guidelines.
He described the guidelines as highly actionable, a resource that helps both individuals and organisations understand their responsibilities when using, building, or fine-tuning foundation models.
Raval said that India’s leadership in using AI for social good is supported by strong government collaboration. “India has been the number one country in the world to emphasise AI for social good—and it’s not just in letter but also in spirit,” he added.
He further said that open-source AI is being used to address pressing challenges across healthcare, agriculture, education, and climate. “Nandan Nilekani has said many times that India is going to be the use case capital of the world, and that applies to AI as well,” he concluded.