MLCommons, a leading AI benchmarking consortium, on Thursday announced the expansion of its AILuminate benchmark to assess AI reliability across a broader range of models, languages, and tools.
As part of this global effort, MLCommons is partnering with NASSCOM to introduce the benchmark to South Asia, with a focus on India-specific, Hindi-language reliability standards, according to a media release.
Peter Mattson, President of MLCommons, said the collaboration aims to help companies better assess the reliability and risk of their AI products. “This partnership represents a major step towards developing globally inclusive industry standards for AI reliability,” he noted.
Ankit Bose, Head of NASSCOM AI, emphasised the importance of aligning AI development with global best practices to manage risk while encouraging innovation in India’s rapidly evolving tech landscape.
The announcement includes new reliability grades for large language models (LLMs), based on testing with 24,000 prompts across 12 hazard categories. Proof-of-concept testing for Chinese-language capabilities is also underway, the release stated.
Modelled after its partnership with Singapore’s AI Verify Foundation, the NASSCOM collaboration reinforces MLCommons’ commitment to a global, collaborative approach to AI safety and transparency. Future plans include benchmarking more advanced AI systems, such as multi-modal LLMs and agentic AI, it said.