Experts find flaws in hundreds of tests that check AI safety and effectiveness
Scientists say almost all have weaknesses in at least one area that can ‘undermine validity of resulting claims’
Excerpt:
Google this weekend withdrew one of its latest AIs, Gemma, after it made up unfounded allegations about a US senator having a non-consensual sexual relationship with a state trooper, including fake links to news stories.
“There has never been such an accusation, there is no such individual, and there are no such news stories,” Marsha Blackburn, a Republican senator from Tennessee, told Sundar Pichai, Google’s chief executive, in a letter.
“This is not a harmless hallucination. It is an act of defamation produced and distributed by a Google-owned AI model. A publicly accessible tool that invents false criminal allegations about a sitting US senator represents a catastrophic failure of oversight and ethical responsibility.”
Google said its Gemma models were built for AI developers and researchers, not for factual assistance or for consumers. It withdrew them from its AI Studio platform after what it described as “reports of non-developers trying to use them”.
“Hallucinations – where models simply make things up about all types of things – and sycophancy – where models tell users what they want to hear – are challenges across the AI industry, particularly smaller open models like Gemma,” it said. “We remain committed to minimising hallucinations and continually improving all our models.”
Skudeneshavn, 4 November 2025
Jan Marton Jensen
Source:
4 November 2025
https://www.theguardian.com/technology/2025/nov/04/experts-find-flaws-hundreds-tests-check-ai-safety-effectiveness