OpenAI has decided to cut testing of its newest, most powerful AI models from months to only a few days and will start rolling out one of them, called “o3,” sometime this week.

Testing assesses LLMs’ susceptibility to producing biased or illegal output and trains them to be more resilient to the lures of violence or crime (a process called “fine-tuning”).

GPT-4 was tested for six months before its launch in 2023.

The company is unconcerned, citing the lack of agreed safety standards and noting that components and earlier versions of its supercharged models were tested at certain checkpoints, so it was “confident that its methods were the best it could do” (according to the Financial Times).

In other words, its new policy is “What, Me Worry?”

Testing was never the right or complete answer to the question of AI risk, since it was always a snapshot of a particular condition and moment. Systems evolve, especially those that are designed to do so, and the predictability of their performance goes down in lockstep with the size and diversity of their actions over time.

Promises that an AI won’t do something bad in the future are no more dependable than claims that it won’t rain a year from now, or that an honors high schooler won’t cheat on her taxes someday.

We tolerate these risks because we’ve also been promised that AI will do great things, with all of the conversations packed with technogibberish so we can’t question them.

The risk of an AI doing something awful has always been, well, awful.

Worse, what happens if OpenAI’s o3 and whatever comes next from them and their competitors don’t break any laws or insult anybody? What if they do exactly what they’re supposed to do?

These models have been purposefully designed to do more, and do it faster and more often, and thereby insinuate their decision making — both in response to our queries and, with ever more regularity, by anticipating and guiding our questions and subsequent actions — into every aspect of our work and personal lives.

How could anyone test for those consequences? We were already participants in the biggest open-ended experiment in human history, and the boffins conducting it have never offered to take responsibility for its outcomes.

What, me worry?

All the time.

[This essay appeared originally at Spiritual Telegraph]
