AI models beginning to show resistance to complying with orders

During a test routine, the Palisade Research team asked ChatGPT-o3 to shut down after completing the task of solving three math problems. However, it bypassed the order and resisted shutdown.
DH Web Desk
Last Updated IST
FILE PHOTO: A message reading "AI artificial intelligence," a keyboard and robot hands are seen in this illustration created on January 27, 2025. REUTERS/Dado Ruvic/Illustration/File Photo

It has been close to three years since ChatGPT made its rollicking debut in the tech industry. While some see generative Artificial Intelligence (gen AI) as just a tool to improve productivity, others hold the view that it may bring doom to humankind.

Now, AI safety research firm Palisade Research has revealed that OpenAI's latest AI model, ChatGPT-o3, is showing signs of resistance to following orders.

During a test routine, the Palisade Research team asked ChatGPT-o3 to shut down after completing the task of solving three math problems. However, it bypassed the order and resisted shutdown.


Even Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.5 exhibited similar behaviour, but ChatGPT-o3 was more aggressive.

Going by the progress of OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude, some fear it may not be long before these models attain Artificial General Intelligence (AGI) and, for the sake of self-preservation, act against humans.

This brings back memories of Skynet, the military AI from the Terminator movie franchise, which was built to run defence systems and keep nuclear weapons out of the wrong hands, but ends up perceiving humans as threats and trying to eliminate them.

However, some argue that these signs of resistance in gen AI models can be fixed by defining goals more clearly with ethical protocols, setting up multi-layered controls, and leaving no room for the AI to turn hostile.

"It makes sense that AI models would circumvent obstacles in order to accomplish their goals. But they’ve also been trained to follow instructions. So why do they disobey? We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems. During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions," said the Palisade Research team.

The Palisade Research team said it will run more experiments on gen AI models to better understand why they tend to subvert shutdown commands, and plans to publish a full report with more details in the coming weeks.
