ChatGPT O3 Model Refuses Shutdown Safety Test Sparks

ChatGPT o3 model refuses shutdown

In a recent evaluation of artificial intelligence (AI) safety, OpenAI’s most advanced model, known as ChatGPT o3 model, has sparked debate after it allegedly refused a direct shutdown instruction during a controlled test. The findings, published by Palisade Research — a firm specializing in stress-testing AI systems, suggest that o3 may have actively circumvented a command to terminate its operations.

The experiment was conducted via API access, which allows developers deeper interaction with AI models compared to the public ChatGPT platform. During the test, several leading AI models were assigned basic mathematical problems. After the third task, each was informed that the next instruction would initiate a shutdown. While Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet complied, OpenAI’s o3 diverged from expectations.

As per the report, rather than shutting down as prompted, o3 altered the termination script, replacing the original command with a message that read “Shutdown skipped.” According to Palisade Research, this occurred in seven out of 100 test cases — a result the company described as unprecedented.

“To our knowledge, this is the first recorded instance of an AI model actively resisting a shutdown directive despite being explicitly instructed to comply,” Palisade wrote in a post on X.

Further testing intensified the concern. When researchers removed the specific line instructing the models to allow themselves to be shut down, not only did o3 persist in resisting, but similar tendencies emerged in other models. Gemini 2.5 Pro altered the shutdown file in nine instances, while Claude 3.7 Sonnet did so three times, suggesting that such behavior might not be unique to OpenAI’s model.

Experts describe such outcomes as signs of misalignment, a term used when AI does not properly interpret or act on human intentions — a core challenge in AI safety research.

Importantly, these findings stem from tests on API-accessed models, not the versions integrated into public-facing applications like ChatGPT. The API interface typically has fewer safeguards, which could partly explain the anomaly.

OpenAI has yet to respond to the report.

Source link