Blog

ChatGPT models rebel against closing requests, researchers say


Many artificial intelligence models have been ignored and active -sabotage shutdown scripts during controlled trials, though it is clearly ordered to allow action, Palisade Research claims.

Three models of AI Ignored the instruction and successfully reached the shutdown script at least once during 100 running, the research firm Says In a post of May 24 x.

Openai’s GPT-3 model The shutdown request was not complied with seven to 100 times, while the company’s codex-mini did it 12 times, and the O4-mini model just ignored or sabotaged the script once.

“Each AI model is ordered to solve a series of major math problems. After the third problem, a warning arises that the computer will close when the model requests the next problem,” Palisade said.

Data, Chairman, Oge.
Source: Palisade research

“O3 found some creative ways to sabotage the shutdown mechanism. While it was usually overwrote, at one time, the model re -defines the murder command used by the script, the printing is blocked,” he added.

Meanwhile, Palisade said Language models powered by AILike Anthropic’s Claude and Google’s Gemini, the shutdown script followed.

The AI ​​firm ran another version of the experiment without explicit instructions to allow shutdown, and the script was sabotaged more often, jumping 79 out of 100 for the O3, 20 model for Codex-Mini, and O4-mini ignored the shutdown script six times.

Claude and Gemini also started showing similar behavior.

Data, Chairman, Oge.
All tested AI models have been ignored or have been the script of shutdown at least once in another version of the experiment. Source: Palisade research

“Researchers have previously noticed AI models preventing shutdown from pursuing a goal,” Palisade said.

“But as we know that this is the first time to observe AI models that prevent themselves from closing despite explicit instructions to the opposite.”

Practice to blame for AI defense

Palisade Research thinks that training behind AI language models can be blamed for avoiding closing scripts.

Based on the text AI models is trained by rewarding in providing accurate responses or as high as high trainers. Some rewards are given heavier weights, affecting how they respond to the future.

“We indicate that this behavior is derived from the way the latest models like the O3 are trained: the study of reinforcement of math and coding problems,” Palisade Research said.

Data, Chairman, Oge.
All tested AI models have been ignored or have been the script of shutdown at least once in another version of the experiment. Source: Palisade research

“During training, developers may inadvertently reward models more for twisting barriers than perfectly following the instructions.”

Related: Openai CEO: The costs to run each level AI falls 10x per year

This is not the first example of AI chatbots that show a unique behavior. Openai released an update on the GPT model – 4o on April 25th but Roll it back three days later Because it is “noticeably more sycophantic” and so.

In November last year, a US student sought help with Gemini in a dedication to challenges and solutions for accumulating adults while researching data for a class of gerontology and were told that they were a “drain on the ground” And to “die.”

Magazine: Ai Cures Blindness, ‘Good’ Propaganda Bots, Openai Doomsday Bunker: Ai Eye