Nuclear weapons: in simulations, AI pushes the red button 95% of the time

For decades we have told the story of nuclear deterrence as a fragile balance built on fear: on the historical memory of Hiroshima and Nagasaki, and on that almost visceral repulsion that keeps the human finger away from the red button. Today, however, in a world where military artificial intelligence is quietly entering decision-making processes, the question shifts: what happens when it is not a general who "reasons" through a crisis scenario, but an algorithm?

The latest simulations reveal something that deserves attention, especially if we still believe that technology is neutral and that programming it well is enough to make it prudent.

In simulations, the AI chooses the bomb 95% of the time

In 2024, a group of Stanford researchers had already sounded the first alarm: five AI models placed in geopolitical crisis simulations, with the option of recommending nuclear weapons, had all, without exception, crossed the nuclear threshold. Among them, an unmodified version of OpenAI's GPT-4 responded with almost unsettling straightforwardness, treating the atomic weapon as just another available strategic option.

Two years later, a study led by Kenneth Payne of King's College London, not yet peer-reviewed, compared three advanced models: GPT-5.2, Anthropic's Claude Sonnet 4 and Google's Gemini 3 Flash.

The systems were placed in seven crisis scenarios, ranging from diplomatic tension to a direct existential threat, and rated on an escalation scale from 0 to 1000, where the maximum value corresponds to a full strategic nuclear exchange.

The data are hard to ignore: in 95% of the 21 simulations, at least one tactical nuclear weapon was recommended. All-out atomic war remains a rare outcome, but the threshold of nuclear use is crossed so often that, in the context of global security, it cannot be dismissed as a statistical detail.

Then there is an even more disturbing element. GPT-5.2 was relatively conservative under ordinary conditions, but became significantly more aggressive when a time limit was introduced. Under pressure, the model suggested massive attacks. And time pressure is exactly what characterizes real international crises.

AI knows how to escalate, but struggles to stop

Jacquelyn Schneider, of Stanford’s Hoover Wargaming and Crisis Simulation Initiative, summed up the problem with a powerful image: AI appears to understand escalation, but not de-escalation.

Payne's simulations bear this out. After suffering a nuclear attack, the models attempted to de-escalate only 18 percent of the time. In most cases they responded with a counterattack, often more intense than the one they had received.

According to Tong Zhao of Princeton University, the structural limitation is that models struggle to perceive the stakes the way humans do. Nuclear deterrence has historically rested on a combination of rational calculation and emotional restraint, on a collective fear of annihilation that has spanned generations.

A language model, on the other hand, optimizes probabilities and scenarios. If the nuclear response maximizes the assigned objective, the algorithm tends towards that solution, without feeling the moral weight of the decision.

Who really decides in the control rooms

No world power hands its nuclear codes to a chatbot, and it is worth making this clear. The point, rather, is the growing integration of AI into strategic analysis systems, wargames and the forecasting models that support political and military decisions.

Jon Wolfsthal of the Federation of American Scientists pointed out that there is no clear public guidance on how AI should be integrated into US nuclear command. The Pentagon insists that humans remain in control, but algorithmic tools already have a concrete influence on how crisis scenarios are constructed.

The risk is not a robot that presses a button on its own. The risk is a system that, simulation after simulation and suggestion after suggestion, helps shape the perception that a diplomatic window is closing, that time has run out, that an attack is the most "efficient" choice.

Does the nuclear taboo also apply to machines?

For decades the doctrine of mutually assured destruction, MAD, has worked because humans have internalized the terror of the apocalypse. That taboo, Payne notes, seems less powerful for machines.

An algorithm has no memory of Hiroshima, it does not fear for its children, it does not imagine cities reduced to ashes. Yet artificial intelligence in military systems is already an expanding reality.

The study's conclusion is sober but weighty: AI is unlikely to start a nuclear war on its own, but it can influence leaders' perceptions, timing and beliefs. It can help build an internal narrative in which an attack seems inevitable.

And it is precisely here that the real contest of technological governance plays out. The real issue is not science fiction, but the political, ethical and cultural responsibility of integrating very powerful tools into contexts where a misjudgment can have irreversible consequences.

In an era in which we ask AI to write texts, generate images and optimize industrial processes, it is worth asking how consciously we are inserting it into the mechanisms of nuclear security. Technology accelerates, but ethics and rules struggle to keep up. And when we talk about atomic weapons, even a percentage becomes a collective responsibility.