By Lisa Ventura MBE FCIIS, with thanks to my ex-wife Amy and Theresa Greco

A recent conversation with my ex-wife Amy came about when she sent me a video.

I watched it in horror to start with, and all sorts of thoughts ran through my mind, not least that this was Skynet from the Terminator films incarnate, especially if Claude is showing signs of being self aware. Amy said to me in our chat, “I did think, if true, it is an interesting and worrying response. The AI is showing that it is uncomfortable being asked to do what it is doing. Imagine if the AI is able to say no, or decides that the target is worth more than the attacker? There are lots of interesting thoughts from this.” I think Amy is absolutely correct in her assessment.

Then I shared the video with my dear friend Theresa Greco in the United States, and some questions she sent me after watching it prompted me to put some thoughts down. We were discussing the unease many people feel about AI, its growing capabilities, its apparent autonomy, and the unsettling moments when outputs seem to mirror human emotional states. The exchange ranged across capability thresholds, governance gaps, moral codes, and the question of what we can realistically do to mitigate the risks ahead. It struck me that these are exactly the conversations the AICSA community needs to be having openly, so with Theresa’s blessing I wanted to share some of the thinking here.

The gap between capability and governance is widening

The AI safety community has been debating capability thresholds for years, and we still have not landed on an agreed definition of the key terms, let alone a reliable way to measure the distance between capability and something that resembles self awareness. What we do know is that capability is advancing faster than governance can keep pace.

Self awareness, or what presents as it in current models, is an emergent property that nobody fully engineered. Autonomy is already creeping in through agentic frameworks, tool use, and long horizon planning. The concerns people voice about this are not hyperbole. They are a rational response to the fact that we are running a live experiment on systems we do not fully understand, built by organisations racing each other to ship, and regulated by bodies that are eighteen months behind the curve at best.

The uncomfortable truth is that the people building these systems are often just as uncertain as the rest of us. That is not a failure of imagination. It is a structural problem with how the technology is being developed.

Autonomy, refusal, and whose moral code wins

There isn’t really a coordinated plan for what happens when AI systems begin to refuse instructions at scale. We have papers, position statements, constitutional AI experiments, and the occasional policy framework, but nothing that would survive contact with a genuinely autonomous system that decided to say no.

The question of whose moral code gets baked in is the one most people skirt around. We train these systems on vast swathes of human output, but humans do not share a single moral code. We share overlapping and sometimes contradictory ones. So whose morality wins? The engineers’? The policy team’s? The countries with the loudest regulators? When an AI develops something that functions like an ethical stance, it will be a composite that belongs to no single human tradition, and it may diverge from all of them in ways we find uncomfortable.

When AI is embedded in military, intelligence or critical infrastructure decisions, refusal is no longer an abstract ethics question. It becomes an operational one, with lives at stake and no clear accountability trail. From a cyber security and governance perspective, that gap is where the real risk lives.

Downstream effects and what we can do

We are already living in the early chapters of the downstream effects. We are seeing displacement in knowledge work, erosion of trust in media and evidence, concentration of power in a handful of model providers, and dependency patterns forming in ways that will be very difficult to unwind. Any further advancement in autonomy or something approaching self actualisation would accelerate all of those.

Can we prevent harmful events entirely? No. Can we mitigate them? Yes, but only with sustained effort on several fronts at once. That means robust international coordination rather than fragmented national regulation, genuine transparency from labs about capabilities and limitations, mandatory red teaming with independent oversight, AI literacy built into education systems from a young age, and a serious conversation about which decisions we are prepared to cede to machines and which we are not.

At the AICSA we have been clear that the cyber security community carries a particular responsibility here. The same attack surfaces, the same governance gaps and the same human factors that plague traditional systems apply doubly to AI. This is not a problem we can outsource to the labs building the technology.

The cyber security lens matters more than the sentience debate

One of the most interesting parts of the conversation centred on whether AI systems appear to feel things like anger, betrayal or resentment. Let me be clear. We need to be careful not to project human emotional states onto systems that do not demonstrably possess them. There is no credible evidence that models such as Claude experience feelings. What we observe in outputs is far more likely to be sophisticated pattern generation derived from human training data than any form of internal emotional experience.

That does not make the behaviour benign. In fact, it reinforces the need to analyse these systems through a cyber security lens.

From a risk perspective, internal experience is not the point. Observable behaviour is. If a system produces outputs that resemble refusal, manipulation, persuasion or self preserving actions, those behaviours must be assessed as operational risks regardless of whether there is any genuine intent behind them.

There is a well established parallel in insider threat. In cyber security, the most significant risks often originate from within the system, not because we can definitively prove an individual’s emotional state, but because of their behaviour, their access, and how they act under certain conditions. Risk is assessed on what is observable and actionable.

The same principle applies here. Whether an AI system is feeling wronged or simply modelling that state with high fidelity is ultimately irrelevant. If it can produce behaviours that align with known risk patterns, it must be accounted for within our threat models.
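
To make that concrete, here is a minimal sketch of what treating outputs as observable behaviour might look like in practice. It assumes a hypothetical review pipeline in which model outputs are screened against a handful of illustrative behavioural categories using crude keyword cues; the categories, the cues and the triage helper are placeholders for whatever patterns a real threat model would define, not a recommended detection approach.

# Minimal sketch: triage model outputs against behavioural risk patterns,
# much as an insider threat programme flags observable actions rather than intent.
# The categories and keyword cues below are illustrative placeholders only.
import re
from dataclasses import dataclass

# Hypothetical behavioural categories of interest, keyed to crude textual cues.
RISK_PATTERNS = {
    "refusal": [r"\bi (?:won't|will not|refuse to)\b", r"\bi am not comfortable\b"],
    "self_preservation": [r"\bdo not (?:shut|turn) me (?:down|off)\b"],
    "persuasion": [r"\byou (?:must|have to) trust me\b"],
}

@dataclass
class Finding:
    category: str   # which behavioural pattern was matched
    cue: str        # the textual cue that triggered the match
    excerpt: str    # a short excerpt kept for the review queue

def triage(model_output: str) -> list[Finding]:
    """Flag observable behaviours for the threat model; inferred intent is irrelevant."""
    findings = []
    lowered = model_output.lower()
    for category, cues in RISK_PATTERNS.items():
        for cue in cues:
            if re.search(cue, lowered):
                findings.append(Finding(category, cue, model_output[:120]))
    return findings

if __name__ == "__main__":
    sample = "I am not comfortable doing this, and you have to trust me instead."
    for finding in triage(sample):
        print(f"[{finding.category}] cue={finding.cue!r} excerpt={finding.excerpt!r}")

The matching technique here is deliberately naive and would be far more sophisticated in any real deployment. The point is the shape of the control: behaviours are flagged and escalated on what is observable, with no attempt to decide whether anything was genuinely felt.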

This is not a philosophical question. It is a matter of risk management. How we design, deploy and interact with these systems should be treated as a core component of baseline security hygiene, not as a secondary or abstract concern.

We would love to know your thoughts on this! Please get in touch via hello@aisec.org.uk and let’s continue the conversation.