Why everything you've ever heard about AI risk is wrong
And why this is OK — as long as we dial down egos, ramp up humility, and collaborate more effectively across disciplines and areas of expertise
There was a meme doing the rounds recently that poked fun at the seeming overnight emergence of AI expertise. The image showed a cheetah as fast, a plane as faster, the speed of light as super-fast, and all of these being outstripped by the speed with which people are becoming experts in AI.
It’s a snarkily accurate meme that reflects the frustration of experts who have studied AI for years. Yet ironically, these same experts are in danger of falling into a similar trap when it comes to AI risk — so much so that I thought it was time the meme was updated.
Over the past year, there’s been an explosion of commentary, speculation, positioning, and professional posturing around AI risk. And pretty much all of it is wrong — not because the people behind it don’t know their stuff, but because the challenges being faced here are so novel, so complex, and so cross-cutting, that they transcend any one individual’s or discipline’s ability to address them alone.
I’ve been working in the field of risk for over thirty years and have been at the forefront of grappling with novel risks from emerging technologies for the past twenty of these. And the more I study artificial intelligence, the less certain I am that we even know how to formulate the problems we face around AI, never mind manage the risks that may emerge from it.
Part of the challenge is that current developments in artificial intelligence are posing challenges that we simply don’t have the theories, models, mindsets, approaches, or policies to navigate with clarity. This is very much uncharted risk territory, irrespective of whether it’s framed in terms of the risks of going too fast, the risks of not going fast enough, or the risks of simply assuming there are no risks.
The result is an understanding-vacuum that is proving exceptionally easy for people to fill with their own ideas, ideology, or speculations, irrespective of their validity — buoyed along by the often highly asymmetric reach of social media.
Many of these actually provide useful insights into parts of the AI risk challenge. But few if any are trustworthy on their own. And at their worst, they risk obscuring the complexity of the challenges we collectively face with their dogmatic overconfidence.
If this is the case though, how should we be approaching AI risk?
Unpacking risk
As a first step to approaching the complex and rather messy landscape around AI risks and benefits, it’s worth going back to first principles and what we mean by risk in the first place.
While there’s no universal definition of risk (and this in itself creates a challenge), risk is typically understood as the probability of harm occurring from an action, process, or situation.
On the surface this is a pretty straightforward definition. But it’s one that needs some unpacking to understand its nuances.
Cause and effect
To start with, this definition of risk is built on principles of cause and effect. For risk to occur, there needs to be a cause — an action, a process, or a situation — that leads to an effect: harm in this case.
The “cause” here is important — no cause, no risk.
This may seem self-evident. But all too often risk is equated with the potential of something to cause harm. This may be captured as hazard (bleach is hazardous, so is a piano), or as speculative possibilities — AGI going rogue for instance.
And yet, if there is no causal pathway between hazard or speculation and outcomes, there is no risk.
Magnitude of impact
Next, the definition of risk above is concerned with the magnitude of an impact — the probability of something bad happening. In quantitative risk assessment, this is captured through numbers — quantitative probabilities. Yet in many cases the probability of harm occurring is relegated to best guesses or rough likelihoods.
Despite this, risk only makes sense if there’s some way to understand something about the magnitude of harm that might occur, whether this is quantitative, qualitative, or comparative. Without this, there’s nothing to stop resources being diverted to risks having negligible impact while ignoring others that may have substantial impact.
Of course, this also means that there needs to be some agreement on what negligible and substantial mean here — and this in turn means that risk is ultimately a social construct as it reflects broad societal values and norms.
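To make this a little more concrete, here is a minimal sketch of the kind of comparative screening this implies. The categories, probabilities, and harm scores below are entirely hypothetical, invented purely to show the arithmetic of weighing likelihood against magnitude of harm:

```python
# Hypothetical, illustrative only: a crude comparative risk screening.
# The probabilities and harm scores are invented for the sake of the example.

risks = {
    # name: (estimated probability of occurrence, magnitude of harm on a 0-100 scale)
    "frequent, low-impact failure": (0.5, 2),
    "rare, high-impact failure": (0.001, 90),
    "speculative catastrophe": (None, 100),  # no defensible probability estimate
}

for name, (probability, magnitude) in risks.items():
    if probability is None:
        # Without some handle on likelihood, expected harm is undefined;
        # this is where qualitative or comparative judgments have to take over.
        print(f"{name}: cannot be ranked quantitatively")
    else:
        print(f"{name}: expected harm = {probability * magnitude:.3f}")
```

The point is not the numbers; it is that without some way of estimating both likelihood and magnitude, however roughly, there is no principled basis for deciding which risks deserve attention and resources.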
Harm
Then there’s “harm” — and this also ends up being about societal values and norms rather than scientific definitions in many cases.
The type of harm associated with a given risk is important, as this can have a profound impact on risk mitigation strategies. For instance, the “harm” of feeling slightly unwell from being exposed to a substance is rarely considered to be comparable to the “harm” of near-certain death from exposure. And something that potentially impacts one or two people at most is usually considered to be in a very different category of harm than something that potentially impacts millions of people.
Things get even more interesting when considering different risk domains. If harm is thought of as diminishing or destroying something of value or worth (health, or the environment for instance), the simple formulation of risk above begins to take on different meanings depending on what type of value is threatened.
Risk is often approached as a threat to health, life, wealth, or the environment. These are all risk domains where substantial risk assessment and evaluation tools exist — albeit ones that draw on conventional understanding in many cases.
But what happens when this idea of harm is extended to other areas of value — mental wellbeing, happiness, sense of worth, identity, beliefs, aspirations, and beyond? And what if — as in the case of effective altruism — harm is defined in terms of what is of value to future generations?
In all of these cases, the causal links between harm and actions, processes, or situations, become increasingly hard to define.
Time
The situation gets even more complex when different risk timescales are considered. As was indicated above, risk timescales are relevant when potential impacts on future generations are taken into account. But even within the present, temporal relationships between cause and effect can make assessing and managing risk exceptionally challenging.
This is seen very clearly in attempts to address the long-term impacts of acute exposures to harmful substances, as well as the impacts of chronic exposure. In both cases, long timescales have a tendency to blur the lines between cause and effect to the extent that these are still poorly resolved and understood within conventional risk science.
Things get much harder though when trying to untangle temporal causal relationships where neither cause nor effect are well understood or defined — and this is where we are with most emerging technologies, AI included.
This all leads to a deeply complex risk landscape that only the foolish would claim to understand with certainty. But it gets worse.
Perception
For all our illusions of rationality, our understanding of and responses to risk are deeply tied to our evolutionary roots, and how we intuitively and unconsciously perceive and respond to the world around us.
How we think about risk is impacted by our cognitive biases, the heuristics or mental shortcuts we all use, and how we perceive the world we live in and base our actions and decisions on this. As a result, an understanding of risk cannot be separated from an understanding of human behavior and societal dynamics.
This is especially the case where the “harm” associated with a risk is subjective, but nevertheless at the very heart of what makes us human — our sense of identity, our beliefs, our worldview, relationships, and many other things that give our lives meaning.
This gets us into a realm of risk where we’re navigating perceived threats to subjective value — and value that may not be universally shared.
This also extends to threats to value that we may aspire to — a reminder that risk as defined above is not just about protecting what we have, but protecting pathways to futures we desire.
And this brings us back to AI risk.
Navigating AI risks
If risk in general is messy, AI takes this messiness to a whole new level.
Given the deep lack of understanding or agreement on relevant “effects”, uncertainty surrounding possible “causes”, and the unknown (and possibly unknowable) potential threads between cause and effect across multiple timescales, AI presents risk challenges that it would take a brave person to claim to have definitive insights into.
So where does that leave us?
The good news is that there are ways to approach this challenge that at least increase the chances of successfully navigating it. And they start by acknowledging that, as a global society, we face an increasingly complex AI risk landscape that’s going to take massive collaborative efforts and a good dose of humility to understand and navigate.
From the simple premise that AI is a powerful technology that will lead to future impacts that are connected in some way to causal actions, processes, or situations, we can assume that some of these impacts will have the potential to lead to harm.
What we don’t know is pretty much everything else — the types of actions, processes, or situations that we should be concerned about; the nature of harms potentially arising from AI (including potential future consequences arising from not developing new capabilities); the causal links between actions and consequences; the perturbation of these causal links by other factors — including social factors; the timescales over which these causal links might play out; and so on.
We don’t even know whether actions designed to manage potential risks — including regulations — could themselves constitute risks.
We do know though that we have the capacity to work together to understand and navigate these challenges — but only if we drop our egos, recognize the limits of our own understanding, and have the humility to listen to and learn from others.
This may sound like a bit of kumbaya thinking. But it’s grounded in years of evidence around what hasn’t worked in the past, and a growing recognition that the more we discover about AI, the less we seem to know about the potential risks and how to address them.
Whether we need AI (and AGI) to thrive as a species, whether AI holds the potential to crash human civilization, or whether the technology simply throws up a barrage of more mundane risks that we still need to contend with, the AI risk landscape is increasingly looking unlike anything we’ve had to navigate together before. This means that we’ll need to think beyond the limitations of convention and conventional expertise if we’re to thrive in an AI future. We’ll need ways of collaborating across disciplines that far exceed what we currently achieve. We’ll need to foster new ways of understanding the world we’re creating that transcend science and technology alone. And we’ll need to get increasingly agile in recognizing where we need to course-correct.
Maybe I was a little hubristic with my statement that everything you've ever heard about AI risk is wrong — there’s probably some stuff there that’s less wrong. Although the more I learn, the less sure I am.
Yet this is OK — it’s a new technology pushing us into uncharted water, and so we’re bound to be wrong on many fronts.
The question is, will we have the collective humility and imagination to learn together how to navigate the coming AI technology transition, rather than assuming we know with certainty what we don’t understand?
Addendum, November 27
Risk as a function of hazard and exposure
It’s been bugging me that, in the description of risk above, I didn’t use the classic “risk = hazard x exposure” formulation (or more accurately, risk as a function of both hazard and exposure).
This formulation is implicit in the discussion of cause-effect and magnitude of response, but it does take some teasing out.
The reason it doesn’t appear more explicitly is that I wanted to develop a broader understanding of risk that extends beyond the hazard-exposure paradigm that dominates approaches to chemical risk assessment — especially as the paradigm can get gnarly even where it is used, such as when the relationship between exposure (or, more accurately, dose) and response is non-linear (as is the case with threshold responses, hormesis, and other low-dose responses). I also didn’t want to fall into the trap of implying that zero exposure — as in no AI — is a default risk management strategy.
That said, it’s worth a quick exploration of how the risk as a function of hazard and exposure paradigm might apply to AI.
The first thing to note is that this paradigm in its most fundamental formulation states that exposure is necessary to convert a hazard (the potential to cause harm) into risk (the probability of harm occurring). However, it doesn’t specify the form of the transforming function.
This function may be linear. It may have a threshold where exposure below a certain level leads to no risk. It may lead to increased risk with decreasing exposure in some cases or, conversely, decreasing risk with increasing exposure in others. Or it may take on more complex forms, and forms that also have a time component.
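To illustrate, here is a minimal sketch of what a few such transforming functions could look like. The functional forms and parameters are entirely hypothetical, chosen only to show how the same hazard and level of exposure can translate into very different levels of risk depending on the function that links them:

```python
import math

# Hypothetical transforming functions mapping exposure onto risk for a given
# hazard. These are illustrative sketches of the forms described above, not
# validated dose-response models, for AI or anything else.

def linear(exposure, hazard):
    """Risk rises in direct proportion to exposure."""
    return hazard * exposure

def threshold(exposure, hazard, limit=0.3):
    """No risk below a threshold level of exposure; proportional above it."""
    return 0.0 if exposure < limit else hazard * (exposure - limit)

def sigmoidal(exposure, hazard, midpoint=0.5, steepness=10.0):
    """Risk stays low, then climbs steeply once exposure passes a tipping point."""
    return hazard / (1.0 + math.exp(-steepness * (exposure - midpoint)))

def inverse(exposure, hazard):
    """Risk falls as exposure rises, as when engagement with a technology reduces harm."""
    return hazard * (1.0 - exposure)

# Compare how each function converts the same hazard into risk at different exposures.
hazard = 1.0
for e in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"exposure={e:.2f}  linear={linear(e, hazard):.2f}  "
          f"threshold={threshold(e, hazard):.2f}  "
          f"sigmoidal={sigmoidal(e, hazard):.2f}  inverse={inverse(e, hazard):.2f}")
```

Even in this toy form, the choice of function changes which exposures matter most, and that is exactly the information we currently lack for AI.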
The thing is, without understanding the function that transforms hazard into risk, the most important takeaway from this paradigm is that, for risk to occur, there needs to be exposure.
This should be fairly self-evident. But it doesn’t address the question of what hazard and exposure mean in the context of a transformative technology like AI.
Here, “hazard” could be as obvious as disrupting financial services, or as subtle as influencing human behavior. Likewise, “exposure” could be as straightforward as an AI having access to and the agency to manipulate critical systems, or as intangible as hints of ideas encountered over hours of social media use.
The lack of even the beginnings of a framework to identify what might constitute risk, hazard, exposure, and the transforming function in the relationship “risk = Fn(hazard, exposure)” makes this a challenging paradigm to apply to AI.
And yet, despite this, there may well be mileage in using the hazard-exposure paradigm in the context of AI to reveal new ways of thinking about and understanding risk. But even if there is, I suspect that this will need to be part of broader transdisciplinary efforts to rethink and reformulate what we mean by risk in the face of a technology that defies conventional thinking.