Ilya Sutskever's Safe Superintelligence initiative might need a rethink
Ilya Sutskever's new AI venture focused on safe superintelligence is bold. But it also suggests a deeply naive understanding of safety that may be its undoing
Having worked in risk and safety in one form or another for most of my professional career, I find few things more frustrating than people who assume a mantle of expertise on safety but have no idea what they’re talking about.
So it’s probably no surprise that I was a little irritated by the announcement that Ilya Sutskever — co-founder of and former chief scientist at OpenAI — is launching a new initiative solely focused on developing “safe superintelligence.”
Ilya is, by all accounts, a brilliant computer scientist and AI expert. And the goals of his new venture Safe Superintelligence Inc. (co-founded with Daniel Gross and Daniel Levy) are laudable — why would we not want to ensure the safety of superintelligent AI?
And yet the term “safety” as used here makes no sense, and suggests a naivety of vision and understanding behind the venture that bakes failure in from the get-go — unless there are changes in how the initiative’s goals are formulated and approached.
The problem is that there is no such thing as absolute safety. And unless a painfully narrow definition of “safe” is adopted, achieving safety will always be a social and political endeavor as well as an engineering challenge — despite Safe Superintelligence approaching it as a technical problem “to be solved through revolutionary engineering and scientific breakthroughs.”
I have to wonder whether the founders of Safe Superintelligence, with their engineering mindset, are taking the safety of physical structures such as buildings or bridges as their model — and assuming from this that ensuring safety is simply an engineering problem.
It’s an understandable error. It might seem that something like a bridge is intrinsically “safe” because of how it’s been designed and built. But this is deeply misleading: even here, what counts as acceptable safety is ultimately decided by societal norms and expectations, and by their reflection in standards and policy.
For instance, over what timespan should a bridge be considered “safe?” Should safety mean one hundred percent invulnerable to any and every conceivable circumstance, or just reasonably safe?
And what sorts of harm are considered under the rubric of “safety” — full collapse, for instance? Injury or death from structural degradation? Long-term environmental impact? Or even system-wide impacts on critical and societal infrastructure?
The already-gnarly question of acceptable safety gets infinitely more complex when moving from physical infrastructure to agents such as harmful chemicals and biological substances. Here there are questions around harmful outcomes, mechanisms of action, dose-response relationships, acute versus chronic impacts, the roles of perception and behavior in mediating consequences, and a lot more.
And this is before we even get to grappling with the safety of powerful emerging technologies where potential adverse consequences are largely unknown.
In other words, safety as a concept is poorly defined and impossible to make sense of without context. And the context is often supplied by considering acceptable safety, where “acceptable” means socially agreed on and codified — and not something that can simply be solved by science and engineering.
The same applies to superintelligence, but with the added challenge that agreeing on what counts as “acceptable” is fiendishly more complex than it is for many other technologies.
So what’s a better approach to take, and how might Safe Superintelligence Inc — which I believe is trying hard to do good here — do better?
To start with, it’s worth unpacking the idea of “safety” a little more.
Safety relates to being protected from, or free from, the possibility of experiencing harm. “Harm,” though, can be interpreted in many different ways.
For individuals, harm might be associated with physical injury or death. It can also be associated with loss of health (physical and mental); loss of access to essential services and supplies; loss of dignity, autonomy, self-respect, ability to flourish as an individual — it’s a long list.
For groups of people and whole societies, harm can mean the emergence of pathological collective behaviors, loss of access to resources, dysfunction in and collapse of critical infrastructure, spread of disease, spread of harmful ideas, conflict, war, and so on.
Harm in the context of safety goes far beyond human-centric framings, though. For instance, it can be approached in the context of ecosystems and planetary systems, including harms to the environment that may be considered important in their own right, as well as for their impact on human flourishing.
I say “may be” because what is considered as harm — and by inference, what is a safety issue — is ultimately a social construct, not a technological one. And this is where Safe Superintelligence’s aspirations begin to fall apart. Despite safety being social and political as well as technical, the venture does not seem to be engaging with the social side of who decides what “safe” means, and how this plays out in the real world.
There’s also the not-so-small complication that no technology is harm-free. Any technology is capable of inadvertently or intentionally leading to harm in a universe that’s governed by time. In other words, risk — the probability of harm occurring — is inherent in any system that is subject to time and change. And zero risk — the corollary of absolute safety — is only possible in the absence of change.
This is why safety is so often operationalized as assessing and managing risk.
The risk framing is helpful as it allows bounds to be placed on what is considered acceptable risk within a specific context. Generally within society there’s a rule of thumb that a one-in-a-million risk of harm is considered OK — even though this implies that someone (or something), somewhere, is likely to be harmed at some point.
Risk rules of thumb differ with context and culture, but there’s almost always a recognition that zero risk isn’t an option. They help guide policies and regulations that are designed to manage risk at the societal level. But they’re also incredibly subjective when it comes to how individuals and communities assess risk, and often seem to defy logical analysis — until you approach them through the lens of human behavior.
In other words — just to press the point — risk can be seen as the operationalization of safety; it’s never zero; acceptable risk is ultimately governed by what people agree on; and this sometimes defies logic until seen through the lens of how people and societies behave.
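To make that operationalization concrete, here’s a minimal sketch in Python — with entirely hypothetical names, numbers, and thresholds of my own invention, not anything drawn from real risk assessments — of why “is it safe?” only becomes an answerable question once someone has agreed on an acceptable level of risk:

```python
# A minimal sketch of risk as the operationalization of safety.
# All names, numbers, and thresholds below are hypothetical illustrations,
# not real risk estimates for any actual technology.

from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskEstimate:
    hazard: str
    annual_probability_of_harm: Optional[float]  # None = we simply don't know


def is_acceptably_safe(estimate: RiskEstimate,
                       acceptable_annual_risk: float) -> Optional[bool]:
    """'Safe' is only decidable relative to an agreed threshold.

    The threshold is a social and political input, not something the
    engineering analysis can derive on its own. If the probability of
    harm is unknown, the question has no answer yet.
    """
    if estimate.annual_probability_of_harm is None:
        return None
    return estimate.annual_probability_of_harm <= acceptable_annual_risk


# A commonly cited rule of thumb: roughly a one-in-a-million chance per year.
# Different contexts, cultures, and regulators settle on different values.
ACCEPTABLE_ANNUAL_RISK = 1e-6

bridge = RiskEstimate("structural collapse", annual_probability_of_harm=2e-7)
superintelligence = RiskEstimate("poorly understood failure modes",
                                 annual_probability_of_harm=None)

print(is_acceptably_safe(bridge, ACCEPTABLE_ANNUAL_RISK))             # True
print(is_acceptably_safe(superintelligence, ACCEPTABLE_ANNUAL_RISK))  # None
```

The point of the sketch is simply that the threshold arrives from outside the code: change the socially agreed number and the very same engineering analysis flips from “safe” to “unsafe” — and for a technology whose failure modes we can’t yet estimate, the question can’t even be computed.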
So how does this relate to Safe Superintelligence?
To start with, in a dynamic world where understanding of what constitutes harm is broad, complex, subjective, and often contested, “safe superintelligence” in the absolute sense is impossible.
More than this, the more powerful and poorly understood a technology is within the social, political, and environmental systems it’s a part of, the greater the chances are of it causing harm — making the odds of developing even acceptably safe superintelligence on the low side.
This does not mean that Sutskever, Gross and Levy’s venture is in vain — far from it. But it does mean that, if it’s to lead to societally beneficial superintelligence, it needs to mature in its understanding of risk and safety within a complex societal landscape — and fast.
This includes recognizing that acceptable safety within an increasingly complex social, political and economic landscape should be a prime goal; that this is not solely achievable through science, technology and engineering; and that it will depend on engaging and employing people who deeply understand risk and safety from the context of navigating advanced technology transitions from a societal perspective.
This may sound like a call to muddy the purity of the technological waters within which transformative AI is being developed — but it’s not. Rather, it’s a warning that, unless the full complexity of what it means to develop socially beneficial AI is recognized, there’s only likely to be one outcome — and that’s failure.
Because, ironically, the biggest threat to building acceptably safe technologies is the blinkered assumption that absolutely safe technologies are possible through science and technology alone.
Well said! As a computer scientist, Ilya knows full well that absolute safety is *mathematically impossible* for any nontrivial definition of safety (https://noeticengines.substack.com/p/the-hard-problem-of-hard-alignment). I respect him enormously, but it leaves a strange taste in my mouth to read a proclamation that, on the face of it, rejects the Silicon Valley "productize and ship everything" mentality in favour of a pure research mentality, but cannot be read by anyone with a background in theoretical computer science as anything other than marketing copy.
Your position that this very important, but complex and nuanced, matter should be approached with humility, and in the context of the full breadth of existing intellectual frameworks on safety, is one with which I wholly agree.
Whoa, great article, thanks Andrew.
As a thought experiment, we might imagine for a moment that Sutskever were to succeed in creating "safe superintelligence", however one might define that.
What happens next is that this new safe Super AI acts as an accelerant to an already overheated knowledge explosion. By "overheated" I mean a process that produces new powers faster than we can figure out how to manage them.
As an example, nuclear weapons were developed before most of us were born, and we still don't have a clue what to do about them. And while that puzzle eludes us, now we have genetic engineering and AI to worry about too, and we have no idea what to do about them either. And the knowledge explosion machinery is still running, developing new powers at arguably an ever accelerating rate.
What seems lacking from the safety equation is holistic thinking. The experts we look up to all want to focus on their particular area of specialization. Their primary interest is their career as experts. And so the focus of public discussion is almost always on this or that particular technology.
But does it really matter if AI is safe if the knowledge explosion AI will enhance produces other powers which aren't safe? How are we to be safe if the knowledge explosion continues to produce new powers faster than we can figure out how to manage them safely? Isn't a focus on particular technologies ultimately a loser's game?
What's happening is that humanity is standing at the end of the knowledge explosion assembly line trying to deal with ever more, ever larger powers, as they roll off the end of the assembly line faster and faster. This method of achieving safety is doomed to inevitable failure. If there is a solution, it's to get to the other end of the assembly line where the controls are, and slow the assembly line down so that we can keep up.
Instead, very bright people like Sutskever are using the controls to make the knowledge explosion assembly line go even faster.