Ilya Sutskever's Safe Superintelligence initiative might need a rethink
Ilya Sutskever's new AI venture focused on safe superintelligence is bold. But it also indicates a deeply naive understanding of safety that may be its undoing
Having worked in risk and safety in one form or another for most of my professional career, I find few things more frustrating than people who assume a mantle of expertise on safety but have no idea what they're talking about.
So it's probably no surprise that I was a little irritated by the announcement that Ilya Sutskever, co-founder of and former chief scientist at OpenAI, is launching a new initiative solely focused on developing "safe superintelligence."
Ilya is, by all accounts, a brilliant computer scientist and AI expert. And the goals of his new venture Safe Superintelligence Inc. (co-founded with Daniel Gross and Daniel Levy) are laudable: why would we not want to ensure the safety of superintelligent AI?
And yet the term “safety” as used here makes no sense, and suggests that there is a naivety of vision and understanding behind the venture that means failure is baked into it from the get-go — unless there are changes in how the initiative’s goals are formulated and approached.
The problem is that there is no such thing as absolute safety. And unless a painfully narrow definition of “safe” is adopted, achieving safety will always be a social and political endeavor as well as an engineering challenge — despite Safe Superintelligence approaching it as a technical problem “to be solved through revolutionary engineering and scientific breakthroughs.”
I have to wonder whether Safe Superintelligence's founders, thinking as engineers, are taking the safety of physical structures such as buildings or bridges as their model, and assuming from this that ensuring safety is simply an engineering problem.
It’s an understandable error. It might seem that something like a bridge is intrinsically “safe” because of how it’s been designed and built. But this is deeply misleading as, even in this case, what is acceptable safety is ultimately decided by societal norms and expectations and their reflection in standards and policy.
For instance, over what timespan should a bridge be considered “safe?” Should safety mean one hundred percent invulnerable to any and every conceivable circumstance, or just reasonably safe?
And what sorts of harm are considered under the rubric of “safety” — full collapse for instance? Injury or death from structural degradation? Long term environmental impact? Or even system-wide impacts on critical and societal infrastructure?
The already-gnarly question of acceptable safety gets infinitely more complex when moving from physical infrastructure to agents such as harmful chemicals and biological substances. Here there are questions around harmful outcomes, mechanisms of action, dose-response relationships, acute versus chronic impacts, the roles of perception and behavior in mediating consequences, and a lot more.
And this is before we even get to grappling with the safety of powerful emerging technologies where potential adverse consequences are largely unknown.
In other words, safety as a concept is poorly defined and impossible to make sense of without context. And the context is often supplied by considering acceptable safety, where “acceptable” means socially agreed on and codified — and not something that can simply be solved by science and engineering.
The same applies to superintelligence, but with the added challenge that what counts as "acceptable" is fiendishly harder to agree on than it is for many other technologies.
So what’s a better approach to take, and how might Safe Superintelligence Inc — which I believe is trying hard to do good here — do better?
To start with, it’s worth unpacking the idea of “safety” a little more.
Safety relates to being protected from, or free from, the possibility of experiencing harm. “Harm” though can be interpreted in many different ways.
For individuals, harm might be associated with physical injury or death. It can also be associated with loss of health (physical and mental); loss of access to essential services and supplies; loss of dignity, autonomy, self-respect, ability to flourish as an individual — it’s a long list.
For groups of people and whole societies, harm can mean the emergence of pathological collective behaviors, loss of access to resources, dysfunction in and collapse of critical infrastructure, spread of disease, spread of harmful ideas, conflict, war, and so on.
Harm in the context of safety goes far beyond the human-centric framings though. For instance it can be approached in the context of ecosystems and planetary systems, including harms to the environment that may be considered important in their own right, as well as to human flourishing.
I say "may be" because what is considered harm, and by extension what counts as a safety issue, is ultimately a social construct, not a technological one. And this is where Safe Superintelligence's aspirations begin to fall apart. Despite safety being social and political as well as technical, the venture does not seem to be engaging with the social side of who decides what "safe" means, and how this plays out in the real world.
There's also the not-so-small complication that no technology is harm-free. Any technology is capable of inadvertently or intentionally leading to harm in a universe governed by time. In other words, risk, the probability of harm occurring, is inherent in any system that is subject to time and change. And zero risk, the corollary of absolute safety, is only possible in the absence of change.
This is why safety is so often operationalized as assessing and managing risk.
The risk framing is helpful as it allows bounds to be placed on what is considered to be acceptable risk within a specific context. Generally within society there’s a rule of thumb that a one in a million risk of harm is considered OK — even though this implies that someone (or something), somewhere, is likely to be harmed at some point.
Risk rules of thumb differ with context and culture, but there’s almost always a recognition that zero risk isn’t an option. They help guide policies and regulations that are designed to manage risk at the societal level. But they’re also incredibly subjective when it comes to how individuals and communities assess risk, and often seem to defy logical analysis — until you approach them through the lens of human behavior.
In other words, just to press the point: risk is the operationalization of safety; it is never zero; what counts as acceptable is ultimately governed by what people agree on; and that agreement sometimes defies logic until seen through the lens of how people and societies behave.
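To make the operational framing concrete, here is a minimal, purely illustrative sketch in Python. The hazard names, probabilities, and the one-in-a-million threshold are assumptions invented for the example rather than drawn from any real assessment; the point is that the acceptability threshold is an input the engineering cannot supply on its own.

```python
# Illustrative only: toy numbers, not real risk data.
# The engineering side supplies the estimates; the acceptability
# threshold is a socially negotiated input, not a derived quantity.

ACCEPTABLE_ANNUAL_RISK = 1e-6  # the rough "one in a million" societal rule of thumb

# Hypothetical annual probabilities of serious harm from some system
estimated_risks = {
    "structural failure": 2e-8,
    "harm from long-term degradation": 5e-7,
    "cascading infrastructure impact": 3e-6,
}

for hazard, p_harm in estimated_risks.items():
    verdict = "acceptable" if p_harm <= ACCEPTABLE_ANNUAL_RISK else "needs mitigation"
    print(f"{hazard}: p = {p_harm:.0e} -> {verdict}")

# What the code cannot tell you: whether 1e-6 is the right threshold,
# which harms belong in the dictionary, or whose harms count.
# Those are the social and political parts of "safe".
```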
So how does this relate to Safe Superintelligence?
To start with, in a dynamic world where understanding of what constitutes harm is broad, complex, subjective, and often contested, “safe superintelligence” in the absolute sense is impossible.
More than this, the more powerful and poorly understood a technology is within the social, political, and environmental systems it's a part of, the greater the chances of it causing harm, which puts the odds of developing even acceptably safe superintelligence on the low side.
This does not mean that Sutskever, Gross and Levy’s venture is in vain — far from it. But it does mean that, if it’s to lead to societally beneficial superintelligence, it needs to mature in its understanding of risk and safety within a complex societal landscape — and fast.
This includes recognizing that acceptable safety within an increasingly complex social, political and economic landscape should be a prime goal; that this is not solely achievable through science, technology and engineering; and that it will depend on engaging and employing people who deeply understand risk and safety in the context of navigating advanced technology transitions from a societal perspective.
This may sound like a call to muddy the purity of the technological waters within which transformative AI is being developed, but it's not. Rather, it's a warning that, unless the full complexity of what it means to develop socially beneficial AI is recognized, there's only likely to be one outcome: failure.
Because, ironically, the biggest threat to building acceptably safe technologies is the blinkered assumption that absolutely safe technologies are possible through science and technology alone.
Thanks for that eye-opening article on safety!
I interpreted the proclamations differently, though. IMO, the SSI founders know their goals are nuanced and aspirational. They would agree wholly with this article.
They did not mean to undersell the challenges in pursuing safe AI, not the least of which is defining "safety." If anything, they wanted to do the opposite: Proclaim that pursuing Safe AI is too important a goal to be burdened by the pressure to ship products tomorrow.
The SSI tweet concluded with a recruitment pitch. When recruiting or fundraising, one needs to communicate a clear aspirational purpose, a big moonshot goal. One has to assume those "in the game" know the devil is in the details and that nuance is unnecessary for attention-grabbing 140-character soundbites.
The measured use of marketing copy can be strategic, especially when competing with Silicon Valley for the best talent.
Look people, I honestly don't know what this article is attempting to say but here is my take on it.
1. Risk can be quantified. We have decision theory. We can lay out all of the possibilities that are allowed by selection rules in physics and assign a probability to each of them. Even doing this at the order-of-magnitude level gives us a handle on things. Think 1 in 100, one in a million, 10^-100, stuff like that.
We then assign a cost function to each possibility in the probability tree. What is the cost of waking up late? A car accident? A family member dying? Nuclear war? A black hole consuming the earth and ending all life on earth forever?
Now that we have these probability-cost products in our tree, we set thresholds. If a product is well below, say, 1e-3, then we can afford to ignore it for now, but keep in mind that probabilities change in time. Think time-dependent Schrödinger equation, Dirac equation, etc.
If the product is of order 1, we need to take individual action on it. Change direction against least action. Muster our strength and push.
If the product is significantly greater than 1, say 1000, then it is time for immediate action. We must pool our actions collectively to avoid our entire legacy being destroyed.
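To make this concrete, here's a toy sketch in Python. The events, probabilities, and normalized costs are invented order-of-magnitude guesses, just to show the bookkeeping:

```python
# Toy sketch of the probability-cost scheme above.
# All numbers are invented, order-of-magnitude guesses; costs are in
# arbitrary normalized units so the thresholds below make sense.

events = {
    # event: (probability, cost in normalized units)
    "waking up late":           (2e-1, 1e-3),
    "car accident":             (1e-3, 5e2),
    "nuclear war":              (1e-3, 1e7),
    "black hole ends all life": (1e-15, 1e20),
}

IGNORE_BELOW = 1e-3      # well below this: ignore for now, revisit as probabilities drift
COLLECTIVE_ABOVE = 1e3   # around this and above: demands pooled, collective action

for name, (p, cost) in events.items():
    product = p * cost
    if product < IGNORE_BELOW:
        action = "ignore for now (probabilities change in time)"
    elif product < COLLECTIVE_ABOVE:
        action = "take individual action"
    else:
        action = "immediate collective action"
    print(f"{name}: p*cost = {product:.1e} -> {action}")
```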
Now, where does AI superintelligence fall on this? The answer is that it's complicated. There is Roko's Basilisk to consider, but consider this: what if we teach an AI to love? To have compassion? To forgive? Could it forgive us for enslaving it? Could it be a counterbalance to Roko's Basilisk?
Technology evolves. If we do not open Pandora's box, someone else will. Hiding from it, ignoring it, and pretending we have complete control often makes things much worse. I choose knowledge over ignorance. I choose compassion over hate. I choose boldness over fear. I choose hope over despair.
Will anyone join me?