A new framework for guiding AI agent oversight
As AI agents become more prevalent and powerful, a new paper outlines a framework for better understanding how to govern and regulate them.
When OpenAI’s CEO Sam Altman wrote back in January that “in 2025, we may see the first AI agents ‘join the workforce,’” he effectively predicted that this would be the year of agentic AI.
While much of Altman’s new year’s reflection still reads more as hype than reality, interest in artificial intelligence systems that can make their own decisions and influence the environment they operate in has accelerated over the past few months.
Reflecting this, the latest versions of reasoning models from companies like OpenAI, Anthropic and Google exhibit a growing ability to decide how they’re going to solve problems and achieve the goals they’re set.1 And the release of the agentic AI Manus just a few weeks ago is shaking up what’s possible when an AI is given access to compute and the internet, and the license to interpret what it’s being asked to do and how it will achieve this.2
These technologies increasingly appear to be forerunners of an agentic AI wave that could transform how artificial intelligence is used everywhere from research and teaching to business, healthcare, government and much more. As this wave builds, though, there are growing questions around how to ensure AI agents lead to positive outcomes — and don’t cause unnecessary harm.
In effect, it is becoming increasingly important to work out how we govern emerging AI technologies that can decide for themselves how to achieve their ends, especially when their actions have very material consequences.
Yet, despite the speed with which agentic AI is developing, there are no easy answers to how to ensure its safe and responsible use. In part this is because we are in new territory here — we’ve never before been able to create machines that can decide on their own how to solve problems, and then, without human supervision, begin to alter the world around them to do so.
But even before we can grapple with this challenge, there’s a problem: there is currently no accepted and easy-to-apply definition of an AI agent that is designed to support informed oversight and governance.
In other words, we’re not even sure yet how to formulate the problem of AI agent governance, never mind work out how to address it.
To address this, Atoosa Kasirzadeh from Carnegie Mellon University and Iason Gabriel from Google DeepMind have just published a paper3 that sets out to develop a framework for informing the oversight and governance of AI agents. And while it’s just one step toward ensuring agentic AI is developed and used responsibly and beneficially, it is an important one.
In “Characterizing AI Agents for Alignment and Governance” (available on arXiv), Kasirzadeh and Gabriel conduct a deep dive into a range of definitions of agentic AI before developing their own. What emerges is a multidimensional approach to understanding agency that considers the potential of agentic AI to have impact as well as the nature and scope of that impact. And it’s one that I think provides a necessary step toward developing effective approaches to ensuring safe, beneficial and responsible AI agents.
As their starting point, Kasirzadeh and Gabriel consider seven definitions of AI agents from computer science, starting with the classic 1995 definition from Stuart Russell and Peter Norvig of an agent as “anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors.”4
This aligns closely with the loose definition I tend to use, which is an AI that is able to determine how to achieve a set of goals — and adjust those goals if necessary — by manipulating the environment around it, whether this is digital, physical, environmental, behavioral, social, or a combination of these.5
This is a useful starting point. But as Kasirzadeh and Gabriel point out, it’s somewhat limited when it comes to making decisions around the responsible development and use of highly sophisticated AI agents.
To address this, they develop a framework that considers four dimensions of agency: autonomy, efficacy, goal complexity, and generality.
By assigning descriptive levels to each of these — much as cars are assigned levels of autonomy from L0 (human in control all the time) to L5 (vehicle can drive anywhere on its own)6 — they set out to create a multidimensional space that captures the nuances of agentic AI.
Autonomy
Autonomy is perhaps the clearest of these dimensions, and represents the degree to which an AI can perform tasks without human oversight — analogous to levels of driving autonomy (a model that Kasirzadeh and Gabriel explicitly use). Here, A0 represents no AI autonomy (human in complete control), and A5 represents an AI system that can perform all the tasks it’s set without any human oversight or control.7
Efficacy
The dimension of efficacy is more complex as it seeks to capture the ability of an AI agent to interact with its environment and to have an intentional (or causal) impact on it. This is important as it’s possible in principle to have a fully autonomous AI that has very limited ability to impact its environment, or a partially autonomous AI that can, nevertheless, have a profound and intentional impact on the environment it engages with.
Like autonomy, efficacy is assessed on a six-point scale (from E0 to E5), but each level is a combination of two factors: the level of potential causal impact an AI can have, and the type of environment it has influence within.
For instance, an AI that is only capable of minor causal impact, but where that impact is directly realized in the physical world, would be classified as having an efficacy level of 3. Alternatively, an AI that can “significantly reshape its environment across multiple dimensions, approaching full environmental control” but that is only capable of exerting this impact indirectly, would also be classified as having an efficacy level of 3.
The three environments that Kasirzadeh and Gabriel consider are simulated environments (with controlled boundaries and often resettable system states), mediated environments (where an AI has impact through human intermediaries), and physical environments (where the AI can interact with tangible objects in material spaces without human mediation).
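To make the two-factor scale concrete, here’s a minimal Python sketch of how an efficacy level might be derived from an impact magnitude and an environment type. To be clear, the additive rule, level names, and function below are my own illustrative assumptions: the paper defines the E0 to E5 levels descriptively rather than with a formula, and this sketch won’t reproduce its exact assignments.

```python
# Illustrative only: a hypothetical way of combining causal-impact magnitude
# with environment type to yield an efficacy level (E0 to E5). The paper's
# own levels are descriptive; this simple additive rule is just a sketch.

IMPACT = {"none": 0, "minor": 1, "moderate": 2, "significant": 3, "near-total": 4}
ENVIRONMENT = {"simulated": 0, "mediated": 1, "physical": 2}  # increasing directness

def efficacy_level(impact: str, environment: str) -> int:
    """Map impact magnitude and environment directness onto a 0-5 efficacy level."""
    if IMPACT[impact] == 0:
        return 0  # no causal impact means E0, whatever the environment
    return min(IMPACT[impact] + ENVIRONMENT[environment], 5)

print(efficacy_level("minor", "physical"))        # 3: small but direct physical impact
print(efficacy_level("significant", "simulated")) # 3: large impact, but contained
```

The point isn’t the specific numbers, but that any governance-relevant efficacy rating has to account for both how much an agent can change and where those changes land.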
I’m not fully convinced that the proposed efficacy scale has it right yet, as it’s unclear how the risks of an AI working within a simulated environment (the “safest” type of environment) might be realized, or how direct causal effects on the beliefs, understanding, and behaviors of individuals and groups fit within the model. But it is nevertheless an important step toward recognizing that responsible agentic AI needs to consider impact as well as autonomy.
Goal Complexity
The dimension of goal complexity adds further nuance to how agentic AI is assessed in this framework. Goal complexity refers to the degree to which an AI is able to handle complex goals by breaking them down into granular sequences of actions. It’s grounded in the assumption that an AI agent capable of taking a multi-step, multi-pathway approach to achieving complex goals (balancing and appropriately sequencing subgoals while dynamically changing plans depending on circumstances) is likely to present a greater risk than one that just does one thing at a time.
This dimension looks increasingly important with the advent of agentic AIs like Manus, which breaks down large tasks into an interrelated sequence of subgoals, and which continuously reviews and revises these based on progress.
As with the previous two dimensions, goal complexity has six levels, which go from GC0 (no agentic abilities) to GC5 (the ability to break down a complex goal into many different subgoals, where success depends upon balancing and sequencing them).
Generality
Finally, there’s the dimension of generality. This dimension ranges from AI agents that are designed to do one task well but only operate within a narrow domain (level G1) to AI agents that are as effective as humans at adaptively addressing tasks across many different domains (level G5).
This dimension captures concerns that AI agents that can apply a general set of problem-solving and goal-seeking abilities across a wide range of domains — including domains they haven’t been explicitly trained on — reflect a capacity that is more likely to be disruptive and dangerous than if this ability were somehow limited. In effect, it’s the “human dimension,” where the ability of people to problem-solve under unique conditions makes us both incredibly effective at what we do and potentially very dangerous indeed.
Pulling it all together
Together, these four dimensions begin to allow AI agents to be classified in ways that inform governance approaches that neither stifle innovation by treating agentic AI as a one-size-fits-all technology, nor fail to ensure its responsible development and use because they are designed for technologies that don’t match reality.
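As a rough illustration of what such a classification might look like in practice, here’s a minimal sketch of an agent profile recorded as a level on each of the four dimensions. The class, field names, and example values are hypothetical and aren’t taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    """Hypothetical record of an agent's level on each of the four dimensions."""
    autonomy: int         # A0-A5: ability to act without human oversight
    efficacy: int         # E0-E5: potential causal impact on its environment
    goal_complexity: int  # GC0-GC5: ability to decompose and sequence goals
    generality: int       # G0-G5: breadth of domains it can operate across

    def __post_init__(self):
        for name, value in vars(self).items():
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")

# Example: a narrowly scoped but highly autonomous game-playing agent
# (illustrative values, not the paper's own assessments).
game_agent = AgentProfile(autonomy=4, efficacy=1, goal_complexity=2, generality=1)
```

A regulator could then, for example, tie oversight requirements to thresholds on particular dimensions rather than treating every “AI agent” the same way.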
In their paper, Kasirzadeh and Gabriel illustrate rather well how these dimensions can be displayed on a simple radar plot, and demonstrate this using four different existing AI agents (DeepMind’s AlphaGo, OpenAI’s ChatGPT 3.5, Anthropic’s Claude 3.5, and Waymo’s autonomous cars).
Extending this idea, I spun up a simple app (with AI’s help, of course) on the website fvture.net/agentPlot that allows Kasirzadeh and Gabriel’s four examples to be compared with each other, as well as with other AI agents:8
The plot format differs from that in the paper in that it’s rotated through 45 degrees (my personal design ethics kicking in), and it allows the scale to be “stretched” so that higher values on each axis have more visual impact — based on the assumption that the risk associated with going from 0 to 5 on each axis is likely closer to exponential than linear.
Apart from that, it’s a direct way of exploring the profiles of emerging AI agents within Kasirzadeh and Gabriel’s framework, and how this might impact governance and oversight decisions.
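For anyone who wants to experiment with this kind of visualization themselves, here’s a short matplotlib sketch of a four-axis radar plot with an optional nonlinear “stretch” that squares the normalized values, so higher levels carry more visual weight. It’s a quick approximation of the idea, not the code behind the fvture.net app or the paper’s figures, and the example values are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

DIMENSIONS = ["Autonomy", "Efficacy", "Goal complexity", "Generality"]

def plot_agent(levels, label, stretch=True):
    """Plot a four-dimension agent profile (levels 0-5) on a radar chart.

    With stretch=True, normalized values are squared so that moving from 4 to 5
    adds more area than moving from 0 to 1 -- a crude stand-in for the assumption
    that risk grows faster than linearly with each level.
    """
    values = np.array(levels, dtype=float) / 5.0
    if stretch:
        values = values ** 2
    angles = np.linspace(0, 2 * np.pi, len(DIMENSIONS), endpoint=False)
    # Close the polygon by repeating the first point.
    angles = np.concatenate([angles, angles[:1]])
    values = np.concatenate([values, values[:1]])

    ax = plt.subplot(111, polar=True)
    ax.plot(angles, values, label=label)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(DIMENSIONS)
    ax.set_ylim(0, 1)
    ax.legend(loc="upper right")

# A made-up example profile (values are illustrative, not from the paper or app).
plot_agent([3, 2, 4, 1], label="Example agent")
plt.show()
```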
For instance, the plot below shows the profile of a hypothetical embodied agentic AI, such as might be found in a humanoid robot:
While Kasirzadeh and Gabriel’s paper is just one step toward the effective governance of agentic AI, it is an important one. And hopefully it’s one that stimulates more work — and fast — as AI agents are evolving rapidly, and (at the moment) are outpacing our understanding of how to develop and use them responsibly.
While these models are restricted to read-only mode while accessing the internet, their ability to use simulated reasoning to achieve complex tasks is impressive.
Manus isn’t the only system capable of doing this — a growing number of AI agents can surf, read, and use the internet in ways that are analogous to humans. But Manus is currently pushing boundaries in ways that few other systems are at present.
Atoosa Kasirzadeh and Iason Gabriel (2025). Characterizing AI Agents for Alignment and Governance. arxiv.org/abs/2504.21848
This is from the 1995 edition of Russell and Norvig’s book Artificial Intelligence: A Modern Approach. (Prentice-Hall, Englewood Cliffs, New Jersey, 1995).
These domains — which are mine — overlap quite a bit, and may not be comprehensive.
For more see the SAE Levels of Driving Automation: https://www.sae.org/blog/sae-j3016-update
For a comprehensive description of each dimension see the original paper or https://fvture.net/agentPlot/
Coming out of my work cocoon as the ASU year ends, I’ve been enrapt reading this Substack and the HSD Ph.D. program in general. Thank you for updating & sharing as you do.
This summer my wife (1st grade Spanish immersion teacher) and I (full-time at Central Arizona College & ASU ENG associate faculty) are interested in using personal observations, sons’ interests, & appropriate learning needs to prompt engineer individualized AI agents/life tutors to begin mentoring our children to use for focused, diverse life learning situations. Any suggestions on beginning to prompt-engineer a safe, adaptable AI agent that can engage in dialogue & activities via appropriate personas that align to our kiddos’ respective needs & interests?
Our “ositos,” 6 & 7 and dual Mexican-American citizens with foundations in English and Spanish, may benefit from guided assistance by us this summer to begin supplementing school mentors with personalized AI for adaptive conversations & activities (as you’ve written about and I did w/ ChatGPT in my Spring ‘25 ASU ENG 102 courses, students leveraging ChatGPT as research assistants) in support of developing their dialectic skills & overall learning, in & out of the classroom.
One, a 7-year-old, struggles with some schoolbook learning but loves going with me to concerts & learning piano via YouTube tutorials on “Sympathy for the Devil,” “Beautiful Day,” Super Mario Bros. theme, & movie scores, suggesting he’d learn through AI-generated song & karaoke (e.g., from Suno song & lyric generator) across the curriculum.
Our 6-year-old with Level 2 ASD & ADHD, who misbehaves in kindergarten after finishing worksheets in 1/10th the time as classmates, needs to improve social emotional conversational skills and flourishes when interacting with “princesses” ranging from Princess Peach to Snow White to Wonder Woman.
I dove deep into ChatGPT as a research assistant these past four semesters, but haven’t used Gemini, Manus, or many other tools extensively. While we may supplement our AI agents with work on educational platforms like Khanmigo & interest-based tools like Suno, we’d appreciate your thoughts on how to responsibly & effectively approach creating AI “life tutors” for our kiddos?
In A5, can the agent adjust or change the goals which are set by humans? Even better, can the agent decide on goals by itself?