Can AI be used to automate social science research?
A new paper on arXiv provocatively asks whether AI can do the same for social science as it's doing for discovery in other areas
A paper posted to the preprint server arXiv this past week is already raising eyebrows.
In “Automated Social Science: Language Models as Scientist and Subjects,” authors Manning, Zhu and Horton not only ask whether advances in artificial intelligence can be used to cut out human subjects and researchers in social science studies, but show that, in principle, this is possible!
The premise behind the paper is an intriguing one. The authors argue that, because large language models (LLMs) represent an unprecedented corpus of information on human nature and behavior, they should be able to simulate human behavior well enough to substitute for real subjects in social science research.
That’s a pretty strong premise — although it isn’t a new one.
This is just the start, though. The paper’s authors go on to argue that, because of the way that LLMs operate, they should also be able to develop testable hypotheses on how people might behave under a given set of circumstances.
Put these two together, and you have the possibility of carrying out social science research with simulated people and simulated scientists — essentially cutting out the messiness of including real people in research about society.
Crazy as it sounds, this makes a lot of sense.
It’s also highly controversial, as it begins to open up the possibility that machines are not only capable of learning about humans faster and more effectively than we’re capable of learning about ourselves, but that they could use this knowledge in ways that have far-reaching implications for society.
In the paper, the authors consider four scenarios where, from a human perspective, the outcomes would be reasonably clear: two people bargaining over a mug; a judge setting bail for a criminal defendant who committed $50,000 in tax fraud; a person interviewing for a job as a lawyer; and three bidders participating in an auction for a piece of art starting at $50.
In each case, the AI system generates testable hypotheses, creates agents to act as simulated humans, generates survey questions to use with these agents, designs how the agents will interact, runs experiments, gathers data, and analyzes the results.
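To make that loop a little more concrete, here is a deliberately toy sketch in Python of its shape, using the mug-bargaining scenario. It is not the authors’ code and makes no claim about how their system works: every name in it (Hypothesis, SimulatedBuyer, bargain, run_experiment, ols_slope) is hypothetical, the LLM-played agents are replaced with a fixed rule, the hypothesis is written by hand rather than generated, and the numbers (budgets, seller floors, concession rates) are made-up assumptions.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hypothesis:
    """Hypothetical, hand-written hypothesis: does `cause` move `outcome`?"""
    cause: str
    outcome: str
    levels: list[float]  # values of the cause to test


@dataclass
class SimulatedBuyer:
    """Stand-in for an LLM playing the buyer; here just a fixed acceptance rule."""
    budget: float

    def accepts(self, ask: float) -> bool:
        return ask <= self.budget


def bargain(buyer: SimulatedBuyer, seller_floor: float,
            rng: random.Random, rounds: int = 5) -> float | None:
    """Seller opens at twice their floor and concedes part of the gap each round.

    Returns the agreed price, or None if no deal is struck.
    """
    ask = 2.0 * seller_floor
    for _ in range(rounds):
        if buyer.accepts(ask):
            return ask
        # Assumed concession rule: close 20-50% of the gap to the seller's floor.
        ask -= rng.uniform(0.2, 0.5) * (ask - seller_floor)
    return None


def run_experiment(h: Hypothesis, trials: int = 200,
                   seed: int = 0) -> list[tuple[float, float]]:
    """Run simulated sessions at each level; keep (level, price) for each deal."""
    rng = random.Random(seed)
    data = []
    for level in h.levels:
        for _ in range(trials):
            seller_floor = rng.uniform(5.0, 10.0)  # seller's minimum, in dollars
            price = bargain(SimulatedBuyer(budget=level), seller_floor, rng)
            if price is not None:
                data.append((level, price))
    return data


def ols_slope(data: list[tuple[float, float]]) -> float:
    """Ordinary least-squares slope of outcome on cause."""
    xs, ys = zip(*data)
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in data)
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


h = Hypothesis(cause="buyer budget ($)", outcome="sale price ($)",
               levels=[8.0, 12.0, 16.0, 20.0])
data = run_experiment(h)
print(f"{len(data)} deals; estimated effect of {h.cause} on {h.outcome}: "
      f"{ols_slope(data):.2f} per dollar")
```

With rule-based agents, the estimated effect simply recovers whatever behavior was assumed; the interest, and the risk, of the real system lies in swapping those rules for LLMs prompted to role-play people.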
Running this process on the four scenarios, the researchers found that experimentation allowed the AI system to predict likely outcomes effectively.
This isn’t too surprising as each LLM that was part of the system — whether simulating subjects or scientists — would have been reflecting massive aggregated data on human behavior. And, of course, the scenarios were relatively simple.
But the implications are startling.
Research involving human subjects is notoriously challenging. Once the often-arduous Institutional Review Board approval process has been navigated, designing experiments, recruiting a suitably large and representative sample of people, and then ensuring that the data and analysis are robust are all far from easy. In many cases, issues around sample size, bias, experimental design, and low signal-to-noise ratios make it extremely hard to extract robust conclusions from long, difficult-to-run studies.
If subjects could be successfully simulated using AI, almost all of these challenges could, in principle, be overcome, as long as the simulations allowed causal inferences to be drawn and models of behavior to be developed that reflect the real world.
If this were possible, it would be a game changer. But it would still depend on human scientists to conduct the experiments themselves.
What this study shows, though, is that the idea of replacing humans with simulated people can also be extended to the scientists themselves. And this is where, I suspect, the paper will begin to make some researchers uncomfortable.
Care needs to be taken here, as the AIs in the paper have no understanding of what a relevant question or a useful outcome is. It’s still down to human researchers to decide this. But by automating hypothesis formation and the subsequent research that leads to models and insights on social behavior, the paper indicates that AI could vastly accelerate the rate at which we generate understanding about how people behave under a variety of circumstances within society.
It also implies that machines could generate knowledge about ourselves that is inaccessible to human researchers alone.
As the authors point out, AI is already being used to advance research into new proteins, new materials, and new drugs. And the just-released 2024 AI Index report underlines the speed with which artificial intelligence is accelerating discovery in multiple areas.
In some of these areas, working with AI is producing discoveries that human scientists would have been unlikely to make alone, such as the recent discovery of a new class of antibiotics.
If this can be achieved in the natural sciences, why not in the social sciences?
The big threat here, of course, is that machines could be used to learn about us in ways that support the notion that our behavior is predictable using machine-generated models.
For those who believe there is something unique about humanity that cannot be reduced to numbers and equations, I can see the concept of automated social science being something of a challenge. There are also legitimate concerns here about bias, and about specific beliefs and worldviews becoming embedded in, and amplified by, AI experimentation.
Yet to many others, I suspect that the possibility of accelerating our understanding of how society works as a step toward supporting future human flourishing is an exciting one — and one that simply reflects advances in other areas of science and discovery.
There is a note of caution worth adding here, though. We are providing AI foundation models with a mind-boggling wealth of information on how and why people behave as they do, both individually and collectively. And we’re developing and training models that are able to ask questions, conduct research, and develop actionable models from this.
We’re also building AI systems that can act on these models to solve problems in agile and innovative ways.
In other words, we’re getting closer to machines that can study and understand how people behave and that can, in principle, use this to influence our behavior to achieve specific goals.
The research in the just-released paper is a long, long way from this. For one, the scenarios tested involved simulating individual people engaging in human-human interactions. Importantly, they did not consider collective behaviors and actions.
Yet it’s not such a large step from studying interactions between two or three people to studying interactions between groups, all through AI-driven simulations. It may even be easier for AIs to build and experiment with society-wide models.
If they do — and assuming that we are still barely scratching the surface of what AI is capable of — we could be looking at a new era of machine-enabled research on predicting and nudging human behavior.
Depending on your point of view, that could be highly liberating, or deeply chilling …
It reminds me of this paper: https://arxiv.org/pdf/2304.03442.pdf
“Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior.”
And:
“A society full of generative agents is marked by emergent social dynamics where new relationships are formed, information diffuses, and coordination arises across agents.”
I would add a critical note to that: AI agents DO NOT in any shape or form replicate the complexities of real human behavior, emotions and thought. So, while the concept and application of AI in the social sciences could be valuable, caution is advised.
A concern that I have is the data that’s being used to power these models could lead to results that amplify predictions of prejudicial behavior — as described in Weapons of Math Destruction by Cathy O’Neil. Garbage in, garbage out, as they say. If all, much, or even some of the historical data a model has access to is based on flawed/prejudicial/disproven social theory and evidence, then the predictions of the models will necessarily be skewed.
A colleague described ChatGPT as “the world’s most powerful graphing calculator,” and that seems to me the most appropriate use case thus far. I understand that LLMs are exciting new toys and researchers are trying to find the boundaries of how they might be used, but this application for simulating social science experiments strikes me as inappropriate. Social science experiments are hard to run because they are hard to do well and easy to screw up. I reject the Silicon Valley bro-science thinking that any arduous task can be done more easily if a “disruptive” engineering approach is taken. Some things are difficult! Some data is hard won and cannot be cheated! There is no royal road to geometry!