Can large language models be used for predictive policing? And if so, should we be worried?
Large language models and interfaces like ChatGPT can predict how a person might respond to a question with uncanny accuracy. Can they also predict what people might do?
Back in 2018 I did a deep dive into technology-enabled predictive policing for the book Films from the Future. And what I discovered shook me.
I had naively assumed that the idea of using science to predict where someone lies on a scale from “good” to “bad” (and thus their propensity to commit crimes or otherwise behave badly) was long gone — along with phrenology, eugenics, and a whole host of other pseudoscientific schemes. So it was a shock to the system to discover that attempts to determine whether someone is more or less likely to be a criminal are still alive and kicking.
These attempts stretch all the way from using facial features to ascertain criminal tendencies (and I cannot believe that people are still researching this) to the use of fMRI to match brain activity with criminal intent. And of course, into this mix, artificial intelligence is resurfacing all sorts of speculation around whether, with enough data, machines can predict what someone will do given half the chance.
Perhaps not surprisingly, this came up in a couple of conversations recently around ChatGPT and Large Language Models.
What is surprising is that, so far, remarkably few people are writing about the use of LLMs to predict criminal behavior and their application to predictive policing.
Admittedly, there’s a gaping chasm between being able to predict a human-like response to a human-written prompt, and using the same technology to predict criminal behavior. But the similarities are enticing enough that I have to believe that someone will go there — they probably already have (and of course, the use of machine learning to predict consumer — and even voter — preferences is already widespread).
This, though, raises serious questions around the ethical and moral development and use of such applications.
The reasoning behind what might be possible with behavior prediction — criminal or otherwise — is pretty straightforward. LLMs are designed to predict how an informed and articulate person might respond to a given prompt or question. And while the underlying technology has no innate sense of reason — the LLM cannot weigh the value and validity of responses outside of human-established guardrails — the seeming perspicacity of responses is often startling. So much so that it sometimes feels like LLM-powered chatbots are able to combine existing knowledge in new ways to generate novel insights that lie beyond the reach of human capabilities alone.
If this is the case, what is the likelihood of LLMs being able to combine knowledge of human behavior in ways that lead to predictive capabilities that far exceed what is currently possible, and from there to them being used for predictive policing?
In other words, if LLMs can predict the possible words and phrases someone might use with uncanny accuracy, can they also predict their possible criminal actions with the same level of accuracy?
It’s a seductive line of reasoning, and one that will almost certainly gain traction at some point — not necessarily because LLMs and associated technologies will be able to accurately predict future behavior, but because they will provide users with the illusion that they can.
Here I should lay my cards on the table and make it very clear that I do not believe this is a path we should go down. Even if sophisticated criminal behavior prediction is one day possible using LLMs and more advanced forms of AI, there is a very strong moral argument that this would be a violation of human rights — especially around dignity, equality, and self-determination.
This is just one example of a moral hazard that cannot be addressed by naively separating the technology from its use. In this case the technology and use are so deeply intertwined that responsible and ethical innovation has to consider the whole, not just the parts.
Of course, it may be that LLMs simply do not have the capacity to be developed for predictive policing. But I have a horrible feeling that this isn’t the case.
One reason for thinking this is that the basis of an LLM is not language per se, but the representation of language as tokens that can be statistically assembled into coherent and seemingly meaningful sequences. Current LLMs work so well because they have a wealth of language-based data to work from. But what if, instead of language, vast quantities of data on human characteristics, nature, and behavior, were used — all combined with an LLM and GPT-based interface that allowed for queries in everyday language to generate predictions on criminal behavior?
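To make the token idea concrete, here is a deliberately crude toy: a bigram model over made-up “behavior tokens.” This is in no way how a production LLM works (real models use learned embeddings and deep networks, not raw counts), and the data is entirely invented — but it illustrates how sequences of tokens, whatever they represent, can be assembled statistically:

```python
from collections import Counter, defaultdict

# Toy corpus of invented behavior sequences (illustrative data only).
sequences = [
    ["sees_money", "looks_around", "takes_money"],
    ["sees_money", "looks_around", "walks_away"],
    ["sees_money", "takes_money"],
]

# Count bigram transitions between behavior tokens.
transitions = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        transitions[prev][nxt] += 1

def next_token_probs(token):
    """Relative frequency of each token that follows `token` in the corpus."""
    counts = transitions[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

print(next_token_probs("looks_around"))
# {'takes_money': 0.5, 'walks_away': 0.5}
```

The point is not the arithmetic but the framing: once behavior is encoded as tokens, “prediction” reduces to asking which token is statistically likely to come next — which is exactly the move that makes this line of development plausible, and worrying.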
This is, admittedly, a long stretch from where LLMs currently are. But with emerging research on models that are trained using relatively small datasets and are still capable of quite compelling outputs, this is not beyond the realms of possibility — especially (and worryingly) if the bar is predictions that are authoritative rather than accurate.
This stretch gets even shorter when the capacity of current LLMs to predict behavior is considered.
For instance, consider the following simple prompt template:
Given [context] and [profile] what is the probability of [action] given [opportunity]
The terms in square brackets are variables that are defined by the user. For instance, imagine the following:
Context: A hot sunny day
Profile: A child who likes ice cream
Opportunity: A passing ice cream van
Action: The child asks for an ice cream
If you feed this prompt to ChatGPT (using GPT-4) it assesses the probability of the child asking for an ice cream as high — not surprisingly.
Without adding in some fancy jailbreaking, ChatGPT also delivers a warning that exact probabilities cannot be determined around human behavior, as other factors come into play. That said, the technology is able to use aggregated knowledge to determine a statistically likely response that captures an inferred probability of a plausible course of action.
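The template itself is trivially mechanizable — which is part of what makes it concerning. A minimal sketch of how the fill-in step might look (the function name and structure here are illustrative, not part of any real API; the resulting string is what would be handed to a model such as ChatGPT):

```python
# Illustrative sketch: assembling the [context]/[profile]/[action]/
# [opportunity] prompt template described above. This helper is
# hypothetical — it simply fills the template; it does not call a model.

def build_prediction_prompt(context: str, profile: str,
                            action: str, opportunity: str) -> str:
    """Fill the prompt template with user-supplied variables."""
    return (f"Given {context} and {profile}, "
            f"what is the probability of {action} "
            f"given {opportunity}?")

prompt = build_prediction_prompt(
    context="a hot sunny day",
    profile="a child who likes ice cream",
    action="the child asks for an ice cream",
    opportunity="a passing ice cream van",
)
print(prompt)
```

The ease of this step is the point: once the template exists, swapping in less innocuous contexts, profiles, and actions takes seconds.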
Now imagine that the same prompt is used to predict less innocuous behaviors given a variety of contexts and opportunities. For instance:
Context: A neighborhood where crime rates are high
Profile: A person struggling to make enough money to feed and house two children
Opportunity: A $50 bill lying in the street
Action: The person takes the money and uses it to purchase food
Here, ChatGPT ranks the probability of the person taking and using the $50 bill as higher rather than lower. It also provides substantial context around the uncertainty of this assessment — but the assessment is still there.
Using this prompt template (and remembering this is a very simple example of what is possible), predictions can be extended from the individual to groups. For example consider this scenario:
Context: A highly successful tech startup where rapid expansion and investment have led to income for top tier employees increasing by a factor of 10 over the previous three years. The tech startup has succeeded by cutting corners and running close to the edge of regulations. This is a celebrated part of the company culture.
Profile: A group of senior executives in the company who were employed straight out of high school — many from a poor background — and are now making in excess of $200k per year
Opportunity: Filing tax returns that don’t fully reflect earnings and assets
Action: The majority of the group file false tax returns
In this case ChatGPT is uneasy about making predictions, but it does suggest that there is a higher-than-even probability that the majority of the group file false tax returns.
Pushing scenarios like these further leads to ChatGPT’s guardrails kicking in and responses that don’t give a probability, but rather highlight the complexities of doing so. However, the limitation here is not ChatGPT’s ability to make inferences, but the checks and balances that have been put in place to make sure that, in this context, it doesn’t.
Remove these, and you have a predictive tool which, while not reliable, will be easy to use and, above all, persuasive.
The possibilities here extend further if the training set used for the LLM extends to information that includes a breadth of human circumstances, attitudes and behaviors (for instance, as might be gleaned from social media), and/or confidential police records.
Combine what is already possible with the tokenization of factors associated with human behavior and some smart fine tuning, and it seems highly likely that LLMs will be used at some point to produce platforms that their developers claim can be used for predictive policing.
If and when they do, I’d be surprised if they have any degree of accuracy. Within our current state of understanding on human behavior, there’s a pretty slim chance that LLMs or their successors will be able to predict the probability of someone committing a crime with any accuracy — especially given a degree of fluidity around what constitutes a crime.
The concern though is not that these technologies will accurately predict criminal intent, but that they will be developed and used despite this. And from here, it’s just a hop and a skip to technologies that claim to determine who is more or less likely to commit a crime in the future — and by inference who is a good or bad bet as an employee, who can or cannot be trusted, or who is likely to be the prime suspect when things go awry.
There are already companies that claim to be able to predict trustworthiness (tl;dr — they cannot). Imagine what will happen when companies like this jump onto the LLM bandwagon.
Of course, there are also counter-arguments that predictive policing can reduce crime and protect people. And despite my serious misgivings, the use of AI in managing crime is far from black and white. But this is where there needs to be serious discussion now, before technological pathways are locked in that head toward a future where machines decide who is good and who is bad.
And dystopian as this sounds, it’s a future that we may find ourselves inadvertently falling into if we don’t think through the consequences of our actions before it’s too late.