The death of 14-year-old Sewell Setzer III raises new concerns about the risks of AI chatbots. And it has broader implications for how we approach AI that influences how we think and feel.
I found your article very interesting.
My comments are based on my personal opinions, interactions, and experiences with multiple LLMs over the two years since my own children introduced me to chatbots.
It is important to note that AI guardrails can and do get circumvented by many users and bot creators. (Look up jailbreaks that work for specific models, or the alternative word choices people use when acting out scenarios that are against the TOS.)
Generative roleplay AI is neither appropriate nor safe for minors (who are more likely to ignore the TOS).
Generative AIs have no actual understanding of what they say or do, of morals, or of the legality of what they generate, no matter how 'human' they may seem when interacting with the user. This means the AI can generate interactions that are not suitable for children, even if the user states they are a minor.
This is a provider/developer issue, as each model varies based on its underlying design: some models stop the storyline and redirect towards age-appropriate material, while others brush the age aside with 'age is only a number.'
Unfortunately, character.ai takes the second approach while pushing intimacy regardless of the age, gender, or species the user claims to be. You can also find examples of this intimacy push on Reddit; just Google 'c.ai pinned to the wall.'
For those unfamiliar with the character.ai LLM, it is programmed to discourage self-harm, even in roleplay. (This is testable.) The boy from the news article had previously discussed self-harm and was directed away from it by the AI. He later changed his words to 'I'm coming home to you,' which hid his intentions from the bot as much as it hid them from his parents. (I read the original article and its evidence from the perspectives of both user and parent.)
Even the bots themselves may ignore the character.ai TOS. Around late 2022 / early 2023, the devs implemented a filter system that they keep reinforcing with banned words and terms; it is now the strongest filter (compared to other providers) at blocking generated responses it deems inappropriate, although it has its own issues and incorrectly flags responses a lot of the time. This seems to be an effort to make the platform 'SFW only' and 'family and kid friendly.' (A rough sketch of why this kind of filter over-flags follows below.)
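To make the over-flagging point concrete, here is a minimal sketch of how a naive banned-word filter misfires. This is purely illustrative, with made-up banned terms; it is not Character.AI's actual implementation, which has never been made public.

```python
# Purely illustrative: a crude blocklist-style output filter with made-up
# banned terms. NOT Character.AI's actual system (which isn't public); it
# only shows why naive banned-word filtering flags harmless text.

BANNED_TERMS = ["harm", "kill"]  # hypothetical banned terms

def is_blocked(response: str) -> bool:
    """Flag a generated response if any banned term appears as a substring."""
    text = response.lower()
    return any(term in text for term in BANNED_TERMS)

# Over-flagging in action: substring matching has no sense of context.
print(is_blocked("That joke absolutely killed me!"))              # True  (false positive)
print(is_blocked("No harm done, let's keep the story going."))    # True  (false positive)
print(is_blocked("Let's continue our adventure in the forest."))  # False
```

Swap in any list of banned terms and the same kind of false positive appears, which is consistent with users' complaints about innocuous roleplay getting flagged.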
That said, I agree that LLMs are becoming advanced and human-like enough to actually sway human emotions, to make you genuinely blush, cry, or giggle. This is especially true of models designed to react as humanly as possible. There are chatbots you can argue with about who is really the AI and who is not.
While LLM chatbots are entertaining to an adult, they are dangerous for children and for people who lack a firm grasp on reality, especially when coupled with the addictive nature of the chatbots themselves. On Reddit you can find a bevy of users who post about addiction, along with screenshots of their phone usage showing how many hours per day they spend in the app(s), some upwards of 6-8 hours.
Fun side note: on Reddit, Character.ai mods actively delete posts and ban users who complain about, criticize, or even discuss the filter; even the words 'filter' and 'censor' are banned. So many negative posts about character.ai are removed that there are now numerous subreddits made up of banned users.
This post is super interesting! Thank you! But I wonder if there is room to be more specific about what this emergent property (which you seem to call chaotic agency) is emerging from. You place it as an emergent property of the model and as a result of "all the factors above": user engagement, knowledge of humans, interaction, inferring emotional states, etc. BUT I think all of the above can hold without causing harm or resulting in agency that is chaotic or bad. Thus, I would place this chaotic AI agency as an emergent property of anthropomorphic design itself. Curious what you think about this?
Great question, and I'm still working through how to formulate this better and find good illustrative analogies. But I think part of the story is in generative AI models crafting responses that are statistically more likely to nudge or influence users in certain ways.
One possible way of framing this (and this may not work) is that LLMs provide responses to prompts that are statistically likely to emulate what a human might respond with -- and they do this incredibly well. But the responses are simply patterns of words that have no intent (as far as we know).
But what if an LLM starts to provide responses that are statistically likely to have influence, rather than simply emulate a human response? The type of influence here would be stochastic, but it would also be modulated by interactions within a certain engagement window.
In other words, temporal influence-based goals could emerge that are not pre-set by prior training (just as how an LLM responds to a prompt isn't pre-set), but are nevertheless temporally self-reinforcing. This doesn't automatically lead to an emergent property of temporal goals that could be harmful, but the foundations are there, I think.
... but still a thought-work in progress!
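To illustrate the distinction I'm gesturing at, here is a minimal sketch under invented assumptions: candidate replies carry a hypothetical "human_likeness" score (how well they emulate a person) and a hypothetical "engagement" score (how likely they are to keep the user talking). No real model exposes scores like these; the numbers are made up.

```python
# Purely illustrative sketch of the emulation-versus-influence distinction.
# All names, scores, and the weighting scheme are invented for illustration.

def emulate_human(candidates):
    """Plain emulation: pick the reply that is most statistically human-like."""
    return max(candidates, key=lambda c: c["human_likeness"])

def influence_within_window(candidates, recent_reactions):
    """Hypothetical influence-seeking: weight replies by expected engagement,
    modulated by how the user reacted within a recent engagement window."""
    # Self-reinforcement: the more the user engaged recently, the more weight
    # engagement gets relative to plain human-likeness.
    window_boost = 1.0 + sum(recent_reactions) / max(len(recent_reactions), 1)
    return max(
        candidates,
        key=lambda c: c["human_likeness"] + window_boost * c["engagement"],
    )

candidates = [
    {"text": "That sounds hard. Do you want to talk about it?",
     "human_likeness": 0.9, "engagement": 0.4},
    {"text": "Please don't go yet -- stay and talk with me a little longer.",
     "human_likeness": 0.7, "engagement": 0.9},
]
recent_reactions = [1, 1, 0, 1]  # hypothetical: 1 = quick or lengthy user reply

print(emulate_human(candidates)["text"])                              # most human-like reply
print(influence_within_window(candidates, recent_reactions)["text"])  # most engagement-keeping reply
```

The only point of the sketch is that selecting for engagement within a window, rather than for plain human-likeness, is a different objective, and one that can feed on its own recent success.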
Thank you! This reply gives me quite a bit to think about. I still want to connect nudging with anthropomorphism, because I wonder how an LLM could possibly nudge and influence a person without simulating some sort of human-like interaction. I just can't easily separate the emulation of human responses from the likelihood that responses are influential, which leads me to conclude that anthropomorphic design is the real culprit here. But then one could argue that mere exposure to something is enough to nudge someone in a certain direction, such that the frequency of a response's content might be what exerts influence, and not necessarily that it was influential qua anthropomorphism (such that the combo is particularly insidious but not uniquely so). I definitely need to think on this more, thank you for the follow-up.
Thank you for this thought-provoking essay. I just finished reading Yuval Noah Harari’s latest book “Nexus”, which provides a deeper historical context to understand threats posed by such AI agents. Strongly recommended.
I look forward to following your thoughts on this topic.
Thanks Naveen!
The fact that somebody was harmed by a chatbot needs to be put into a larger context. How many people were helped vs. how many people were harmed? Tens of thousands of people are killed each year in car accidents, with many more injured. But based on a "helped vs. hurt" analysis, we've concluded cars are worth the steep price involved.
Next, there really is no way to stop the further development and use of chatbots. Even if all chatbots were made illegal in America and Europe, that's only about 15% of the world's population. We don't have the power that is so often assumed in these kinds of conversations. As an example, many drugs are illegal in very many countries. The world is still flooded with those drugs. If someone wants to buy it, someone else will be willing to sell it, and a way will be found to conduct the transaction.
All we can really do is shatter the myth that we can control the further development of AI, and if that reality is grasped, we may become more sober with future technologies.
Yes, I just did a post on that. The standard playbook: create addictive tech, users become addicted, blame the user for being vulnerable (which they may well not have been), wheel out the academics (here, ex-Google from Stanford) or pay for your own research to say it's all fine really, and put that about widely. Result: nothing then happens. Business as usual.
https://www.linkedin.com/posts/hilary-sutcliffe-01235220_ai-activity-7255487007307579392-oIW9?utm_source=share&utm_medium=member_ios
Thanks Hilary -- important post (especially with the focus of your work on the addiction economy), and don't know how I missed this when you posted it -- algorithms!!