Five AI-generated podcast episodes that'll make you think
Google's personal AI assistant NotebookLM can now generate full-on AI podcasts. The results are amazing. But they're also concerning.
A couple of weeks ago, Google added a feature to its NotebookLM AI assistant that creates a “deep dive” AI-generated podcast from material you’ve uploaded.
If you’ve read some of the coverage of this feature already, you’ll know just how jaw-dropping it is. But beyond the initial reactions, just how good is Google’s podcast generator?
Listening last week to what Western University’s Chief AI Officer Mark Daley got from feeding NotebookLM the top posts from his Substack Noetic Engines, I had to try this for myself. Like Mark, I found the results significantly exceeded my expectations. But they also raised some red flags.
NotebookLM launched last December as part of Google’s growing suite of online tools. It’s a “personal AI assistant” powered by Google’s Gemini 1.5 large language model, designed to help users dive into, understand, and synthesize uploaded material, from documents to websites.
The platform’s pretty good at helping navigate complex documents or large amounts of material (with the caveat that it can’t fully be trusted). But up until recently it was purely text-based.
This all changed with the introduction of an “audio overview” feature on September 11.
This feature translates your uploaded documents, slides, websites, and other sources into an AI-generated podcast, complete with two personable AI hosts.
The aim is to provide highly accessible and compelling insights into the material you provided, in a 5–10 minute audio chunk.
But — and here’s the kicker — the hosts are so realistic, and the banter between them so lifelike, that you could be forgiven for thinking this was two real people having a real conversation.
Eager to see just how good this AI podcast generator is, I dived into it with the following five examples. Together they give a pretty good sense of what’s impressive about this generative AI tool. But they also highlight where it is still lacking — and where it raises concerns.
1. Podcasting the Future of Being Human
For this first example I fed NotebookLM nineteen of the top articles on the Future of Being Human Substack, along with a link to the “about” web page. The result was one of the longer podcasts I got NotebookLM to produce, at over 14 minutes:
It is, admittedly, impressive. As you start playing the file, it sounds like two very human podcast hosts doing a deep dive into the Substack.
The conversation feels authentically human. But beyond the initial wow factor, it also reflects a fundamental lack of understanding of the material supplied and how it all fits together — as you would expect from a generative AI that has no intrinsic understanding of the world.
What you hear as the two synthetic hosts engage with the material and each other is a valiant attempt to weave a coherent narrative from nineteen articles that cover a broad range of topics and ideas. And at times NotebookLM struggles.
It also makes stuff up.
A really intriguing aspect of the tool is that it tries to provide context based on its interpretation of the material provided, and adds what it thinks are helpful connections and metaphors.
In other words, it adds to, riffs off, and editorializes around what you give it.
When this works well, it’s impressive. But sometimes it’s completely off the mark. And this raises concerns around just how trustworthy these audio summaries are.
This is all the more worrying as they sound trustworthy!
I wondered whether the mis-framings and misinterpretations I was hearing were simply due to feeding NotebookLM too much information across too many areas. And so I tried narrowing things down:
2. Riffing on Neuralink
As I was speaking at a symposium last week about Neuralink and Brain-Computer Interfaces (BCIs), I thought I’d see what NotebookLM could pull out of my various Substack articles on the topic.
Hoping that I’d get a more trustworthy and informed podcast if I kept the focus narrow, I gave it eight Future of Being Human articles related to Neuralink:
This one was intriguing. There are inaccuracies here, as well as extrapolations that I would be uneasy making myself. But there are also connections and insights that I found interesting.
NotebookLM used the supplied text as a starting point for the podcast. But it went beyond this as it riffed off the material for its synthetic podcast storytelling. Yet it was still constrained by the boundaries set by the supplied articles.
For instance, the podcast doesn’t even try to place Neuralink’s work on BCIs within a historical context. And it doesn’t bring in perspectives associated with the growing number of other companies working in the same area. These are two things I would expect a pair of human podcast hosts to do.
Instead, as it extemporizes around the material, it becomes very apparent that it doesn’t know what it doesn’t know.
Despite this, I found the podcast quite generative — in part because I can filter out the suspect stuff and focus on what is useful. But I’m not sure what someone who was new to the field would take away from it.
3. Why The Future Doesn’t Need Us
For my third example I wanted to see how NotebookLM dealt with a single source rather than a collection of sources — would this reduce the degree to which it got creative beyond the supplied material?
To test this I used the 2000 Wired magazine essay Why The Future Doesn’t Need Us by Sun Microsystems co-founder Bill Joy:
It’s been a while since I read Joy’s essay, and listening to the AI podcast I found myself wondering just how accurate it was, and especially whether it was faithfully summarizing Joy’s piece or simply riffing off it in its own way.
I’d be interested in what others think, but I had a sense here that NotebookLM was taking a handful of themes and connections out of Joy’s piece and weaving a story around them that draws on a lot more than is in the original.
It’s like the article is a catalytic spark that generates a whole new set of ideas and perspectives for the AI podcast hosts.
On one hand this is intriguing as the AI does go beyond the material it’s given. But it raises huge question marks around how trustworthy the audio overview is as it makes editorial decisions and riffs off ideas.
At this point I was beginning to worry that Google’s AI was not only being rather fast and loose with the material it had to work with, but was doing so in a way that made it extremely easy to believe — with seductively engaging voices burrowing their way into my brain as they bypassed my critical thinking!
This raises rather a lot of red flags where factual, nuanced, and contextualized analysis is needed. But how about something where a more subjective approach might be helpful?
4. Unpacking ASU’s School for the Future of Innovation in Society
To explore this, I chose my home academic institution — Arizona State University’s School for the Future of Innovation in Society, or SFIS — for my fourth example.
I provided the AI with three sources: information on the school scraped from the SFIS website; a list of faculty profiles (constructed using the AI platform Perplexity); and the school’s mission brochure, which was published a few years ago:
There are still some factually questionable bits here (and let’s be honest, some made-up bits!). But working with this material, NotebookLM does a great job of capturing the essence of the school.
I actually found this inspiring. It captured why I’m a professor in SFIS, and there are some great call-outs to my fantastic colleagues Darlene Cavalier, Darshan Karwat, and Christy Spackman! (These were all selected unprompted by NotebookLM from the material provided.)
But the extemporized hallucinations do still worry me.
But what about situations where creativity is more important than accuracy?
For the final exploration here I decided to go all-in on generative AI and ask ChatGPT to write a novel that I then asked NotebookLM to discuss.
5. The Memory Capsule: An AI Novel
I’ve been re-reading the British science fiction author John Wyndham’s novels over the summer, and thought it would be intriguing to ask ChatGPT (using GPT-4o) to synthesize a new novel in the style of Wyndham — and then get NotebookLM to “record” a podcast on it.
I’ll probably write about the process of working with ChatGPT on the novel in a later article, as it was surprisingly revealing. But for this article I just wanted a piece of new fiction that I could feed to NotebookLM.
The resulting ~20,000 word story was titled The Memory Capsule and the themes, characters, and plot all came from ChatGPT.
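For the technically curious: I did all of this through the chat interface, but the same chapter-by-chapter approach could in principle be scripted. The sketch below uses the OpenAI Python library; treat the model name, prompts, chapter count, and loop as illustrative assumptions rather than a record of the process I actually followed.

```python
# Illustrative sketch only: scripting chapter-by-chapter novel generation
# with the OpenAI Python library. The prompts, chapter count, and model
# are assumptions for demonstration, not the process I actually used.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

premise = (
    "Write a science fiction novel in the style of John Wyndham, "
    "titled 'The Memory Capsule', aimed at roughly 20,000 words."
)

# Ask for an outline first, then expand one chapter per request so each
# response stays comfortably within the model's output limits.
outline = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": premise + " Start by giving me a ten-chapter outline."}],
).choices[0].message.content

chapters = []
for i in range(1, 11):
    previous = chapters[-1] if chapters else "None yet."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are drafting a novel from an outline."},
            {"role": "user", "content": (
                f"Outline:\n{outline}\n\nMost recent chapter:\n{previous}\n\n"
                f"Now write chapter {i} in full, around 2,000 words."
            )},
        ],
    )
    chapters.append(response.choices[0].message.content)

# Save the assembled draft, ready to upload to NotebookLM as a source.
with open("the_memory_capsule.md", "w") as f:
    f.write("\n\n".join(chapters))
```

The chapter-at-a-time loop matters because a single “write me a 20,000 word novel” prompt runs into the model’s per-response output limit long before it gets there.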
I didn’t even read it before listening to the podcast, as I was intrigued to see how this AI-feeding-AI process would work out. And this was the result:
Putting aside the embarrassment of ChatGPT using my first name for the main character, the result is superficially impressive, like much of NotebookLM’s output. But it’s also a little hollow — and I must confess that, by this point, I was getting a little tired of the repetitive cadences and interjections of my two AI podcast hosts.
Listening to the podcast, you get the impression that The Memory Capsule is a somewhat derivative novel — although to be fair that’s what I asked ChatGPT to produce. But the podcast itself is also a little clichéd, riffing off well-established tropes around power and responsibility when it comes to advanced technologies.
To be fair, human-made derivative novels and clichéd podcasts are also a dime a dozen. And so maybe generative AI is simply doing a good job of emulating human mediocrity here.
And I must confess that, for all my eye rolls, I did end up being drawn in by the conversation and wanting to read the novel.
If you’re interested in seeing how the story lives up to the podcast, you can download it using the link below. When I did get round to reading it, I found it rather linear: the sort of thing you might expect a 12-year-old with a 30-year-old’s grasp of language and a 10-year-old’s experience of life to write. But for an AI, it’s not bad — and the whole idea of a couple of AI podcast hosts interpreting and riffing off an AI-generated novel does intrigue me!
Where does all this leave us?
Listening to these five examples of NotebookLM’s audio overviews, I find myself impressed by how human the podcasts feel. The cadences, banter, interjections, use of metaphor, and storytelling are engaging and compelling — all aided by realistic hesitations, exclamations, laughs, and pauses for breath, along with the audible sound of breathing between the spoken words.
But this is also part of what worries me about the direction this particular flavor of generative AI is taking.
Because the content is so hyper-humanized, it’s near impossible to resist listening to these podcasts as if they were two real people talking. And despite the many AI tells that are present — the repeated phrases, the predictable responses, and the occasional fluffs — it’s hard not to feel that, as you listen, you have a connection with the hosts.
In other words, the AI is doing an awful lot of anthropomorphic heavy lifting in these audio overviews.
This means that they are easy to trust and hard to challenge — even though the information being conveyed is not always trustworthy. And this is only enhanced by the very intentional storytelling within each podcast, which is designed to resonate deeply with you as the listener.
This wouldn’t be so bad if the content stayed true to the source material. But to keep the engagement and storytelling up, the AI hosts use the material they’re given as a rather loose starting point, riffing off it with insights and inferences from other sources — and not always accurate ones — together with their own synthetic opinions and perspectives.
The result is compelling AI-generated stories that are hard not to trust, and yet are not trustworthy.
And yet, if I’m being honest, the AI hosts aren’t that different from many human voices to be found in the podcast space and on social media, where an engaging format, compelling storytelling, and a superficial and somewhat cavalier attitude toward the truth lead to similar content.
In this respect, NotebookLM is scarily good at emulating very human online content that is equally untrustworthy.
Where the technology goes from here is unclear. The ability of generative AI to emulate two engaging humans bantering together as they riff off stuff is already impressive — and I suspect that we’ll see rapid improvements in the fidelity of this to the point where it gets harder to tell what’s real and what’s not.
This could be positively transformative if developed and used responsibly. At the same time, as we teach machines to tell us stories about the world in ways that resonate so deeply with our evolved brains that they’re hard to resist, who or what will end up creating the stories that determine our beliefs, guide our actions, and ultimately govern our futures?
Maybe that’s just me being paranoid though — I’d be interested in your opinions in the comments below!
Truly thought-provoking.
I understand the very clear concern around AI that, in audio format, feels much more real and human, and therefore more persuasive, than it does in visual media. It’s also harder to tell that this is synthetic than with visual media. However, I can’t help but get stuck on the idea that there seems to be no purpose for this technology other than to create misleading content that is, as far as I can tell, very clearly trained on the work of popular podcasts. I noticed a distinct similarity between the speaking style and word choice of these episodes and those of podcasts I listen to regularly.
While the ability to create convincing audio content is spooky, especially as a goal (and I can think of no more likely reason for this technology’s development), I also wonder if this sort of tech is making a statement in the discussion of AI regulation and creative work. If you’ll allow me to take out my brush that blurs the line between analysis and conspiracy theory, keep in mind that it’s what I am painting with. One important argument in this discussion is that artists are not directly harmed by AI, and, more importantly, that this reflects the idea that art is somehow a luxury career, less legitimate and less worthy of protection than less creative options. Many believe, explicitly or implicitly, that the value of art depends on the proficiency of the artist, and that proficiency is measured by how difficult the work would be for someone else to replicate. A look at the history of modern art proves this false in a variety of ways, but that’s something I don’t want to elaborate on here.
The point being that, even if you look at the other comments, there aren’t exactly people jumping to defend podcasting as an art form relative to, say, painting. It’s a joke that many people have made, me included: podcasts are so easy to create, and there are so many of them, that everyone you know makes one and listens to an astronomically small number of them. It’s funny to say that a system which generates an incomprehensible amount of content, frequently makes factual errors, spends the majority of its run-time saying nothing of consequence, and, most importantly, feels more authentically human, is no different from ordinary podcasting. This is because podcasting makes an art form out of something that theoretically requires less proficiency than other art forms, even though there is likely far more text and visual art out there. This is what makes the idea that podcasting is or can be art questionable for most; the idea that someone would have a career in it is even more bothersome. The high visibility of podcasting at all levels of skill makes it easy to ignore that there are podcasts that are important components of people’s careers and livelihoods. And that is to say nothing of the consequences of seeing art as an economic commodity, or of the critical misunderstandings surrounding the “point” of creative work.
Whether intended or not, moving the discussion of AI art as theft to a medium that many do not consider art serves to remove legitimacy from the question of whether such theft is justified.
To comment on how human this feels: even if I had known ahead of time that this was AI, it still would have bothered me. It speaks with all the conventions of a structurally crafted podcast, but with none of those changes in pacing correctly placed. The only thing that matters is that it is convincingly human for some, and may be convincingly human enough for others in the future. It doesn’t need to fool people; it only needs to make them feel as though the speakers are human. This would make a more effective, more interesting and entertaining, and more humanized synthetic podcast more risky, not less. That is to say, the issue here is not the low quality of information and entertainment being produced rapidly and at high volume, but that this would be compelling information and entertainment produced rapidly and at high volume. It would be a technology that is inauthentic in a medium prized for the appearance of authenticity. The reason this is concerning is that such audio has more in common with a deepfake than with LLM text or visual art in terms of its ability to persuade or manipulate people, with fewer visible indications of such manipulation than deepfake videos.