So glad others are playing around with Deep Research to see what it can do. It's interesting you don't think we are headed for a future where "AI can independently research and write a PhD dissertation" - why not? We are just 2 years removed from the release of ChatGPT and billions (soon to be trillions) of dollars are being poured into AI to reach the ultimate goal of AGI in an existential arms race with China. I would not bet against it right now. But thanks for this post. I teach an independent research class for HS students and these models have me rethinking what research will look like for them going forward.
https://fitzyhistory.substack.com/p/the-20-page-research-paper-in-20
"I don’t think we are heading for a future where AI can independently research and write a PhD dissertation"
I don't see why not... AI is obviously on a path towards making human cognition in general redundant. "The future doesn't need us."
Quite fascinating. Thanks for sharing
I'm consistently surprised at how "looks a bit like it almost works" translates into "basically no benefit from using it" with AI. Here is an example: I need my references in a particular format, and websites sometimes give them to me in a different one, so I tried using AI to reformat them. Most of the time it is great, but sometimes it changes the first name or journal issue or something, so I have to check each one carefully. And if I have to check each one carefully, it is quicker to reformat them myself.
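One way to claw back some of that checking time is to make the check mechanical: rather than proofreading every AI-reformatted reference by eye, flag only the entries where a name, year, or journal token from the original has gone missing. A minimal sketch (the `suspicious` helper and the sample strings are invented for illustration, not a real tool):

```python
# Sanity check for AI-reformatted references: flag entries where key
# tokens from the original (capitalized words and 4-digit years, a rough
# proxy for names, journals, and dates) no longer appear afterwards.
import re

def suspicious(original: str, reformatted: str) -> list[str]:
    """Return tokens from the original that are missing after reformatting."""
    tokens = set(re.findall(r"\b(?:[A-Z][a-z]{2,}|\d{4})\b", original))
    return sorted(t for t in tokens if t not in reformatted)

orig = "Smith, J. and Jones, A. (2021). Deep learning. Nature, 591(7849), 22-25."
refmt = "J. Smyth and A. Jones, 'Deep learning', Nature 591(7849):22-25, 2021."
print(suspicious(orig, refmt))  # → ['Smith']  (the AI silently changed the surname)
```

It won't catch everything (reordered initials, abbreviated journal names), but it turns "check each one carefully" into "check only the flagged ones."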
Interesting experiment, but did you do any proofing of the outcome? My experience with AI writing experiments is that what's returned is filled with made-up quotes and paraphrasing that often has nothing to do with the sources used. You need to go in and check whether the quotes are real and the authors actually said what the AI claims they did. Often it's all lies, and if you push the AI and say that the results are lies, you get a whole new set of lies in response.
Enjoyed the paper. My thought: The real question is not whether we can "survive" the polycrisis, but whether we are prepared to abandon the epistemic constraints that make crisis feel like an existential threat rather than an evolutionary imperative.
Genius exercise! Curious how this can apply to a screenplay. I’ve been testing it but appreciate your insights on providing documents for the different sections. Would love to see more case studies. Thanks for the experiment and for sharing the process. Ahhhh the things we do thanks to aha moments in the shower 🤣🙏🏻💫
Interesting post. Not sure I have the time to read 400 pages about surviving the polycrisis generated by AI; maybe I would read a good book about it instead. But maybe I'm biased.
Anyway, the output is an essay in the soft sciences: sociology, political science, and maybe some economics. Those may well be fields where AI could help, or even revolutionize how research is done.
Try writing a decent STEM PhD dissertation which is actually original. Good luck with that.
I want to suggest that a business should make a PhD "wrapper" for Deep Research that walks a student through every step of using Deep Research at its best. However, the pace of AI development is so rapid that by the time such a wrapper has been created, published, and advertised, a much more capable AI might already have been released that needs only a single prompt ("help me structure my PhD paper", or something), and all the effort of making the wrapper is worthless... things are almost moving too fast to be useful!
Same! I come up with genius wrapper ideas daily, but by the time you create something, the tech has evolved. It's exponential. Gotta think about what's needed six months from now. Like trend forecasting!
Andrew, your brilliance and generosity in sharing your experiments and insights in almost real time make yours not just a highly instructive but also an entertaining Substack. Thank you for that!
A question about the three foundational supporting documents you provided to Deep Research along with each preceding chapter, as well as the detailed descriptions/prompts about what a particular chapter covered in a given SUB TASK: were those supporting documents (including the TOC) and the sub-task prompts for each chapter crafted entirely by you, or did ChatGPT/Deep Research assist here too?
They were all worked up by ChatGPT from an initial simple prompt.
Fascinating! Did you share what that simple prompt was? I might’ve missed it.
Great, thanks!
It's not made too clear in the article, but the first file linked above is the primary prompt. This led to the three foundational files (which I separated out and cleaned up a bit before using them).
Did you look into Perplexity's Deep Research? The reasoning it produced for a personal project of mine was marvelous, on a graduate level.
Perplexity's Deep Research hadn't been released when I wrote this. I haven't tried it yet, but from what I'm hearing it's good, but not as good as OpenAI's Deep Research. That said, I think we're going to see these research/reasoning models advancing quite dramatically over the next few months!
This was so thorough and interesting! Thanks for sharing :)
Thanks Jeff!
And if AI does all the PhDs, where will new professors come from?
If we allow this to continue unchecked, it will degrade entire ecologies of human knowledge.
We won't have superintelligent machines, just considerably less intelligent humans.
"Impressive but flawed" would seem to sum up LLM developments over the past two years. The impressiveness has increased over time, and the promise (the hype) has grown, and yet the flaws remain blockers to real-world impact. A 5% bullshit rate accrues risk in any application; it will inevitably blow up, and may sink the enterprise applying it.
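The compounding nature of that risk can be sketched with one line of probability. Assuming independent claims and a constant 5% per-claim error rate (both simplifying assumptions, not measurements), the chance that a long document contains at least one error grows quickly:

```python
# Back-of-envelope: with a 5% per-item error rate, the probability that a
# document of n independent claims contains at least one error is
# 1 - (1 - rate)^n, i.e. the complement of every claim being correct.
def p_at_least_one_error(n: int, rate: float = 0.05) -> float:
    return 1 - (1 - rate) ** n

for n in (10, 50, 200):
    print(n, round(p_at_least_one_error(n), 3))
# prints: 10 0.401 / 50 0.923 / 200 1.0
```

So even a "mostly right" model almost guarantees errors somewhere in anything dissertation-length, which is why the per-item checking cost dominates.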
I always love your experiments, Andrew. I really want to try this on works that I know better. I find that reading stuff like this is so difficult to parse when you both don't know what is accurate and don't know the field well. At least with peer-reviewed works I can (hopefully..) assume that someone smarter and more knowledgeable than me read through the draft and picked it apart.
With AI - there is such a keen semblance of authority I find it hard to know when it is full of... well, when it is hallucinating.
I don't think we are close to these systems replacing experts, but I also wonder whether those without expertise can spot the uncanniness or fakeness of the text if they DON'T know it is AI.
For folks reviewing it, will field expertise be enough? Will they need the AI expertise to spot the "tells"? How long will those tells last before a future iteration allows them to be removed?
Experts might be able to navigate these - but I worry most for entry level folks, students, and the public... many already have a hard time parsing through it all. Now a semi-realistic dissertation can be generated that is somewhat passable. As usual with your experiments - both excited to try it out and a bit terrified by the possibilities.
Also - love the Shower Thought origin of this project!
Thanks Leonard -- and I agree, trust and evaluation are incredibly important and getting increasingly challenging here! A substantial concern is that the investment needed to check everything -- even for an expert -- when the prose is so honey-smooth is so large that eventually most people will cave and just trust what they read. I'd like to say that a critical skill everyone needs is to ask questions about what they are reading and hearing, and to use critical thinking to assess its plausibility and validity -- but this is hard. Maybe everyone needs their own personal AI critical-thinking advisor...
The one thing that is clear is that the genie isn't going back in the bottle.
Hi Andrew - really cool experiment. I'm very keen to try out Deep Research. Guess I just need to bite the bullet and subscribe to Pro ;)
I'm an ex-textbook publisher. How well do you think it might research and then write content for textbooks, say high-school physics?
Better than you would imagine -- but still needs a human eye and human editing :)
I tried it out this week, and it was better than expected. I'd estimate it'd take the content development process from months to weeks, for a full and detailed manuscript.
But it does need detailed and specific prompting: my initial requests did not produce what I wanted, but a bit more time on the prompts, following your examples, got much better results.