To follow up on yesterday’s post on AI-produced research, here is a reflection on “vibe researching” from Prof. Joshua Gans of the University of Toronto’s Rotman School of Management. Since the release of the first “reasoning” models in late 2024, he has gone all in on experimenting with AI-first research.
One of the key takeaways is that he found himself pursuing low-quality ideas to completion more often, precisely because the cost of continuing to pursue a questionable idea has dropped. Sycophancy is a problem, too: with an AI cheerleader, it is easy to convince yourself you have a result when you do not.
Those ideas were all fine but not high quality, and what is worse, I didn’t realise that they weren’t that significant until external referees said so. I didn’t realise it because they were reasonably hard to do, and I was happy to have solved them.
I will note that (human) peer reviewers cannot be the levee that stops the flood of middling AI research: the system of uncompensated labour that undergirds all of academic publishing is already strained to bursting, as every editor desperate to find referees for a paper will tell you.
Prof. Gans concludes that his year-long experiment in “vibe researching” was a failure, despite producing many working papers and publishing a handful of them:
My point is that the experiment — can we do research at high speed without much human input — was a failure.
He emphasizes that human taste and judgement will be more important than ever to decide which questions are worth pursuing as the cost of doing research falls:
Going forward, I will continue to be AI-first in my research, now with guardrails to ensure the human element is retained. That means self-generated pauses, more peer feedback through seminars and discussions and more decision points to ask whether what I am doing is really worth doing.
A final note: Prof. Gans mentions pitting models against each other to improve the outputs of AI-generated research. He also recommends a service called Refine, which claims to run many agents in parallel to produce a comprehensive report with suggestions for improving a paper’s clarity and accuracy (some of the examples include fairly technical papers). The service is pretty pricey, though: $50 USD for a single review, or a $40 USD/month subscription for one review per month!