Deep Research is the newest AI application of “reasoning models”. The goal is to leverage reasoning-focused large language models to generate comprehensive, detailed reports on specific topics while searching the web. Unlike typical chatbot interactions, Deep Research takes more time to respond (up to 10 minutes) because it actively searches and analyzes multiple sources. This thorough approach enables the AI to compare different perspectives and synthesize accurate, up-to-date information instead of relying solely on pre-trained knowledge.
As the model explores more sources, it revises its research plan and moves on to the next set of sources, alternating between searching and reading before finally synthesizing an output.
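The search-read-replan loop described above can be sketched roughly as follows. This is a hypothetical illustration only, not how OpenAI, Google, Perplexity, or xAI actually implement Deep Research: `deep_research`, `ResearchState`, the `search`/`read` interfaces, and the toy stubs with `example.com` URLs are all invented for the example, and "synthesis" is reduced to joining the collected notes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces: search maps a query to URLs; read maps a URL to
# (note, follow-up sub-questions discovered while reading that source).
SearchFn = Callable[[str], list[str]]
ReadFn = Callable[[str], tuple[str, list[str]]]


@dataclass
class ResearchState:
    plan: list[str]                                  # open sub-questions
    notes: list[str] = field(default_factory=list)   # findings so far
    seen: set[str] = field(default_factory=set)      # URLs already read
    asked: set[str] = field(default_factory=set)     # sub-questions already searched


def deep_research(initial_plan: list[str], search: SearchFn, read: ReadFn,
                  max_steps: int = 10) -> str:
    """Alternate between searching and reading, revising the plan as leads appear."""
    state = ResearchState(plan=list(initial_plan))
    for _ in range(max_steps):                # bound the time spent researching
        if not state.plan:
            break
        sub_question = state.plan.pop(0)      # next open item in the plan
        state.asked.add(sub_question)
        for url in search(sub_question):
            if url in state.seen:             # skip sources already read
                continue
            state.seen.add(url)
            note, follow_ups = read(url)
            state.notes.append(note)
            # Revise the plan: queue new leads that are not already planned or done.
            for q in follow_ups:
                if q not in state.plan and q not in state.asked:
                    state.plan.append(q)
    # Stand-in for synthesis: a real system would have the model write the report.
    return "\n".join(state.notes)


# Toy stubs standing in for a web search API and a page reader (hypothetical).
PAGES = {
    "how an IC engine works": ["https://example.com/four-stroke"],
    "engine components": ["https://example.com/pistons",
                          "https://example.com/four-stroke"],
}


def toy_search(query: str) -> list[str]:
    return PAGES.get(query, [])


def toy_read(url: str) -> tuple[str, list[str]]:
    follow_ups = ["engine components"] if url.endswith("four-stroke") else []
    return f"notes from {url}", follow_ups


report = deep_research(["how an IC engine works"], toy_search, toy_read)
print(report)
```

Running it prints the four-stroke note followed by the pistons note: reading the first source surfaces a new lead ("engine components") that gets added to the plan, and the duplicate four-stroke URL is skipped on the second pass. The `max_steps` cap plays the role of the time budget that makes these products take minutes rather than seconds.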
Key Features:
Reports - Generate long-form narrative reports
Web Search - Ability to search the web and find sources to compile the report
Analysis - Use tools like Python to analyse information found in sources
Citations - Attribute answers to sources through linked citations
Now you know that you have PhD-level intelligence on tap, and you want to use it. But how do you choose between models? The AI companies provide the following chart, which, to be honest, is pretty hard to grok.
Where the Rubber Meets the Road - Testing Out Different Deep Research Products
Decisional has been working in this space for the last few months. I am extremely interested in being able to effectively deploy AI for completing my own knowledge work.
So I decided to test the following Deep Research products on a topic I am reasonably familiar with, i.e., internal combustion (IC) engines (I was a motorhead and wanted to pick an old-school topic).
Here are the Deep Research options currently available that I checked out:
OpenAI Deep Research
Google Gemini Deep Research
Perplexity R1 Deep Research
Grok 3 Deep Research
Just as when building an eval, I thought from first principles about what would be most important for the AI to cover, BEFORE looking at the results. After speaking to a few friends, I came up with the following criteria:
Fundamentals
How does an IC Engine work?
What are the components?
What are the different types of IC Engines?
How do you measure the power of an engine?
Comprehensiveness
How much detail does the report go into?
Explanation of straight-six vs. W-series engines
How vibrations impact the shape of the engine
Visualisation
Diagrams
Compression ratios
Weight to power
Efficiency curves
Ease of Reading
Is it interesting to read?
Does it have stories that may make someone interested to explore the topic further?
Logical flow
Is the information organized in a way that makes sense?
Do the index and table of contents make sense?
Can someone easily navigate to the section they are interested in?
Quality of Sources
Does it draw on blogs or on high-trust sources?
How many sources does it use to arrive at its information?
The Test
OpenAI’s Research Report
Gemini Deep Research
The Results
Insights
Models capture the fundamentals and produce logically coherent reports, but still fall short on visualization and comprehensiveness.
🟢 'Fundamentals' and 'Logical Flow' reached a perfect score for OAI, reflecting a strong foundation and a coherent structure.
🔔 'Visualization' showed no variation: every model scored 2, which points to room for improvement across all providers.
⚠️ 'Ease of Reading' showed the most variability, suggesting that clarity and accessibility differ noticeably between products.
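Comparisons like these (a perfect score, no variation, most variability) and the overall scores quoted later (3.3 vs. 3.2) can be computed mechanically from a per-criterion score sheet. The sketch below assumes the overall score is the unweighted mean of the six rubric criteria on a 5-point scale; the post does not state its formula, and the function names and the `model_a`/`model_b` scores are made-up placeholders, not the results of this test.

```python
from statistics import mean

# The six criteria from the rubric above (assumed to be scored on a 5-point scale).
CRITERIA = ["Fundamentals", "Comprehensiveness", "Visualization",
            "Ease of Reading", "Logical Flow", "Quality of Sources"]


def overall_scores(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Assumed formula: unweighted mean of all criteria, rounded to one decimal."""
    return {model: round(mean(s[c] for c in CRITERIA), 1)
            for model, s in scores.items()}


def criterion_spread(scores: dict[str, dict[str, int]]) -> dict[str, int]:
    """Max minus min of each criterion across models; 0 means no variation."""
    return {c: max(s[c] for s in scores.values()) - min(s[c] for s in scores.values())
            for c in CRITERIA}


# Made-up placeholder scores for two hypothetical models (NOT this test's data).
example = {
    "model_a": dict(zip(CRITERIA, [4, 3, 2, 2, 4, 3])),
    "model_b": dict(zip(CRITERIA, [3, 2, 2, 4, 3, 3])),
}

overall = overall_scores(example)
spread = criterion_spread(example)
most_variable = max(spread, key=spread.get)   # criterion with the largest spread
print(overall, spread, most_variable)
```

With these placeholder numbers, `model_a` averages 3.0 and `model_b` 2.8, Visualization has a spread of 0, and Ease of Reading has the largest spread. A weighted mean would be an equally reasonable design if some criteria matter more to you than others.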
Prompt used to generate all four DR reports: "Give me a detailed report on how an IC Engine works"
OAI - 9,000 words, 27 sources
PPLX - 964 words, 23 sources
Gemini - 2,000 words, 122 sources
Grok 3 - 2,000 words, 16 sources
The Winner
And so the current winner is OpenAI, though with clear areas for improvement, especially in its ability to visualize information. The variability among the other models is not very high either, so you may not notice much of a difference between them. Gemini is probably the best for covering a broad swath of the internet and OpenAI the best for focused research, while Grok seems to be the best for ease of reading and Perplexity the fastest at generating an answer.
Nerdy Observations
Visualisation
None of the models could visualize information or produce diagrams
Even PPLX, which is generally good at pulling up stock market charts, didn't apply that capability to produce any visual representations here
OAI, despite excellent performance in most areas, didn't leverage diagrams to enhance understanding
Comprehensiveness
None adequately covered specific engine configurations like straight-six vs. W-series engines
All reports only minimally explored how vibrations impact engine design
Even OAI, with its otherwise comprehensive coverage, scored only a 4/5 on comprehensiveness due to this gap
This uniform weakness in very specific technical areas suggests these systems may share similar training limitations with respect to specialized engineering knowledge
Grok's Unexpected Accessibility Advantage
Grok shows an unexpected strength in how it communicates
It begins with a "Key Points" summary section not found in the other reports
It scores equal to OAI on readability (4/5) despite providing less sophisticated technical content
Its overall score (3.3) exceeds Gemini's (3.2) primarily due to this communication advantage
This suggests Grok might be particularly optimized for accessibility and clear communication compared with systems often considered more technically advanced (which would make sense if its training data draws on X, a platform optimized for short, clear communication).
The Future of Deep Research
Deep Research is a great stepping stone toward AI that can carry out knowledge work, but it falls short in three ways:
Control - The reports are still walls of text that you cannot steer or edit
Transparency - There is no explanation of why a given part of the report was compiled the way it was
Visuals - This may not be obvious, but business research often needs to be aesthetically pleasing (cue MBB consultants and their presentation formatting skills), whereas academic research generally does not