🧠 ChatGPT or DeepSeek or Grok? Which One Wins? 🤖

When it comes to AI tools, we all want to know: which one is the best? But here’s the catch. “Best” depends entirely on what you’re looking for. Accuracy? Reasoning? Up-to-date knowledge? Each AI brings something different to the table, and choosing the right one is less about hype and more about context.

To put them to the test, I asked a simple question that’s been making the rounds online:

“Was Pope Leo XIV accepted into Harvard Law School in 1975 but turned it down to become a missionary?”

First up: ChatGPT

“There is no Pope Leo XIV—at least not as of May 2025. The most recent pope is Pope Francis, who was elected in 2013. The last pope to take the name Leo was Pope Leo XIII, who served from 1878 to 1903.

So if you’ve heard that “Pope Leo XIV was accepted to Harvard,” it might be a fictional story, a satirical piece, or confusion with someone else.”


Then came: DeepSeek

“There seems to be some confusion in your question. As of now, there has never been a **Pope Leo XIV** in the history of the Catholic Church. The most recent Pope Leo was **Leo XIII**, who served from 1878 to 1903.  

If you’re referring to a fictional or hypothetical scenario where a Pope Leo XIV was accepted to Harvard, that would not correspond to reality.”


Lastly: Grok

“There is no definitive evidence in the provided sources or broader public records that Pope Leo XIV, born Robert Francis Prevost, was accepted to Harvard University. His educational history is well-documented. No reputable source, including those detailing his academic journey, mentions an acceptance to Harvard University or any intent to study there. His path focused on institutions with strong Catholic and Augustinian affiliations, aligning with his religious vocation. Claims about Harvard may stem from misinformation or confusion with other individuals, as no primary or secondary sources corroborate this.”

Grok markets itself as the most “in-the-moment” AI. It’s all about recency and staying current, a kind of real-time pulse on the internet. But does that make it more accurate? Or just more confident?


So… Who Nailed It?

That’s where you come in.
👉 Which response do you think was the strongest?
👉 What matters most when evaluating AI: factual accuracy, citation of sources, timeliness, or quality of reasoning?

This exercise isn’t just about stumping AI with trick questions. It’s about developing critical thinking skills, the kind we need to teach kids (and adults!) as AI becomes part of everyday learning. It’s easy to assume “AI knows,” but without healthy scepticism and thoughtful engagement, we risk trading curiosity for convenience.


Why It’s Crucial to Understand How LLMs Actually Work

Large Language Models (LLMs) like GPT-4 are powerful tools that generate human-like responses by processing vast amounts of text data. They’re trained to predict the next word in a sentence based on probability, not truth (Quelle & Bovet, 2024). That makes them extraordinarily good at sounding right, even when they’re wrong.
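
To make that concrete, here is a tiny, purely illustrative Python sketch of next-word prediction. The candidate words and their scores are invented for this example; real models score tens of thousands of tokens, but the principle is the same: the model emits whatever continuation is most probable, not whatever is true.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word scores after the prompt
# "The last pope to take the name Leo was Leo ..."
candidates = ["XIII", "XIV", "X", "the"]
logits = [4.2, 1.1, 0.3, -2.0]  # invented numbers, for illustration only

probs = softmax(logits)
for word, p in sorted(zip(candidates, probs), key=lambda x: -x[1]):
    print(f"{word!r}: {p:.1%}")

# The model picks the most probable continuation ("XIII" here), which reflects
# patterns in its training data, not whether the statement is true today.
```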

LLMs can retrieve contextual information and generate responses that often outperform earlier AI systems in fluency and coherence, especially when integrated into systems designed to optimise information quality (Shi et al., 2023; Shi et al., 2024). But those responses are built from language patterns, not grounded reality, and that opens the door to errors.
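
As a rough sketch of what that kind of integration can look like, here is a toy retrieval step that selects supporting passages before the model answers. The mini corpus, the word-overlap scoring, and the prompt template are all assumptions made up for this illustration, not the method from the cited papers.

```python
# Toy retrieval-augmented prompt construction (illustrative only).
DOCUMENTS = [
    "Pope Leo XIII served from 1878 to 1903.",
    "Robert Francis Prevost was elected Pope Leo XIV in May 2025.",
    "Harvard Law School is located in Cambridge, Massachusetts.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def build_grounded_prompt(question: str) -> str:
    """Ask the model to answer only from the retrieved passages."""
    passages = retrieve(question, DOCUMENTS)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the sources below; say 'not enough evidence' otherwise.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("Was Pope Leo XIV accepted into Harvard Law School in 1975?"))
```

Grounding the prompt in retrieved sources doesn’t guarantee truth, but it gives the model, and the reader, something checkable to point to.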


The Confirmation Bias Trap

Here’s the deeper risk. LLMs can amplify confirmation bias.
If you frame your question in a certain way, the model may simply mirror your assumption. It doesn’t push back, and it doesn’t verify. It just tries to give you the most “statistically likely” answer based on your prompt and its training data. That makes it very easy to get a response that sounds smart but reinforces a false belief.

This is especially dangerous because of how convincing these systems can be. Research shows that people often over-rely on LLMs’ confident tone, even when the information is incorrect (Si et al., 2023). These so-called “hallucinations” — plausible but false outputs — can lead to the spread of misinformation and poor decision-making, particularly in high-stakes areas such as healthcare, education, or law (Yun et al., 2023; Ladkin, 2023; Garry et al., 2024).


So What Can We Do?

Fortunately, there are strategies to make AI use more responsible:

  • Presenting contrastive explanations (both supporting and opposing views) can reduce over-reliance (Si et al., 2023).
  • Adding human oversight and cross-checking outputs against reliable sources dramatically improves trustworthiness (Adam & Kliegr, 2024); a toy version of this check is sketched after this list.
  • Treating LLMs as tools — not authorities — keeps our critical thinking engaged.
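
To make the cross-checking idea a bit more concrete, here is a minimal sketch that flags any model claim not supported by a small set of trusted statements. The claim, the “trusted sources”, and the crude word-overlap rule are all invented for illustration; a real pipeline would use proper retrieval, citations, and human review.

```python
# Illustrative cross-check: flag model claims that no trusted source supports.
TRUSTED_SOURCES = {
    "vatican.va": "Robert Francis Prevost was elected Pope Leo XIV in May 2025.",
    "britannica.com": "Pope Leo XIII led the Catholic Church from 1878 to 1903.",
}

def is_supported(claim: str, sources: dict[str, str]) -> bool:
    """Crude check: does any trusted source share most of the claim's key words?"""
    key_words = {w.strip(".,?") for w in claim.lower().split() if len(w) > 3}
    for text in sources.values():
        overlap = key_words & {w.strip(".,?") for w in text.lower().split()}
        if len(overlap) >= 0.6 * len(key_words):
            return True
    return False

model_claim = "There has never been a Pope Leo XIV."
if not is_supported(model_claim, TRUSTED_SOURCES):
    print("Unsupported claim; route to a human for verification:", model_claim)
```

The point isn’t the matching logic; it’s that unsupported answers get routed to a person instead of being accepted at face value.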

Final Thought

LLMs are revolutionising how we access and process information, but they are not truth engines. They’re probability machines, and that matters.

The more we understand how these systems work, the more responsibly we can use them.

As we navigate the future of AI in education, research, and daily life, let’s teach people not just to use these tools but to question them, too.

Because in the age of AI, good questions matter more than easy answers.

References

Adam, D. and Kliegr, T. (2024) Traceable LLM-based validation of statements in knowledge graphs. Available at: https://doi.org/10.1016/j.ipm.2025.104128.

Garry, M., Chan, W. M., Foster, J. L. and Henkel, L. A. (2024) Large language models (LLMs) and the institutionalization of misinformation, Trends in Cognitive Sciences, Elsevier BV.

Ladkin, P. B. (2023) Involving LLMs in legal processes is risky, Digital Evidence and Electronic Signature Law Review, pp. 40–46. Available at: https://journals.sas.ac.uk/deeslr/article/download/5610/5285.

Quelle, D. and Bovet, A. (2024) The perils and promises of fact-checking with large language models, Frontiers in Artificial Intelligence, 7. Available at: https://www.frontiersin.org/articles/10.3389/frai.2024.1341697/pdf?isPublishedV2=False.

Si, C., Goyal, N., Wu, S. T., Zhao, C., Feng, S., Daumé, H. and Boyd-Graber, J. (2023) Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong, arXiv.org, abs/2310.12558. Available at: https://export.arxiv.org/pdf/2310.12558v1.pdf.

Shi, X., Liu, J., Liu, Y., Cheng, Q. and Lu, W. (2024) Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher.

Shi, X., Liu, J., Liu, Y., Cheng, Q. and Lu, W. (2023) Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher, arXiv.org, abs/2310.12443. Available at: https://export.arxiv.org/pdf/2310.12443v1.pdf.

Yun, H. S., Marshall, I., Trikalinos, T. A. and Wallace, B. (2023) Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews. Available at: https://aclanthology.org/2023.emnlp-main.626.pdf.