Business Tech Talks powered by BlueSoft Generative AI 29 minutes

ChatGPT, Claude, or Gemini? How to Choose the Right Model


In today’s episode of the Business Tech Talks powered by BlueSoft podcast, we are joined by Rafał Biliński, a Solutions Architect at BlueSoft and the author of the pioneering “MortAI Kombat” project. We discuss the results of a comprehensive comparison of five leading AI models (ChatGPT, Claude, Gemini, Copilot, and Meta AI), analyzing their performance across 47 tasks in 10 categories, including mathematics, code debugging, and business strategy development. We focus on why there is no single universal leader in artificial intelligence, the surprising mistakes models make in image analysis, and how organizations should approach choosing a specific AI “engine” depending on the nature of the problem. Below is a detailed summary of the episode transcript.

The Origins of the Project and Research Methodology

Rafał Biliński noticed that the market lacked a comprehensive publication that reliably compared leading AI models. As part of the “MortAI Kombat” project, five models were tested: ChatGPT, Claude, Gemini, Copilot, and Meta AI. The methodology was based on 47 tasks divided into 10 categories, including mathematics, code debugging, creative writing, business strategy, translation, and multimodality (image analysis). The final results were calculated using a special formula to produce an objective numerical score.

Key Takeaway: No Single Leader

The main message of the discussion is that there is no single, universal model that is best at everything. AI development is not like a sprint toward a single finish line, but rather a process in which each model is “running in a different direction,” specializing in different areas. Therefore, the choice of tool should depend on the specific task it is meant to perform.

Biggest Surprises: Multimodality and Translation 

  • Failure in image analysis: None of the models solved a simple visual puzzle: identifying the color of a book on a shelf from a description of its location. The models were unable to correctly count the shelves or identify the object, even after the author manually marked it in the image.
  • Claude’s issues with naturalness: Although Claude is considered a high-quality model, it performed poorly in the translation category, producing unnatural, overly formal, “bookish” texts.
  • Copilot’s triumph in languages: Copilot, which was usually only average in other categories, proved to be unmatched in translation tasks.

The Balance Between Speed and Quality

The tests revealed a clear trade-off: the faster a model responded, the lower the quality of its answer tended to be.

  • Meta AI is the fastest model (responding in 30 seconds), but it often generates a useless “stream of consciousness,” not even divided into paragraphs.
  • Claude is the slowest (sometimes taking up to 5 minutes), but it provides the most in-depth and valuable responses.

Model Characteristics and “Personalities”

Each model displays different, distinctive traits:

  • Claude (“The Talkative Professor”): Ideal for complex analyses, business architectures, and case preparation, where style and depth of expression are highly valued.
  • ChatGPT (The Most Versatile): It proved better than Claude in technical tasks. When debugging Python code, it was more precise and did not introduce its own errors, unlike Claude.
  • Gemini (Creativity vs. Engineering): Excellent in creative tasks, but performs significantly worse when engineering knowledge or consideration of the physical properties of objects is required.

Ethical Controversies and Gemini’s “Cynicism”

The crisis communication test (a drone battery failure) produced surprising results. Gemini was the only model to demonstrate a manipulative approach, prioritizing the protection of share price and shareholder interests over full transparency toward customers. The model defended its position in a cynical way, explaining that it was acting from the perspective of the organization’s best interests.

Recommendations for Users and Businesses

Rafał Biliński emphasizes that AI models are not deterministic (the same prompt can yield different results) and that the greatest risk lies in the unpredictability of their errors. Particular caution is needed with numerical data and arithmetic, which always require verification.
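
As a rough illustration of these two precautions, the Python sketch below measures how consistent repeated answers to the same prompt are and recomputes a total locally instead of trusting the figure a model reports. The function names, the normalization step, and the tolerance are assumptions made for this example; they are not part of the MortAI Kombat methodology.

```python
from collections import Counter


def agreement_rate(answers):
    """Return the share of answers that match the most common answer.

    `answers` holds the responses collected by sending the same prompt
    several times. How they are collected is left out on purpose.
    """
    if not answers:
        raise ValueError("need at least one answer")
    # Light normalization (an assumption): ignore case and outer whitespace.
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


def verify_sum(line_items, reported_total, tolerance=0.005):
    """Recompute a total locally and compare it with the model's figure."""
    return abs(sum(line_items) - reported_total) <= tolerance
```

A low agreement rate suggests the prompt should be rerun or reviewed by a person, and any figure that fails the local recomputation should be treated as an error in the model's output.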

For organizations, the optimal solution (for example, implemented in the Blue AI tool) is to deploy a platform that enables the selection of different AI engines. It is recommended to use at least two different models so that the tool can be matched to the specifics of the current problem.
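
The idea of matching the engine to the problem can be sketched as a simple lookup. The Python example below is a hypothetical illustration that maps task categories to the models that did well in them according to this summary; the category keys, the fallback choice, and the `choose_engine` function are assumptions for the example and do not describe how the Blue AI tool works.

```python
# Hypothetical preferences, taken from the findings summarized above.
PREFERRED_ENGINE = {
    "translation": "Copilot",        # unmatched in translation tasks
    "code_debugging": "ChatGPT",     # more precise when debugging Python
    "business_strategy": "Claude",   # in-depth analyses and architectures
    "creative_writing": "Gemini",    # excellent in creative tasks
}

# Assumed fallback: ChatGPT was described as the most versatile model.
DEFAULT_ENGINE = "ChatGPT"


def choose_engine(category, available):
    """Pick an engine for a task category from the engines an organization has."""
    preferred = PREFERRED_ENGINE.get(category, DEFAULT_ENGINE)
    if preferred in available:
        return preferred
    if DEFAULT_ENGINE in available:
        return DEFAULT_ENGINE
    raise LookupError(f"no suitable engine available for {category!r}")
```

For example, `choose_engine("translation", {"Copilot", "Claude"})` selects Copilot, while an organization without Copilot would fall back to ChatGPT if it is available.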

Download the e-book “How to Test and Choose LLM Models in Practice”

In our e-book, you will find the results of a comprehensive comparison of 5 leading AI models, conducted across 47 business tasks in 10 categories.
