28.07.2025

Quiz show ‘5 against AI’ receives huge response on YouTube

A hundred thousand views, thousands of likes and hundreds of comments: the German quiz show in which a team of professors competes against AI is causing lively discussion on YouTube. Here are answers to some of the questions raised.


The video of the quiz show is only available in German.

Steffen Herbold, Professor of AI Engineering, had a team of professors compete against AI in our TV studio. He wanted to find out how ChatGPT would fare against human expertise and, at the same time, convey knowledge about AI to a non-specialist audience in a playful way. He and his team succeeded: after just one week, the video had a hundred thousand views on YouTube and more than 500 comments, showing that viewers were really engaged with the content of the quiz show.

The professor developed the format together with his departmental team and came up with questions in three categories: general knowledge, riddles and specialist knowledge. The researchers chose some questions that were easy for humans to answer and others that were easy for the AI. The show makes no scientific claims; it is intended as a format for communicating science. In other words, it aims to demonstrate in a playful way how the technology works and where its strengths and weaknesses lie. When revealing each answer, Professor Herbold explains how the AI arrived at its result.

One thing is clear: the professor of AI engineering has successfully made his debut as a quiz show host. The top comment on YouTube, with several hundred likes, praises his humorous style of moderation. For the riddle ‘If you add 8 and 8, you get 4. How does that work?’ he urged the participants to find a solution ‘keeping an eye on the time’ – a clever allusion to the correct answer. ‘Well played,’ was the user's verdict.
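The hint gives the game away: the riddle is about clock arithmetic, where 8 o'clock plus 8 hours lands on 4 o'clock. A minimal sketch of the idea (the helper function `clock_add` is our own illustration, not from the show):

```python
# The hint 'keeping an eye on the time' points to clock arithmetic:
# on a 12-hour clock face, adding hours wraps around past 12.
def clock_add(a: int, b: int, modulus: int = 12) -> int:
    """Add two hour values on a clock face (12 stays 12, never 0)."""
    result = (a + b) % modulus
    return result if result != 0 else modulus

print(clock_add(8, 8))  # → 4
```

The same trick works on a 24-hour clock by passing `modulus=24`.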

How were the AI models selected?

In the show, the professors compete against the GPT-4.1 and o1-pro models. Many comments criticise these AI models as outdated. Yet GPT-4.1 was only released in April 2025, and we recorded our quiz show on 17 July. Professor Herbold explains the choice of AI models: ‘GPT-4.1 is still the most widely used model today, at least among non-paying users, because it is the default selection in the popular ChatGPT. It also currently performs very well in benchmarks. However, as we were aware that there are better models available, especially for riddle questions, we allowed reinforcement in the form of o1-pro.’

Would other models have performed better?

The expert explains: ‘Of course, we could just as well have used models from Google, Anthropic or Mistral – but the difference would have been minimal. We have since tested this again: some answers – for example, whether Mongolia or Iran is larger – might have come out differently.’

The professor emphasises: ‘It was not our intention to carry out a scientifically rigorous comparison of AI models – the sample would have been far too small for that anyway. Instead, we wanted to show a non-specialist audience what the models can do and how they work.’

Why not Grok 4?

Grok 4 is the AI model from xAI, the company of Elon Musk, owner of the X platform. The University of Passau, together with other universities, has decided to withdraw from X and has published the reasons for this decision. The same reasoning applies to Grok 4: we do not want to provide a platform for any AI model that spreads hate and incitement.

What is a reasoning model and why was it not used?

Some users note that a reasoning model would have performed better. Unlike pure language models, which mainly recognise patterns in data and generate text, reasoning models break problems down into individual steps, reproduce a chain of reasoning and thus work their way to a solution. According to the YouTube commenters, these models are better suited to solving puzzles and logic tasks, understanding cause-and-effect relationships, and arriving at solutions independently. In fact, such a reasoning model was used, as Professor Herbold explained on the show: the ‘better half’ of GPT-4.1, the paid model o1-pro. ‘We were aware that there are better models, especially for riddles, which is why we allowed this form of reinforcement on the AI side,’ explains Professor Herbold. This was also evident in the show: unlike GPT-4.1, o1-pro correctly answered both the riddle about the time and the one about the number of liars on an island.

Are you sure ChatGPT couldn't have known the answer to the question about the United Kingdom?

One question on the quiz show is: where do travellers now need an electronic entry permit? Both the humans and the AI give the correct answer: the United Kingdom. Professor Herbold explains on the show that the AI guessed the answer and must have derived it from information about Brexit, because the new entry regulations only came into force on 2 April 2025 and were therefore not yet part of the training data. A YouTube user wants to know whether this is really the case, since AI is updated from time to time and has internet access. In fact, the researchers deliberately chose AI models without internet access, as the humans on the show had none either.

What about half-moon images on flags?

In the show, humans and AI have to draw a picture based on a description. Among other things, they are asked to draw a lying half moon. The humans succeed; the AI does not. Professor Herbold explains that the AI had hardly any images of half moons in its training data, and images of lying half moons are rarer still. Some YouTube commenters ask about the symbols on flags such as that of Mauritania. Although these are often colloquially called half moons, they are actually crescents. Had the task been a lying crescent, the round would probably have ended in a draw. But an upright crescent is not a lying half moon, no matter how you look at it.

The fourth largest country in Asia in terms of area

In the show, Professor Herbold asks which country is the fourth largest in Asia by area. The correct answer is Kazakhstan. In his explanation, however, he names Indonesia in fifth place and Saudi Arabia in sixth. In fact it is the other way round: Saudi Arabia is fifth and Indonesia sixth. Attentive viewers on YouTube did not miss this. The explanation: it is not only AI that sometimes makes mistakes; even a professor can slip up.

Can a language with only one speaker really be a language?

In the quiz show, Professor Herbold asks about the language with the fewest living speakers. The answer is Taushiro, also known as Pinche or Pinchi, which has only one remaining speaker in the Peruvian Amazon near Ecuador. Some YouTube users want to know: is it still a language if you can only talk to yourself?

A linguist, Professor Johann-Mattis List, was also on stage. His answer: yes, provided there was previously more than one person who spoke the language, and provided those people acquired it very early, as one of their first languages after birth. ‘Under these circumstances, I think you can say that a language spoken by only one person is still a language: that person has command of its vocabulary, sounds and grammar and can speak it without an accent,’ says List. It is equally clear, however, that the language will die with its last speaker and become a dead language. A considerable number of the languages spoken today will suffer this fate, although it is impossible to put an exact figure on it.

Who won in the end – humans or AI?

The team of professors won with a total of nine correct answers; the AI got six right. Some YouTube users complained that the host favoured the humans, for example on the question of when the text of the German Civil Code (BGB) was adapted to the reformed German spelling rules.

The team of professors agreed on 1998, while the AI guessed 2001. The correct answer was 2 January 2002. Host Herbold nevertheless judged the humans to be closer to the solution. The reason was a nice anecdote from law professor Brian Valerius, who almost put the team on the right track: he pointed out that the change could also have come in 2002 as part of a larger substantive revision, because the reform of the law of obligations came into force then. He knew this for certain because he had been studying for his oral exam on 1 January 2002. Incidentally, neither the humans nor the AI scored a point for this question.

Should we be excited about AI or afraid of it?

A YouTube commenter raises a fundamental question: At some point, AI will surpass even the smartest humans – should we be excited about this or afraid? How good AI has become and where its limits lie was the topic of our public lecture series, which ended with this quiz show. Linguist Professor List devoted a lecture to the question of how well language models understand language. He also discussed the possible negative consequences of AI use. However, the lectures also repeatedly highlighted how important it is to be aware of the capabilities of technology in order to use it responsibly.

Why don't the professors spend their valuable time on research and teaching instead?

The tasks of a university are divided into three areas: research, teaching and transfer, which is often referred to as the ‘third mission’. The latter includes researchers communicating their scientific findings to the public. They do this in various ways – for example, in the form of events. The quiz show also had its origins in an event: the prototype, in which law professor Kai von Lewinski competed against an AI, premiered as part of the ‘Unilive – Campus meets City’ event series, which takes place in a seminar room at the university in the centre of Passau. However, it is also true that the professors undertake this public engagement in addition to their regular duties.

Will there be a follow-up?

That remains to be seen, as it depends largely on the resources available (see previous question). Good communication takes time – not only on the part of the professors, but also on the part of the supporting services. This includes all the staff who accompanied and produced the show so professionally that the video was able to generate such a huge response on YouTube. At this point, it should also be said that we are not a professional media company. We are therefore all the more delighted about the overwhelmingly positive feedback and are definitely motivated to continue!

This text was machine-translated from German.

Prof. Dr. Steffen Herbold, Chair of AI Engineering

Professor Steffen Herbold researches AI engineering.

How can AI be used in software development?


Professor Steffen Herbold has held the Chair of AI Engineering at the University of Passau since 2022. Before his appointment as Professor of ‘Methods and Applications of Machine Learning’ at Clausthal University of Technology, he served several times as a substitute professor for data analysis, including at the Karlsruhe Institute of Technology. He studied computer science, completed his doctorate and earned his habilitation at the University of Göttingen.

Focus page

Large language models have disruptive effects. Researchers at the University of Passau are investigating the technical, social, ethical and legal consequences in an interdisciplinary manner.
