ChatGPT: Well trained but inaccurate

The GPT-3 algorithms and the ChatGPT chatbot are all over the place. And rightly so. OpenAI's long-in-the-making family of machine learning models has reached its current peak with the public release of the ChatGPT chatbot. However, like any new technology, it comes with certain caveats.

Cover image created by Midjourney (prompt: “person spreading easily created disinformation using artificial intelligence”).


Built on the GPT-3 language model, or more precisely GPT-3.5, ChatGPT is a cutting-edge demonstration of a modern chatbot, drawing on decades of knowledge stored on the Internet.

It is also a demonstration of the limits and boundaries that such a chatbot must necessarily have. Despite its apparent “intelligence”, it is good to remember that it is nothing more than a set of algorithms trained by supervised and reinforcement learning. Great and fascinating, but not intelligent in any real sense. Behind it all there is nothing but human work.

The results show as much. Some iOS and Android apps, for example, use GPT-3 to dress the chatbot up as various historical figures.

The illusion works for a while but rarely gets past the first hurdle. The historical figures soon become indistinguishable from one another, describe details of their lives (often incorrectly), and respond only vaguely to the questions asked. This is most noticeable with unexpected questions. When asked whether he likes sushi, for example, Genghis Khan replies that he has never encountered it but would like to try it if he had the chance.

Dataset boundaries

Algorithms have no emotions and no knowledge other than what is found on the Internet, including knowledge that is neither verified nor true. ChatGPT is well trained, but it is far from accurate. Proof enough is the decision by the administrators of Stack Overflow, the well-known first-aid site for programmers, to temporarily ban answers generated by ChatGPT.

Some users had started using ChatGPT to answer questions, but its answers were often wrong. They looked simple and were delivered very confidently: solutions that look as though they might work, while the reality is different.

Stack Overflow summed up one of ChatGPT’s biggest current problems quite nicely: the chatbot tends to generate good-sounding, seemingly plausible information that is not necessarily true. This is, of course, due at least in part to the datasets it is trained on.

In this case, we know the datasets, thanks to the paper in which OpenAI described GPT-3 (the model itself is not open source). The filtered Common Crawl dataset, which contains roughly 410 billion tokens, had the highest weight (60%) in training. It covers a large part of the publicly accessible web: unedited text from web pages, extracted metadata, and so on. Simpler model families might make do with such raw data, but GPT-3’s training pipeline is more sophisticated.

The dataset was therefore carefully filtered based on its similarity to high-quality reference data, and fuzzy deduplication was carried out at the document level. Several other datasets complemented it. WebText2, a corpus of web pages linked from Reddit posts, had a weight of 22% in training, alongside datasets that are arguably more trustworthy: two internet-based book corpora, Books1 and Books2 (weighted at 8% each), and articles from the English-language Wikipedia (weighted at 3%).
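To make the idea of a training “weight” more concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which each training example is drawn from one source dataset with a probability proportional to that dataset’s weight. The GPT-3 paper does state that datasets were not sampled in proportion to their size, with higher-quality ones sampled more often, but the real pipeline is far more involved. The dictionary and the helpers `sample_sources` and `empirical_shares` are hypothetical illustrations, not OpenAI code.

```python
import random
from collections import Counter

# Training-mix weights as reported for GPT-3, in percent.
# They sum to 101 because of rounding; random.choices normalizes them anyway.
MIX_WEIGHTS = {
    "Common Crawl (filtered)": 60,
    "WebText2": 22,
    "Books1": 8,
    "Books2": 8,
    "Wikipedia": 3,
}


def sample_sources(n, weights=MIX_WEIGHTS, seed=0):
    """Pick a source dataset for each of n training examples.

    Hypothetical helper: each pick is independent, with probability
    proportional to the dataset's weight (not to its size).
    """
    rng = random.Random(seed)  # fixed seed for reproducible runs
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


def empirical_shares(samples):
    """Return the fraction of samples that came from each dataset."""
    counts = Counter(samples)
    total = len(samples)
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    shares = empirical_shares(sample_sources(100_000))
    # Print datasets from most to least frequently sampled.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:25s} {share:.3f}")
```

The design point is that weights, not raw sizes, decide how often a source is seen: a small, trusted corpus such as Wikipedia can be sampled more often than its share of the total data alone would suggest, while the huge but noisier Common Crawl is sampled less often than its size would imply.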

(Non)historical figures

Knowing which datasets the various GPT-3 models were trained on, we can look deeper into the limits of ChatGPT and related chatbots. An interesting example is the Historical Figures app, which is built specifically on GPT-3.

In it, you can chat with various key personalities of history: Albert Einstein, Henry Ford, Ronald Reagan, but also Adolf Hitler or the mass murderer Charles Manson.

And here, we immediately run into a number of problems. For the sake of clarity, let us divide them into several points:

  • Inaccuracies. The chatbot often makes things up entirely. The personalities mix information from all corners of the Internet at once, from alternative views of history to myths and superstitions to facts. The fact that the app is available, and popular, in the Education section of the App Store does not leave a good impression.
  • Fictional justifications and lies. This is more a matter of ethics (which deserves a separate article), and an interview with, say, Hitler would not necessarily be objectionable in itself, as long as the historical figure were not allowed to fictitiously, and often nonsensically, defend actions that are practically impossible to justify. Henry Ford, for example, repeatedly claims in the app that he was not an anti-Semite (though his antisemitism is clearly documented) and that he had a good relationship with Jews. He did not.
  • Unrealistic portrayal of personalities. The app also fails to capture the interviewees’ unique character and motivations, simply because these are hard to replicate even from a large amount of input data. Despite many precedents, the algorithm cannot know precisely how a given personality would express themselves or what they would do. It simply isn’t that person. Algorithms can predict typical human reactions quite well, and they may become much more accurate in the future, especially as the amount of information describing human motivations, goals, behaviour patterns and more keeps growing.

Do no harm, help

But let’s not worry prematurely. ChatGPT is simply a technology, and like any technology, it will find both good and bad applications. It could, for example, help in the work process. We are not talking about replacing people at work, but rather about assisting them.

Imagine a journalist whose job, apart from real journalism (researching information, verifying it and so on), also involves a vast amount of routine, such as rewriting news agency releases or going through dozens or even hundreds of emails from companies.

That routine is exactly where ChatGPT could soon offer tangible help. Texts will always need to be proofread, properly referenced and edited. Still, given the usually high stylistic quality of its output, the algorithm could take a lot of work off the hands of junior staff in particular. That does not mean the algorithm should replace them. On the contrary, it would free up time they could spend finding topics, producing longer, high-quality original texts, or deepening their professional expertise.

That is, after all, the essence of technology – to help, not replace.

Although ChatGPT and other GPT-3 chatbots remain, for now, more of a very advanced toy, they already hint at where they could be dangerous: as handy tools in the disinformation war that is, unfortunately, a reality of the 21st-century Internet. Nonsense generated quickly, confidently and in a polished style, with just enough factual basis to seem credible, is hardly cause for enthusiasm.

For the same reason, this mindless automation could spell trouble for journalism and transform it. Republishing texts from other media without in-depth fact-checking is already common practice. With stylistically polished and, at first sight, credible articles, less responsible actors could very quickly flood the information space with fictional but appealing texts.

These are thus two sides of the same coin. A chatbot can simplify a tedious job, but it can just as quickly become a tool of hybrid warfare.

It is also worth remembering that ChatGPT can answer simple, non-controversial questions correctly and factually, especially ones that are easily found on Wikipedia or for which the Internet is full of answers. That has its uses too, and in such cases we find it a valuable educational tool. More about the use of ChatGPT in education can be found (in Czech) at AI dětem.