Who in the World is that Chatbot Learning From?
The unaccountable leading the unaccountable

by William Lise

(March 24, 2023)

There is a buzz buzzing about in cyberspace arguably buzzier than we've seen in a while. It is the buzz about AI chatbots, the highest profile one at the moment being ChatGPT, created by OpenAI.

The buzz is focused in two areas. One is the amazing ability of ChatGPT to come up with things in diverse styles such as haiku and rap on demand; the other is its ability to make breathtakingly stupid factual mistakes, some of them total fabrications that have come to be called hallucinations but that could still fool a credulous, chatbot-struck user.

These two aspects of ChatGPT's abilities aside, the appearance of such chatbots means that humans must pay more attention to credibility and accountability than ever before.

If a human friend tells you something that is not only shocking but incredible in the true sense of the word, you can ask the friend "Where in the world did you hear that?" And if your friend says she heard it from YouTube, you might be just a bit skeptical. If she learned it from a certain highly opinionated podcaster known for promoting conspiracy theories, you might start to wonder about the trustworthiness of that friend's statement, including statements about other subjects. But you should be thankful that your human friend is at least willing and able to reveal the source of her information, enabling you to evaluate it. That's where AI chatbots part ways with the real world.

ChatGPT and its like collect information from myriad Internet sources, some good and some not-so-good. The process is opaque, an impenetrable black box. You might wonder which sources were used to generate a particular totally fabricated and factually incorrect account of events that you know is wrong, or which sources were used to generate a true, useful response. You might not care if you know the answer to the question you asked and are only window-shopping for chatbot failure stories to post online. But what about when you ask ChatGPT or one of its now-multiplying wannabe clones a non-trivial question you don't know the answer to? If the chatbot gives you a plausible-sounding answer, you or others might believe it.

I have experimented numerous times with some leading questions I know the answers to; ChatGPT failed miserably in too many cases to repair the damage it has already done to its reputation with me. Getting facts wrong about events that are not likely to affect our lives or fortunes is one thing. Fabricating answers to questions that are more important, however, is potentially very dangerous. Since AI chatbots learn from what humans have written on the Internet, the quality of what the humans write is even more important than before.

When you consider that much of what is written on the Internet is not even written by fully identified humans, the potential problems come into focus. It is important to be able to know and evaluate the sources of an AI chatbot's "knowledge." But before that, it would be better if the chatbot could know and evaluate the sources of the information from which it is learning, thereby front-loading quality into its knowledge base and, by extension, its responses. The anonymity and lack of accountability that have long been characteristic of Internet information make that quite difficult. That is a problem when chatbots are learning from ostensibly human-sourced information. But when chatbots start flooding the Internet with their own creations, sometimes helped along by humans who trusted them, will chatbots effectively start learning from other chatbots that themselves have learned from not-very-learned humans or even from other chatbots? The image of the multiplying brooms in Disney's Sorcerer's Apprentice comes to mind. Let the believer beware.