Where did the chatbot hear that?

The buzz in cyberspace over the past year or so has been arguably buzzier than anything we’ve seen in a while. It is the buzz about AI chatbots, the highest-profile one at the moment being ChatGPT and its peripheral functions, created by OpenAI.

The buzz has been triggered by ChatGPT’s abilities in several areas. One is its ability to come up with plausible answers to questions, in English bordering on human-created text.

Another is its amazing ability to produce text in diverse styles such as haiku and rap on demand.

Yet another is ChatGPT’s ability to make breathtakingly stupid factual mistakes, some of them total fabrications, which have come to be called hallucinations but can still fool unwary and credulous chatbot-struck users. A related problem is its own credulity: it believes leading questions and produces responses that rely on the falsehoods and mischaracterizations in the questions put to it.

These aspects of ChatGPT’s behavior aside, the appearance of such chatbots means that humans must pay more attention to credibility and accountability than ever before.

If a human friend tells you something that is not only shocking but incredible in the true sense of the word, you can ask the friend “Where in the world did you hear that?” And if your friend says she heard it from YouTube, you might be just a bit skeptical. If she learned it from a certain highly opinionated podcaster known for promoting conspiracy theories, you might start to wonder about the trustworthiness of that friend’s statement, including statements about other subjects. But you should be thankful that your human friend is at least willing and able to reveal the source of her information, enabling you to evaluate it. That’s where AI chatbots part ways with the real world.

ChatGPT and its like collect information from countless Internet sources, some good, some not-so-good, and some totally wrong. The learning process is an opaque and impenetrable black box. You might wonder what sources were used to generate a totally fabricated and factually incorrect account of events that you know is wrong; or about what sources were used to generate a true, useful response. You might not care if you know the answer to the question you asked and are only window-shopping for chatbot failure stories to post online.

But what about when you ask ChatGPT or its now-multiplying wannabe clones a non-trivial question you don’t know the answer to? If the chatbot gives you a plausible-sounding answer, you or others might believe it and could make decisions based on the chatbot response.

I have experimented numerous times with some leading questions I know the answers to; ChatGPT failed miserably in too many cases for its reputation with me to recover. Getting facts wrong about events that are not likely to affect our lives or fortunes is one thing. Fabricating answers to more important questions, however, is potentially very dangerous.

Since AI chatbots learn from what humans have written on the Internet, the quality of what humans write is even more important than before. When you consider that much of what is written on the Internet is not even written by fully identified humans, the potential problems come into focus. It is important to be able to know and evaluate the sources of an AI chatbot’s learning. But before that, it would be better if the chatbot itself could know and evaluate the sources of the information from which it is learning, thereby front-loading quality into its knowledge base and, by extension, its responses. The anonymity and lack of accountability that have long been characteristics of Internet information make that quite difficult.

That anonymity and lack of accountability are a problem even when chatbots are learning from human-sourced information. But when chatbots start flooding the Internet with their own content, sometimes helped along by humans who trusted them, will chatbots effectively start learning from other chatbots that themselves have learned from not-very-learned humans or even from other chatbots? The image of the multiplying brooms in Disney’s Sorcerer’s Apprentice comes to mind. Let the believer beware.

Species of Translation Origin

Many countries, particularly ones with their own manufacturing capability or that are wary of products produced elsewhere, require products sold domestically to be marked with the country of origin.

Translation sellers have never had to fulfill that requirement and, in recent decades, the large translation brokers selling Japanese-to-English translation became power users of yet other translation brokers in China, where almost no translators have either Japanese or English as their native language. What could go wrong? Well, lots of things, but that is a topic for a different article.

Enter AI, and the problem of origin is escalated to one of whether a translation originated from a human or something else. Just as products of questionable origin have their origin laundered by having the product processed in some way in a respectable and trusted country, artificial translations can and do have their origins laundered by having members of our species process them to make them at least look usable. Translation purchasers and users should beware of such species of origin laundering.

There are good reasons why we do not use AI to translate.

Of Mice and Mousetraps

If you don’t have the budget or don’t want to pay for real cheese, you might try putting a photograph of a piece of cheese in your mousetrap, but don’t expect to catch anything but a photograph of a mouse, and an out-of-focus one at that. People who choose to use artificial intelligence to translate should not be surprised to find that they receive an artificial translation, and a poor one at that.

There are good reasons why we choose to continue to provide only professional translation.

AI takes Japanese-to-English translation back to the days when front-loading of quality was not that important, but this time with some new twists.

The buzz in the translation business since last year has been all about AI and how it will revolutionize the way translations are done. Well, in some ways yes, but in one particularly important way, AI is taking people back to the past, when front-loading of quality in JA-EN translation seemed to be purposefully avoided, for reasons that varied depending upon the era we are discussing.

Throughout the evolution of Japanese-to-English translation, the globally shared inherited wisdom that a translator should be translating into their native language was largely ignored. The appearance of AI has made things worse in that respect, and it has actually presented the new twist of using a “translator” that has no native language and no understanding of the real world.

Japanese-to-English translation has a long history of not front-loading quality. The reasons varied, based on the operative belief system, the business requirements of the Japanese parties selling and needing translations, and the availability and cost of translators at various times.

Stage One:  Native Japanese-speaking translators treated as mission critical

In the old days (for me, the late 1970s), significant numbers of people in Japan had never met a native English-speaking (NES) translator. Many believed that the reading and understanding of a Japanese source text needed to be done by a native Japanese-speaking (NJS) translator. How could a non-Japanese possibly understand the “uniquely” difficult language of Japan? There was a distinct resistance to using non-Japanese translators.

I would venture to guess (no guessing required, because it’s true) that the overwhelming portion of JA-EN translation was done by NJS translators and then “brushed up” (as the expression was in those days) by someone else. That “someone else” was often a hapless native speaker of English enlisted to fix the translation, sometimes without the ability to read and understand the Japanese source text and without the advantage of familiarity with the subject matter. I personally know people who did such work. Having seen the output from NJS translators in those days, some of which made it into publications such as product catalogs, I know that the people involved in the production were not front-loading quality into the translation process.

Stage Two:  The rise of native English-speaking translators

As NES translators of Japanese became more common in the 1970s and 1980s, some people in the translation business and even a small number of translation consumers dared to entrust their documents to NES translators. Their translations required much less editing and usually no rewriting, but they were much more expensive than NJS translators. Even if the resistance to using NES translators could be overcome, however, there were not enough of them to handle the large volumes of JA-EN translation required. NJS translators thus were still dominant in translating Japanese into what was for them a foreign language. This was often (but not always) followed by editing at the hands of foreigners here in Japan. I know numerous people who were doing such editing work but who could not read the Japanese source text. It didn’t matter; they were still just “brushing up” the translation to make it presentable.

Stage Three:  Chinese translation brokers enter the Japanese-to-English translation business

Around the end of the first decade of the 21st century, numerous large translation brokers in the US began using translators and other translation brokers in China to do JA-EN translation, done by translators who have native ability in neither the source language (Japanese) nor the target language (English). They were what I will call third-language translators (TLTs). What could possibly go wrong?

Well, an examination of documents translated JA-EN by people in China reveals that, although such translations are dirt cheap, they are very often of poor quality, frequently including serious mistranslations. This is not surprising, since many of the translators have probably never experienced Japan or the Japanese language firsthand, knowing both only from within China.

Again, this approach does not place value on front-loading of quality in the translation process, but rather takes the approach of quick-and-dirty translations that are then (perhaps) subjected to repair work to make a document usable.

Stage Four:  Enter AI

In the second decade of the 21st century, AI that could produce translations of a sort—the sort being artificial translations—appeared. It promised to totally up-end the translation process, but it actually introduces a number of problems that even the previous faulty approaches didn’t have.

Professional translators, we are told, are too expensive and not needed, and the solution is AI. We are told that AI machine translation can be sufficiently improved in quality by a new breed of workers called post-editors.

Well, this might work for some types of translation, provided expectations can be sufficiently lowered, but the presence of artificial intelligence that produces artificial translations means that the process returns to one in which front-loading of quality is ignored, with a number of additional problems. Specifically:

  • Whereas professional translators have real-world experience and understand the subject matter, AI lacks both of these essential qualities. It never “gets out” into the real world, and it understands nothing; it merely attempts to emulate the writing behavior of a professional translator by statistical learning from cyberspace content, and it often fails to achieve a sufficient level of quality.
  • Whereas a professional translator knows their strengths and weaknesses and can decline a job that would exceed their capabilities, AI doesn’t care about quality and doesn’t need to decline a job because of lack of knowledge or translation ability. It just goes right ahead and produces a translation, the assumption being that it can be fixed later. Essentially, this is a programmed Dunning-Kruger effect.

Throughout the four stages of JA-EN translation, there have been varying levels of front-end loading of quality into the translation process. The best shot the JA-EN translation business had at front-loading quality was back in the days when native English-speaking translators became more common. Those days are coming to an end for a large portion of the translation that is sold, because it is shifting rapidly to AI.

Promoters of AI take us back to not front-loading quality, using a process that has native ability and understanding of neither the source nor the target language. Another significant problem is that deceptively good English—produced at breathtaking speed and at ostensibly extremely low cost—can blind people to the problems lurking behind the curtain of the human-like English.

The players have changed, but this time there are problems that were not faced when professional translators were used. Perhaps it is time to modulate the AI translation hype just a bit.

Japanese Government to Use AI to Accelerate Translation of Laws

Nikkei online reported on March 26 that the Japanese government, in an effort to accelerate the translation of Japanese laws into English, was going to employ AI. The goal is to achieve a four-fold increase in the pace of law translations.

Prime Minister Kishida is promoting foreign investment in Japan, one aim of which is to increase domestic production of strategic items such as semiconductors and batteries.

Nikkei reports that previous efforts to translate statutes relied on private companies and could take as long as one year. The new system was developed based on software from the National Institute of Information and Communications Technology (NICT) and enhanced by machine learning of terminology unique to laws.

The aim is to translate 160 laws in FY 2023 and 320 laws in each of FY 2024 and FY 2025. The laws translated will focus on business statutes relating to the Civil Code and banking laws.

The article concludes with mention that, with the increase in number of laws translated, the government is considering increasing the number of specialists to verify the translated content. It does not provide any details regarding those specialists and their qualifications.