Plagiarism and Content that Never Will Be

Until recently, I was working on a comprehensive directory of the etymologies of the hundreds (682, to be precise) of names given to warships of the Imperial Japanese Navy. Silly me, I had even toyed with the idea of making that content accessible on a corner of my website.

I have suspended all efforts to create that content, for a good reason.

At several points, I had an article on my old website discussing the conventions used in the naming of IJN ships.

I recently discovered (for the nth time, actually, where n is an angering integer) that significant portions of that article had been lifted verbatim into a Wikipedia page on IJN ship naming conventions, written by an anonymous author, of course, with no credit given to the original content and no permission from me to use it as a significant part of the article.

To set the record straight, I had restored that article to a corner of my website that houses such non-business content, as a demonstration of what was stolen. I have now removed even that, as it would likely just invite more theft, including clueless claims that the content was “public domain” because it was publicly accessible.

We live in a world in which criminals get away with crime because they have been given anonymity by the tech bros claiming to create a wonderful world by connecting people. I say lock the criminal tech bros up and take back intellectual property rights they have made super-easy to infringe.

The Internet is awash with stolen and criminal content. Billionaires claiming to have a mission to connect people partner with outright criminals, including, for example, Chinese criminals on Facebook using what appear to be AI-generated photos of whores, who might turn out to be thugs in Kosovo. Such garbage has become the norm.

But not to worry. If you install an ad blocker, you can pretend you don’t know that it’s happening. But it’s still happening.

Returning to the plagiarism, it could be that many people don’t realize that stealing and republishing content without permission and with no credit given is a crime. Why? Because you can get away with it, so it must not be a crime, right? Anonymity is the friend of criminals.

Weeks ago, I asked on an IJN-related Facebook group whether anyone knew who created or manages the offending Wikipedia page. I had little hope of getting a useful answer, and that lack of hope has turned out to be justified. Information wants to be stolen, right? I suspect a member of that group took and republished the content.

Not surprisingly, there is virtually no original content in the above-noted Facebook group, most members concentrating on scanning and uploading photos from published books. Such is the nature of much of social media.

Some Thoughts on Content Creation and Theft

I’ve never been fond of the term “content creator”, basically because it’s thrown around by large numbers of people who have nothing to say, other than that they want to be thought of as content creators. That self-applied label is as empty as start-up (which has become a meaningless buzzword in Japanese as well), entrepreneur, solopreneur, and a diverse spectrum of other popular buzzwords. Anyone can call themselves a content creator, and that has led to a serious devaluing of the term.

But for people who actually create content or have likenesses they wish to protect the rights to, the Internet—and social media in particular—has simply enabled theft thereof without consequences, including theft of material purportedly protected by laws.

Anything you create and dare to put online can be unlawfully published and used to make profit, and there’s virtually nothing you can do about it that will have any effect, unless you are a large corporation with a team of attorneys, and even those entities are plagued by pirating and unlawful publishing.

The provenance of most of the content uploaded to social media is unknown and undisclosed; not that disclosing provenance grants publishing rights, because it does not. Since much of that content is the result of multiple rounds of unlawful publishing, an unlawful republisher very likely doesn’t even know who owns the content they have republished. The proliferation of “Where is that?” questions about photos, and the annoyance of some thieves at those questions, is evidence of this situation: the unlawful republisher often does not even know where an impressive photo was taken.

Anonymity, and the social media business models that rely on providing and protecting user and advertiser anonymity, have rendered legal remedies meaningless, even when they would be economically feasible, which they seldom are.

This is demonstrated by the countless anonymous page posts on Facebook. Zuckerberg is certainly not interested in stopping these posts, because they provoke engagement, and engagement gives him and his company more money and increased power to capture the attention—and manipulate the behavior—of what are now billions of users.

The game has been won by the tech giants, and it looks like nobody is willing to stop them. People who remain silent are guilty of contributory negligence and act as accomplices, although apparently many haven’t a clue as to what’s going on.

Jaron Lanier was right.

Thoughts on stock photos and AI-generated photos

You often see company websites with photos of what are intended to look like groups of employees, sometimes sitting in a meeting room or standing around chatting. These are almost all stock photos, purchased for the purpose of decorating a company website with attractive photos of attractive people who have no connection with the company using the photo.

A typical stock photo of a group includes:

  • handsome males,
  • beautiful females, and
  • a woke makeup of genders, ethnicities, and ages.

Some people might look at the photo and believe that these are actually people who work at the company or are customers for the company’s products or services. Many will not. Is that an honest way to present the company? Perhaps some people would say no.

Now take an example of a company using a typical AI-generated photo depicting the same type of group, which includes:

  • handsome males,
  • beautiful females, and
  • a woke makeup of genders, ethnicities, and ages.

There are still people who would say this is dishonest, but there is an aspect of the photo that would disclose clearly to visitors to the website that what they are viewing is fake. One out of five of the people depicted will have the wrong number of fingers on one of their hands or have their left or right hand attached to the end of the wrong arm.

There you have it, honesty restored by embracing one of the strengths of AI, anatomical hallucination.

(We never use AI for translation, but on the occasions we might use it for photos, we flag that fact with mouseover text that indicates the source.)
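As one illustration of such a disclosure, here is a minimal sketch (not the author's actual implementation; the function name, file name, and markup wording are assumptions): the standard HTML title attribute renders as mouseover text in every major browser, so provenance can be attached there.

```typescript
// Sketch: build an <img> tag whose title attribute discloses AI provenance,
// so the disclosure appears as a tooltip on mouseover. The alt attribute
// repeats the disclosure for screen readers. Names here are hypothetical.
function aiImageTag(src: string, caption: string): string {
  return `<img src="${src}" alt="${caption} (AI-generated)" title="AI-generated image: ${caption}">`;
}

// Example:
// aiImageTag("team.png", "Our team")
// → '<img src="team.png" alt="Our team (AI-generated)" title="AI-generated image: Our team">'
```

A site could apply the same idea in static HTML simply by writing the title attribute by hand; the function form just keeps the wording consistent across pages.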

Where did the chatbot hear that?

The buzz in cyberspace over the past year and more has arguably been buzzier than anything we’ve seen in a while. It is the buzz about AI chatbots, the highest-profile at the moment being ChatGPT and its peripheral functions, created by OpenAI.

The buzz has been triggered by ChatGPT’s abilities in several areas. One is its ability to come up with plausible answers to questions, in English bordering on human-created text.

Another is its amazing ability to come up with things in diverse styles such as haiku and rap on demand.

Yet another is ChatGPT’s ability to make breathtakingly stupid factual mistakes, some being total fabrications, which have come to be called hallucinations but can still fool unwary and credulous, chatbot-struck users. A related problem is its own credulity: it believes leading questions and produces responses that rely on the falsehoods and mischaracterizations in the questions put to it.

These aspects of ChatGPT’s behavior aside, the appearance of such chatbots means that humans must pay more attention to credibility and accountability than ever before.

If a human friend tells you something that is not only shocking but incredible in the true sense of the word, you can ask the friend “Where in the world did you hear that?” And if your friend says she heard it from YouTube, you might be just a bit skeptical. If she learned it from a certain highly opinionated podcaster known for promoting conspiracy theories, you might start to wonder about the trustworthiness of that friend’s statement, including statements about other subjects. But you should be thankful that your human friend is at least willing and able to reveal the source of her information, enabling you to evaluate it. That’s where AI chatbots part ways with the real world.

ChatGPT and its like collect information from countless Internet sources, some good, some not-so-good, and some totally wrong. The learning process is an opaque and impenetrable black box. You might wonder what sources were used to generate a totally fabricated and factually incorrect account of events that you know is wrong; or about what sources were used to generate a true, useful response. You might not care if you know the answer to the question you asked and are only window-shopping for chatbot failure stories to post online.

But what about when you ask ChatGPT or its now-multiplying wannabe clones a non-trivial question you don’t know the answer to? If the chatbot gives you a plausible-sounding answer, you or others might believe it and could make decisions based on the chatbot response.

I have experimented numerous times with leading questions I know the answers to; ChatGPT failed miserably often enough that its reputation with me is beyond repair. Getting facts wrong about events that are not likely to affect our lives or fortunes is one thing. Fabricating answers to questions that are more important, however, is potentially very dangerous.

Since AI chatbots learn from what humans have written on the Internet, the quality of what those humans write is even more important than before. When you consider that much of what is written on the Internet is not even written by fully identified humans, the potential problems come into focus. It is important to be able to know and evaluate the sources of an AI chatbot’s learning. But before that, it would be better if the chatbot itself could know and evaluate the sources of the information from which it is learning, thereby front-loading quality into its knowledge base and, by extension, its responses. The anonymity and lack of accountability that have long characterized Internet information make that quite difficult.

That anonymity and lack of accountability is a problem even when chatbots are learning from human-sourced information. But when chatbots start flooding the Internet with their own content, sometimes helped along by humans who trusted them, will chatbots effectively start learning from other chatbots that themselves have learned from not-very-learned humans, or even from other chatbots? The image of the multiplying brooms in Disney’s The Sorcerer’s Apprentice comes to mind. Let the believer beware.