In many areas, we are watching history in the making. Technology is changing things at a rapid pace, which means the future is going to be largely determined by what is taking place today.
Few dispute the impact that generative AI is already making; the question is what the eventual impact will be. Areas such as jobs, safety, and the economy are going to be affected, but at this time we have no idea exactly how that will unfold.
OpenAI and its CEO, Sam Altman, are at the center of this. The company made a lot of headlines through the lawsuits filed against it, most of them from news organizations that assert copyright infringement.
Naturally, the rulings in these cases are going to set the precedent for the next couple of decades. Many speculate that Big Tech is violating existing laws.
We now have the first indication of where things could be heading.
OpenAI Copyright Case Thrown Out
Before diving into this discussion, I will state that I am not a lawyer, so my opinion here is not based upon legal training. At the same time, laws within the United States vary among the states, and of course, there are differences between countries.
Nevertheless, a case in New York federal court was tossed by the judge, who determined that the plaintiffs failed to show harm.
Judge Colleen McMahon dismissed the case after finding that the plaintiffs failed to show concrete harm from OpenAI's use of their content as training data. Unlike other lawsuits targeting AI companies, this case focused on the removal of copyright management information rather than direct copyright violations—though Judge McMahon noted the underlying issue remained the same.
Courts dealing with future cases will have to decide if the same applies to outright copyright violations.
The decision to dismiss for failure to show material harm is likely to set a precedent. The claim at issue here was that OpenAI is creating a competing product.
The judge's decision supported the fair use defense of OpenAI and other AI companies, noting that ChatGPT creates synthesized responses from its training rather than copying content directly. She emphasized that the likelihood of ChatGPT reproducing exact copies of articles is minimal, and pointed out that factual information in articles isn't copyrighted anyway.
This is where things could get hairy for those filing based upon copyright violations.
Inspiration
These cases are based upon the idea that these companies are reproducing their content. That is simply not how the technology operates: the basic mechanism of these models is not straight input/output.
Actually, if that were true, we would not have hallucination problems with these models. When prompted, a model would spit out an identical replica of what was input.
Instead, the output from these models is a novel creation. Like this article, the information generated is something completely new.
What this means is that training the models on the information is akin to being inspired by it.
For example, the works of Stephen King are covered under copyright laws. I cannot take one of his books, photocopy it, place a new cover on it, and sell it as my own. That is obviously illegal.
I can, however, read all of Stephen King's books in detail. If desired, I could study his writing style. Character formation, the establishment of plots, and building of suspense could be aspects that I focus upon.
Perhaps I am successful to the point where I can write Stephen King-style horror almost as well as he can. In that instance, I am the closest thing to him there is.
There is one problem: I am not Stephen King. Even though I trained myself on his material, nothing I put out will be from Stephen King. He is my inspiration, and that is it. Even if my writing style resembles his, to the point where many have difficulty distinguishing between us, nothing I do will be his.
In my unprofessional legal view, this would seem to be the roadblock many of these cases are going to encounter.
Even if the AI is trained on articles from the New York Times, it does not spit out an identical copy. There might be a similar writing style to some of the authors but that is it. We cannot say that output from ChatGPT is a New York Times production.
Of course, there is also the issue where it was trained on a lot of other material. How do you separate the New York Times material from the Washington Post if the model was trained on both?
Information Yearns To Be Free
This is a concept that goes back to the early days of the Internet.
Many of the early cypherpunks (and those who follow that mindset) believe information should be free. The Internet is the world's largest copy machine. Governments have done their best to apply (and rewrite) copyright laws for the digital realm.
The results are mixed at best.
It is evident the business model for information has changed over the last 40 years. The main way of monetizing information became advertising; it became about clicks.
While many dispute this model, it is a way to keep information free.
The Internet certainly brought down the cost of information. While there is a lot of nonsense online, we can find most of what we need for zero charge. Social media, albeit a cesspool in many instances, has a lot of information that used to be available only from those who specialized in the delivery of that content (i.e., newspapers and television stations).
Generative AI is taking this to an entirely new level.
As always, the legal system is slow to catch up. At the pace things are moving, governments have no chance of keeping up. These lawsuits will take years to fully resolve.
By then, the world will be a completely different place.
For now, OpenAI won the first round. There are still dozens of rounds in this fight, so it is far from over.