The NYT sues OpenAI: why the case is not a slam dunk for the NYT

The NYT's lawsuit against OpenAI, filed last week, is going viral. Some are predicting it will deal a major blow to OpenAI and to related companies and technology.

I predicted this would happen in an article from as far back as April 2023:

Another problem with ChatGPT is gathering the necessary data, which involves crawling and organizing huge troves of content. In that respect it resembles Google, with one difference: ChatGPT is like a parasite, offering nothing in return for having crawled your data, whereas Google drives traffic and ad revenue to publishers, which makes the arrangement worthwhile for both parties. This will compel publishers to block access to entities associated with ChatGPT, or to demand some other form of credit, remittance, or attribution. Some may be surprised to learn that even simple things like song lyrics can be copyrighted. Whether AI-generated art can itself be copyrighted is hotly debated, but if AI appropriates copyrighted or trademarked works for commercial purposes, it could run afoul of copyright and trademark law (for example, selling DALL-E-generated art that contains Disney logos and characters).

It is just another example of this blog almost always being right, or at the forefront of things.

However, I don't think the lawsuit is as damning as assumed, nor are civil lawsuits the gold mine that the media (think of those huge tobacco company settlements) and popular culture sometimes portray. Lawsuits are easy to file, hard to win or settle, and even harder to collect on. The NYT faces a major uphill battle even to achieve a profitable outcome, let alone to collect actual cash.

My prediction is that little or nothing will come of it. The likeliest outcome is a sealed settlement, or possibly a partnership of sorts. It is a risk, but it's not fatal. AI companies being sued, or threatened with lawsuits over copyright, is not new, and those efforts went nowhere. This threat from April 2023, for example, came to nothing:

They trained illegally using Twitter data. Lawsuit time.

— Elon Musk (@elonmusk) April 19, 2023

On the merits, the NYT's case is not a slam dunk. Its complaint is that ChatGPT's software scraped copyrighted, paywalled data to train its models.

But such access should not be possible if the paywall is set up to be completely airtight. Instead, the NYT uses a metered paywall based on IP address: repeated visits from the same IP, possibly in combination with cookies, trigger it, but a fresh IP with no cookie bypasses it. This is a porous paywall. The NYT wants it both ways: a paywall that nonetheless does not fully stop people from reading articles for free. I think this intentional porousness weakens the NYT's case.
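To make the porousness concrete, here is a minimal Python sketch of a metered paywall that blocks a reader once either their IP address or their cookie has used up a free allowance. The class name `MeteredPaywall`, the three-article limit, and this particular keying scheme are hypothetical assumptions for illustration only; this is not how the NYT's paywall is actually built.

```python
from __future__ import annotations

from collections import Counter


class MeteredPaywall:
    """Toy metered paywall (hypothetical, for illustration only).

    A request is blocked once EITHER its IP address OR its cookie ID has
    already been served `free_limit` articles. Requests with no cookie are
    metered by IP alone.
    """

    def __init__(self, free_limit: int = 3) -> None:
        self.free_limit = free_limit  # assumed free-article allowance per meter
        self.reads_by_ip: Counter[str] = Counter()
        self.reads_by_cookie: Counter[str] = Counter()

    def request(self, ip: str, cookie_id: str | None = None) -> bool:
        """Return True if the article is served, False if the paywall blocks it."""
        over_ip = self.reads_by_ip[ip] >= self.free_limit
        over_cookie = (
            cookie_id is not None
            and self.reads_by_cookie[cookie_id] >= self.free_limit
        )
        if over_ip or over_cookie:
            return False
        # Served: charge this read against both meters that apply.
        self.reads_by_ip[ip] += 1
        if cookie_id is not None:
            self.reads_by_cookie[cookie_id] += 1
        return True


if __name__ == "__main__":
    paywall = MeteredPaywall(free_limit=3)

    # One reader on one IP with one cookie hits the meter on the 4th read.
    print([paywall.request("203.0.113.7", "cookie-abc") for _ in range(4)])
    # prints [True, True, True, False]

    # A crawler that rotates through fresh IPs and sends no cookie is never metered.
    print(all(paywall.request(f"198.51.100.{n}") for n in range(1, 101)))
    # prints True
```

Under these assumptions, a client that presents a fresh IP and no cookie on every request never accumulates a count on either meter, which is exactly the porousness described above.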

Of course, one can argue that making some articles free does not entitle OpenAI to unlimited access to that content. It's like a jar of candy on a receptionist's desk marked "free": it's implicitly understood that each person takes only one piece, yet GPT was taking a handful. Unethical, maybe, but having a deliberately porous paywall and then crying foul when someone avails themselves of more free articles than expected will likely not hold up well. Once you make the candy free, it ceases to be your private property.

But the most damning argument against the NYT is that it freely allows Google to automatically crawl and read its articles for the purpose of indexing and ranking. We're not talking about a few dozen articles or IPs, but thousands of Google datacenter IPs scanning and reading thousands of NYT articles every day. Like OpenAI, Google uses a huge array of IP addresses to automatically read the NYT's paywalled content and show snippets of those articles in search results, and evidently the NYT is perfectly fine with this despite Google not paying for that access.

The same goes for copyright-infringement complaints from Hollywood and other media companies. Sure, I think OpenAI may have to make some concessions, but it will by no stretch be a death blow, or even a major setback for the industry. Struggling media companies desperate for revenue and relevance are probably far more willing to work out a partnership than to wage a long, costly, difficult slog through the courts that they may still lose.