How Tech Giants Employ Newspeak to Take IP Owners Down the Rabbit Hole with “Generative AI”
In the most serious assault on intellectual property rights since the “America Invents Act,” the so-called “Generative AI” industry is manipulating language to alter social perceptions of right and wrong, and it has copyrights in its crosshairs.
The Neutering of the US Patent System through the Cunning Use of Language.
The US Constitution provided: “The Congress shall have Power… To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” 223 years later, President Obama signed the “America Invents Act” on September 16, 2011. “The AIA and the Patent Trial and Appeal Board (PTAB) it created have destroyed countless lives and the startup businesses of small, independent inventors through the Inter Partes Review (IPR) and Post Grant Review (PGR) proceedings...” according to US Inventor, the largest inventor-led non-profit organization. Federal Circuit Chief Judge Randall Rader publicly called the PTAB “death squads killing property rights.”
The excuse provided for this body blow to the US innovation economy was the ginned-up notion that it was somehow wrong to own a patent but not manufacture products. Unsurprisingly, it was money from the giant tech manufacturers that flooded Washington DC to turn the tide against the small inventor-owners, money that was accompanied by a manipulation of language in which Newspeak like “non-practicing entity” was carefully coined as disparaging by the tech giants. Manipulation of language resulted in manipulation of thought, and suddenly non-manufacturing patent owners were popularly perceived as “patent trolls.”
According to Judge Paul Michel (also CAFC Chief Judge, and Judge Rader’s predecessor), “The ‘patent troll narrative’ remains ubiquitous, decades later, being seen everywhere in public advertisements – even on buses and in airports. Being simple, catchy, and dramatic, it still has excessive, if lesser, influence. Anti-patent companies in certain industries, particularly big tech firms, each spending many millions per year on heavy lobbying and massive PR by dozens of lobby, law and PR firms, dominated the national and Capitol Hill policy ‘debate,’ if one can even call it a ‘debate.’ Massive campaign contributions helped their well organized, relentless, well-funded, and ultimately successful campaign to weaken patents and infringement remedies.”
Now They’re Coming for Your Copyrights — Again, with the Cunning Use of Language.
“Generative AI.”
As the battle between the IP-haves and the IP-have-nots continues to evolve, copyrights are now in the crosshairs. Again, it’s the tech giants like Alphabet, Microsoft, IBM, Amazon, and Meta that are taking sides against IP rights owners. The Newspeak Trojan horse this time is the term “Generative AI.” But unlike “patent troll,” “Generative AI” is not invective. Instead, it insinuates that there is something intelligent at work … something with a creative capability of its own that is perhaps merely “inspired” by what it “learns.”
Enthusiastic reporting and packaging for investors generally conflate the term “AI” (a term from the 1950s describing a computer system that could understand the world around it and solve problems) with the term “machine learning” (a discipline premised on the use of neural networks to extract a predictive function from analyzing data sets). Today’s so-called “Generative AI” systems are really machine learning systems (a subset of which are called “LLMs”) entirely dependent on millions of individual humans to create a vast body of original works of authorship to mimic and average. Machine learning tries to reproduce the patterns it has “learned” from training data as precisely as possible. Machine learning systems have no understanding of the outputs they produce; instead “a picture of a goose is just a statistical correlation among wings, feathers, and beaks,” and textual output is just a next-word prediction with no understanding of the context or meaning of the words. (See “There is No Such Thing as ‘Generative AI’,” Harry Law, University of Cambridge researcher, January 31, 2023.) Even though Microsoft tells the court, “LLMs are a breakthrough in artificial intelligence,” the advanced machine learning systems of today are neither intelligent nor artificial.
“Training Data.”
The “Generative AI” industry refers to the vast body of original works of authorship created by millions of humans (on which the industry relies) as “training data.” The term “training data” implies an educational purpose is being served. Conveniently, copyright law generally acknowledges research and education under a “fair use” exception unburdened by any requirement to pay royalties.
But is “Generative AI” “training” really the classroom use case for which the safe harbor is intended? Or is it mass exploitation without paying the customary price? Consider the following economic estimate of the burgeoning market segment: “With the influx of consumer generative AI programs like Google’s Bard and OpenAI’s ChatGPT, the generative AI market is poised to explode, growing to $1.3 trillion over the next 10 years from a market size of just $40 billion in 2022, according to a new report by Bloomberg Intelligence (BI). Growth could expand at a CAGR of 42%, driven by training infrastructure in the near-term and gradually shifting to inference devices for large language models (LLMs), digital ads, specialized software and services in the medium to long term, BI’s research finds. Moreover, rising demand for generative AI products could add about $280 billion of new software revenue, driven by specialized assistants, new infrastructure products, and copilots that accelerate coding. Companies like Amazon WebServices, Microsoft, Google and Nvidia could be the biggest beneficiaries, as enterprises shift more workloads to the public cloud.”
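The figures in the Bloomberg Intelligence estimate can be checked against each other with simple compound-growth arithmetic. The short sketch below is only a sanity check, not BI’s methodology: it assumes the $40 billion figure is the baseline and that growth compounds annually over exactly 10 periods, and the function names are hypothetical.

```python
def project_market_size(start_billion: float, annual_growth_rate: float, years: int) -> float:
    """Compound a starting market size forward at a fixed annual growth rate."""
    return start_billion * (1.0 + annual_growth_rate) ** years


def implied_cagr(start_billion: float, end_billion: float, years: int) -> float:
    """Return the compound annual growth rate that links a start and end value."""
    return (end_billion / start_billion) ** (1.0 / years) - 1.0


# Figures quoted in the estimate above; the 10 compounding periods are an assumption.
START_BILLION = 40.0      # market size in 2022, in billions of dollars
TARGET_BILLION = 1300.0   # $1.3 trillion, in billions of dollars
QUOTED_CAGR = 0.42        # 42% per year
YEARS = 10

projected = project_market_size(START_BILLION, QUOTED_CAGR, YEARS)
rate = implied_cagr(START_BILLION, TARGET_BILLION, YEARS)

print(f"Projected size at the quoted CAGR: ${projected:,.0f} billion")
print(f"CAGR implied by the start and end figures: {rate:.1%}")
```

Under those assumptions, the quoted growth rate and the quoted end figure are roughly consistent with each other: $40 billion compounding at 42% a year lands a little above $1.3 trillion.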
Note that both the New York Times and Universal Music Group have shown that the “Generative AI” defendants’ systems (ChatGPT in the case of NYT, and Claude in the case of UMG) have reproduced articles and lyrics (“training data”) verbatim.
The “Outdated” Legal System.
ELIZA was an early natural language processing computer program created in 1964 at MIT that generated text, and predictive language models have been around for 60 years without creating an existential challenge to copyright law; that is, until OpenAI, Microsoft, et al. boldly trained their models using the entire internet without permission. In a twist on the old expression, “it’s better to beg for forgiveness than to ask for permission,” the “Generative AI” industry is doing neither, and instead complains the legal system is outdated.
Since the outdated legal system holds that there are no copyrights in the output of a machine, then there is nothing to see here, says one “Generative AI” CEO: “Currently, AI-generated content resides in a unique position within copyright laws. These laws were crafted in a different era, focusing on human authorship as the cornerstone of copyright eligibility. However, the remarkable capabilities of AI to generate original, high-quality photos and music without direct human authorship put us at the edge of a new frontier. We operate under the current legal framework, which does not extend copyright protection to works created without human ingenuity, allowing us to offer this content in the public domain.”
OpenAI suggests that it’s not just the legal system, but society itself that needs to change: “We hope to start a dialogue on … how society can adapt to these new capabilities.”
Just Like a “VCR.”
The Motion Picture Association of America overreacted to the advent of VCRs in front of Congress when Jack Valenti famously said, “The VCR is to the American film producer and the American public as the Boston Strangler is to the woman home alone.” In response to being sued by the New York Times, Microsoft seized on Valenti’s Chicken Little moment in an attempt to make concern over mass unauthorized copying look like alarmist hysteria: “By harnessing humanity’s collective wisdom and thinking, LLMs help us learn from each other, solve problems, organize our lives, and launch bold new ideas. … Despite The Times’s contentions, copyright law is no more an obstacle to the LLM than it was to the VCR (or the player piano, copy machine, personal computer, internet, or search engine).”
VCRs copied a single copyrighted work at a time for individual at-home non-commercial use. LLMs are crawling the internet ingesting the collective digitized works of humanity in order to create an industry estimated to be worth $1.3 trillion in the next 10 years. Despite the fact that an LLM is not remotely like a VCR, this language of substitution is Microsoft’s first move (literally page 1 of its first substantive filing) in its epic battle with the New York Times. Why lead with that thought?
“Safety.”
Whether or not “Generative AI” presents a legitimate “safety” concern (for example, in the way social media clearly does), the “Generative AI” industry continually injects the straw man “safety” concern into the debate regarding its alleged mass unauthorized copying. For example, in its response to the court in the New York Times lawsuit, Microsoft insists it is “leading the way in promoting safe and responsible AI development.” However, the concern in the case is not “safety,” but rather the legality of mass copying without permission. In another example, the same week that Tennessee passed the first-of-its-kind law to protect the music industry from voice piracy, OpenAI proclaimed about its voice “AI,” “we have implemented a set of safety measures.” So when intellectual property owners express concern over “Generative AI” as a tool for piracy, the “Generative AI” industry responds by saying, “don’t worry, it’s safe.” Hmmm.
In both discrete vocabulary and the broader narrative, the IP have-nots are again keen to redefine the world around them and use their economic might to redefine the boundaries of property rights. We probably won’t get good, consistent answers that protect the rights of authors as quickly or as clearly as required.
In the meantime, we must be mindful of: (1) indoctrinating nomenclature, (2) the precise mechanics of how our valuable property is accessed and utilized, and (3) the economic opportunities such access creates for the authors vs the non-authors.
All rights — including the right to use this content for machine learning that will generate output for commercial purposes — are reserved by the author.