The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Journal of Free Speech Law: "Where's the Liability in Harmful AI Speech?," by …
Profs. Peter Henderson, Tatsunori Hashimoto, and Mark Lemley, just published in our symposium on Artificial Intelligence and Speech; more articles from the symposium coming in the next few days.
The article is here; here is the Abstract:
Generative AI, in particular text-based "foundation models" (large models trained on a huge variety of information including the internet), can generate speech that could be problematic under a wide range of liability regimes. Machine learning practitioners regularly "red-team" models to identify and mitigate such problematic speech: from "hallucinations" falsely accusing people of serious misconduct to recipes for constructing an atomic bomb. A key question is whether these red-teamed behaviors actually present any liability risk for model creators and deployers under U.S. law, incentivizing investments in safety mechanisms.
We examine three liability regimes, tying them to common examples of red-teamed model behaviors: defamation, speech integral to criminal conduct, and wrongful death. We find that any Section 230 immunity analysis or downstream liability analysis is intimately wrapped up in the technical details of algorithm design. And there are many roadblocks to truly finding models (and their associated parties) liable for generated speech.
We argue that AI should not be categorically immune from liability in these scenarios and that as courts grapple with the already fine-grained complexities of platform algorithms, the technical details of generative AI loom above with thornier questions. Courts and policymakers should think carefully about what technical design incentives they create as they evaluate these issues.
And here's the Introduction:
ChatGPT "hallucinates." That is, it often generates text that makes factual claims that are untrue and perhaps never even appear in its training data. It can get math problems wrong. It can get dates wrong. But it can also make things up. It makes up sources that don't exist, as one lawyer found out to their chagrin when they cited nonexistent cases in a legal brief. It makes up quotes.
And it can make up false claims that hurt people. Ask it what crimes a particular person has committed or been accused of, and ChatGPT might get it right, truthfully saying, for instance, that Richard Nixon was accused of destroying evidence to hide a burglary committed by his campaign, or truthfully saying that it is unaware of any accusations against a person. But it will also sometimes tell a false story about a crime. ChatGPT 3.5 (but not 4.0), for instance, says that one of us (Lemley) has been accused of, and indeed found liable for, misappropriating trade secrets. (He hasn't.) Others have falsely been accused by ChatGPT of sexual harassment.
This isn't a problem of bad inputs. Rather, it is a function of the way large language models (LLMs) or foundation models work. ChatGPT and other similar models are trained to imitate large language datasets, but they don't generally copy text from any particular work directly. Instead, they generate text predictively, using the prompts and the prior words in the answer to predict what the next logical words in the response should be.
That enables them to generate new content rather than copying someone else's, and allows some amount of generalizable problem solving and writing ability. But it also means that the model is not simply taking content from existing writing (true or not), but potentially making up new things each time you ask it a question. When asked questions that involve well-known entities that appear often in the training data, the model can generate accurate text with high confidence, such as in the case of Nixon's crimes. But when queried about entities that appear much less frequently, these models can rely upon a "best guess" rather than a known fact. ChatGPT might associate Lemley with trade secrets (and therefore, wrongly, with misappropriating them) because he has written academic articles on the subject, for instance.
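To make the mechanism concrete, here is a minimal sketch of predictive text generation, assuming a toy word-pair ("bigram") counter rather than anything resembling a real foundation model. The corpus, the names "alice" and "bob," and the helper functions are all hypothetical and chosen only for illustration. The point is narrow: a model that simply picks the likeliest next word can produce a fluent sentence that appears nowhere in its training text.

```python
from collections import Counter, defaultdict

# Hypothetical four-sentence training corpus (illustrative assumption only).
corpus = [
    "alice wrote about trade secrets",
    "alice wrote about trade secrets",
    "alice wrote about patents",
    "bob misappropriated trade secrets",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Return {word: probability} for the word that follows `prev`."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

def generate(counts, prompt, max_words=5):
    """Greedily append the most probable next word, one word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        dist = next_word_distribution(counts, words[-1])
        if not dist:  # no observed follower: stop generating
            break
        words.append(max(dist, key=dist.get))
    return " ".join(words)

counts = train_bigrams(corpus)
print(generate(counts, "alice"))      # a sentence that is in the corpus
print(generate(counts, "bob wrote"))  # a fluent sentence that is not
```

In this toy setup, the prompt "bob wrote" yields "bob wrote about trade secrets," a claim the corpus never makes. Real models are vastly more sophisticated, but the sketch shows why fluent output is not the same as retrieved fact.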
Worse, the false statements read just like the true ones. Because language models are good at modeling human writing, they pepper their false reports of crimes with the same things a real report would include—including (made up) quotations from reputable sources (whose articles are also made up).
This is a problem. It's not great to have false accusations of crimes and other misconduct out there. But it's even worse because models like ChatGPT are good at mimicking human language and seeming authentic. People may be inclined to believe these statements, for several reasons: (1) human experience with similarly authoritative-seeming stories from the real world suggests that they are generally true, (2) ChatGPT is quite good at accurately reporting facts in many settings, and (3) people don't understand how ChatGPT works or that it suffers from hallucinations.
Even worse, such believable false statements are not the only form of speech by generative models that could cause liability. Models have already encouraged people to commit self-harm, leave their spouses, and more. They can generate threats to get users to comply with their demands. They can aid malicious actors by generating content for propaganda or social engineering attacks. They may give plausible-seeming answers to questions about coding that lead programmers astray. They can even be used in a semi-autonomous loop to generate malware that bypasses standard detection techniques.
These harmful behaviors may arise even when the model never trains on any one problematic text. In effect, it can hallucinate new harmful behavior, not grounded in anything it has seen before.
Researchers regularly spend countless hours probing models through a process called "red teaming" to identify potential harmful speech that the model may generate in response to users and then work to identify a fix for this behavior. The red-teaming scenarios used by researchers range from defamatory hallucinations to hate speech to instructions on how to create a nuclear weapon. These are hard technical problems to solve, and a huge amount of research has focused on finding technical solutions to prevent harmful AI speech.
These are also hard legal problems. They raise thorny questions at the heart of both liability and immunity from it under Section 230 of the Communications Decency Act (hereafter "Section 230"). We discuss the nature of the problem in Part I, drawing on "red teaming" scenarios often used by researchers and real reports of suspect AI speech. As we show in Part II, there aren't any easy or perfect technical fixes to this problem, but there are ways to reduce the risks. In Part III, we show that it is not obvious that existing liability doctrines are currently capable of easily dealing with harmful speech from AI, nor are all designs for generative AI created equal in the immunity or liability analyses. We examine some recently proposed design fixes for hallucinations or bad behavior and examine how they change both the immunity and liability analysis for AI-generated speech.
Finally, in Part IV we offer some suggestions and warnings about how different legal outcomes might affect technical incentives. We suggest that there should not be broad-based immunity from liability, either formally or through the many roadblocks that current analyses face. But we also caution against broad-based liability. Instead, we argue the law should pay attention to the technical details of how foundation models work and encourage targeted investments into technical mechanisms that make models more trustworthy and safe.