July 21, 2022

Choi on Computational Corpus Linguistics

Jonathan H. Choi, University of Minnesota Law School, has published Computational Corpus Linguistics. Here is the abstract.
Scholars and judges increasingly interpret legal text by studying word use in real-world documents, a method known as “corpus linguistics.” But the traditional approach to corpus linguistics encounters several problems. It focuses on word frequencies at the expense of subtler linguistic cues and presents no clear dividing line between correct and incorrect textual meanings. It also requires a variety of subjective and opaque judgment calls, allowing motivated interpreters to cherry-pick the method that supports their favored meanings. This Article proposes a new, computational approach to corpus linguistics. It uses machine learning and natural language processing to algorithmically evaluate word meaning. By measuring the semantic similarity between words, we can answer questions of legal interpretation—for example, by testing whether “judge” is similar to “representative,” and therefore whether judicial elections are governed by the Voting Rights Act. Computational approaches produce quantitative estimates of similarity that reflect the intuitive semantic relationships between words. This Article extracts qualitative implications from these quantitative estimates by benchmarking against a known scale of word similarity, based on H.L.A. Hart’s famous “vehicles in the park” hypothetical. Applying computational corpus linguistics, this Article finds that semantic questions in real-world legal cases rarely give clear answers. Borrowing Hart’s analogy, most cases are closer to asking whether a bicycle is a vehicle than whether a car is a vehicle. Moreover, estimates of similarity vary substantially between corpora, even large and reputable ones. This suggests that the choice of corpus matters more than previously recognized and that traditional corpus linguists must consult multiple corpora to decrease the risk of cherry-picking. 
These empirical findings have important implications for ongoing doctrinal debates outside of corpus linguistics, suggesting that text is less clear and objective than many textualists believe. The Article develops these implications with discussion on the nature of linguistic meaning in legal interpretation. Ultimately, the Article offers new insights both to theorists considering the role of legal text and to empiricists seeking to understand how text is used in the real world.
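The abstract's idea of scoring the similarity between two words and then reading that score against a benchmark scale can be illustrated with a minimal sketch. This is not the Article's implementation: the abstract does not say which similarity measure or model it uses, so cosine similarity over word vectors is assumed here as one common choice. The vectors are toy values invented for illustration, and the names `TOY_VECTORS`, `benchmark_scale`, and `closest_benchmark` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration only. Real word vectors would be
# learned from a corpus and would have many more dimensions.
TOY_VECTORS = {
    "vehicle": [1.0, 0.0, 0.0],
    "car": [0.9, 0.1, 0.0],
    "bicycle": [0.6, 0.6, 0.0],
    "representative": [0.0, 1.0, 0.0],
    "judge": [0.0, 0.7, 0.7],
}


def benchmark_scale(vectors, anchor, items):
    """Similarity of each benchmark item to the anchor word."""
    return {
        item: cosine_similarity(vectors[anchor], vectors[item])
        for item in items
    }


def closest_benchmark(score, scale):
    """Benchmark item whose similarity to the anchor is nearest to score."""
    return min(scale, key=lambda item: abs(scale[item] - score))


# Build a Hart-style scale, then place a legal query pair on it.
scale = benchmark_scale(TOY_VECTORS, "vehicle", ["car", "bicycle"])
query = cosine_similarity(TOY_VECTORS["judge"], TOY_VECTORS["representative"])
print(closest_benchmark(query, scale))
```

Because the vectors are made up, the printed label says nothing about the real question; the sketch only shows how a raw similarity number can be turned into a qualitative comparison of the "closer to a bicycle or a car" kind.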
Download the article from SSRN at the link.
