Scholarship@WashULaw
Document Type
Working Paper
Language
English (en)
Publication Date
2025
Publication Title
University of Chicago Law School, Coase-Sandor Institute for Law & Economics Research Paper Series
Abstract
In the past few years, large language models (LLMs) have achieved significant technical advances, such that legal-advocacy organizations are increasingly adopting them as complements to—or substitutes for—lawyers and other human experts. Several studies have examined LLMs' performance in taking law school exams, finding mixed results. Yet there have been no published studies systematically analyzing LLMs' competence at one of law professors' chief responsibilities: grading law school exams. This paper presents results of an analysis of how LLMs perform in evaluating student responses to legal analysis questions of the kind typically administered in law school exams. The underlying data come from exams in four subjects administered at top-30 U.S. law schools. Unlike some projects in computer or data science, our goal is not to design a new LLM that minimizes error or maximizes agreement with human graders. Rather, we seek to determine whether existing models—which can be straightforwardly applied by most professors and students—are already suitable for the task of law exam evaluation. We find that, when provided with a detailed rubric, the LLM grades correlate with the human grader's at Pearson correlation coefficients of up to 0.93. Our findings suggest that, even if they do not fully replace humans in the near future, LLMs could soon perform valuable tasks for law school professors, such as reviewing and validating professor grading, providing substantive feedback on ungraded midterms, and giving students feedback on self-administered practice exams.
Keywords
Artificial Intelligence, Grading, Large Language Models, GPT, Legal Pedagogy
Publication Citation
Kevin L. Cope et al., Grading Machines: Can AI Exam-Grading Replace Law Professors? (University of Chicago Law School, Coase-Sandor Institute for Law & Economics Research Paper No. 25-35, 2025), https://ssrn.com/abstract=5851362 [http://dx.doi.org/10.2139/ssrn.5851362]
Repository Citation
Frankenreiter, Jens; Cope, Kevin L.; Hirst, Scott; Posner, Eric A.; Schwarcz, Daniel; and Thorley, Dane, "Grading Machines: Can AI Exam-Grading Replace Law Professors?" (2025). Scholarship@WashULaw. 935.
https://openscholarship.wustl.edu/law_scholarship/935