Generative Artificial Intelligence (GenAI) systems like MidJourney and ChatGPT that can generate creative works have brought a wave of new questions and complexities to copyright law. On the heels of a recent court decision denying registrability of an AI-created work, the U.S. Copyright Office has issued a formal notice of inquiry seeking public comments to help it analyze AI’s copyright implications and form policy recommendations for both the Office and Congress. The notice is quite extensive and raises fundamental questions that many have been discussing for several years: copyrightability of AI outputs, use of copyrighted material to train AI systems, infringement liability, labeling of AI-generated content, and more. The Copyright Office’s inquiry is an attempt to respond to AI’s rapidly growing impact on creative industries. [Link to the Notice]
The following is a rough overview of three core inquiries that I identified in the notice. You can also read the notice yourself by clicking the link above.
A core inquiry is whether original works that would ordinarily be copyrightable should be denied protection unless a human author is identified. Generative AI models produce outputs like text, art, music, and video that appear highly creative and would certainly meet copyright’s originality standard if created by natural persons. If human contribution is required, the questions shift to the level of human contribution necessary and the procedural requirements to claim and prove human authorship. As the notice states, “Although we believe the law is clear that copyright protection in the United States is limited to works of human authorship, questions remain about where and how to draw the line between human creation and AI-generated content.” Factors could include the relative or absolute level of human input, creative control by the human, or even a word count. With copyright it is helpful to have some bright lines that streamline registration without substantial case-by-case lawyer input for each work, but any hard rule risks glossing over nuanced cases. Although the notice focuses on copyrightability, ownership questions will also come into play.
A second core inquiry focuses on the training data that is fundamental to today’s generative AI models. The Copyright Office seeks input on the legality of training generative models on copyrighted works obtained via the open internet without an express license. In particular, the Office seeks information about “the collection and curation of AI datasets, how those datasets are used to train AI models, the sources of materials ingested into training, and whether permission by and/or compensation for copyright owners is or should be required when their works are included.” Presumably, different training approaches could have different copyright implications. For example, an approach that does not store or actually copy the underlying works would be less likely to be infringing.
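To make the dataset-curation questions a bit more concrete, here is a minimal sketch of the kind of per-work record a curator could keep to answer the Office’s questions about sources, permission, and compensation. The field names, URLs, and values are purely hypothetical; the notice prescribes no record-keeping format.

```python
# Hypothetical per-work manifest entry for a training corpus.
# All fields and example values are invented for illustration only.
from dataclasses import dataclass, asdict

@dataclass
class IngestedWork:
    source_url: str          # where the work was collected
    copyright_owner: str     # owner, if known
    license: str             # e.g. "CC-BY-4.0", "none (scraped)", "direct license"
    permission_obtained: bool
    compensation_usd: float  # 0.0 if no payment was made

corpus_manifest = [
    IngestedWork("https://example.com/essay", "Jane Author", "none (scraped)", False, 0.0),
    IngestedWork("https://example.org/photo", "Photo Co.", "direct license", True, 25.0),
]

# A manifest like this would let rights holders (and the Office) see whether
# their works were ingested and on what terms.
for work in corpus_manifest:
    print(asdict(work))
```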
Building a training dataset often involves copying works without a license, so the key inquiry under current law appears to be the extent to which fair use protects the AI system developers. In other areas, Congress and the Copyright Office have stepped in with compulsory licensing models that could possibly work here, such as a system that pays a few pennies for each web page used in training. Our system also supports voluntary collective licensing through collective management organizations, perhaps backed by a minimum royalty rate. An issue here is that many of those assembling training data do so secretly and would like to keep both the data and how the model uses it as trade secret information. That lack of transparency will raise technical challenges and costs for the underlying copyright holders.
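As a rough illustration of the “few pennies per web page” idea, here is a back-of-envelope sketch of how a per-page compulsory royalty with a minimum floor could be computed. The per-page rate, the minimum royalty, and the page counts are assumptions I made up for illustration; the notice proposes no specific numbers.

```python
# Back-of-envelope sketch of a hypothetical per-page compulsory royalty.
# The rate, minimum, and page counts below are invented for illustration.

PER_PAGE_RATE = 0.02      # assumed "few pennies" per web page used in training
MINIMUM_ROYALTY = 50.00   # assumed minimum royalty floor per rights holder

def royalty_owed(pages_used: int) -> float:
    """Return the royalty owed to one rights holder for pages_used pages."""
    if pages_used <= 0:
        return 0.0
    return max(pages_used * PER_PAGE_RATE, MINIMUM_ROYALTY)

# A small blog with 1,200 ingested pages would hit the minimum
# (1,200 * $0.02 = $24, so the $50 floor applies); a large archive would not.
print(royalty_owed(1_200))    # 50.0
print(royalty_owed(500_000))  # 10000.0
```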
A third core area focuses on infringement liability when an AI output amounts to a copy or an improper derivative work. Who is liable: the AI system developers, the model creators, and/or the end users? A traditional approach would allow for joint liability. Again, though, the lack of transparency may make copying difficult to prove, although perhaps showing that a work was available to the model and likely copied is enough. On this point, the notice also asks about labeling or watermarking AI-generated content, as suggested by a recent White House / industry agreement. Although I see this issue as outside of copyright law, the inquiry suggests some penalty for failure to label.
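For readers wondering what “labeling” AI content might even look like in practice, here is a minimal sketch of a machine-readable provenance label attached to a generated work. The field names and structure are hypothetical; neither the notice nor the White House commitments mandate any particular format.

```python
# Sketch of a simple "AI-generated" provenance label. The schema is
# hypothetical and invented for illustration only.
import hashlib
import json
from datetime import datetime, timezone

def label_ai_output(content: str, model_name: str) -> dict:
    """Wrap generated content with a simple machine-readable provenance label."""
    return {
        "content": content,
        "provenance": {
            "ai_generated": True,
            "model": model_name,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            # The hash lets a third party verify the label matches the content.
            "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        },
    }

print(json.dumps(label_ai_output("A short AI-written poem...", "example-model-v1"), indent=2))
```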