Fair Use Exceptions For AI

Friday, September 16, 2022

In AI Copyright Violations, I asserted that the creators of AI-powered content generation tools are likely infringing the copyright of the content used to train their algorithms. I would like to justify that assertion as best I can, while acknowledging that I am not a lawyer and that I am writing only about copyright law within the United States.

Copyright ownership gives the creator of an original work exclusive rights to use the work. United States copyright law places several limitations on those exclusive rights, but the only one relevant to this post is Section 107, which sets out the doctrine of fair use: a set of factors for deciding when copyrighted content may be used without infringing copyright.

According to §107. Limitations on exclusive rights: Fair use:

... the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include:

  1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.
  2. The nature of the copyrighted work.
  3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
  4. The effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

So is the usage of copyrighted content to train AI-powered content generation tools fair use, based on the above factors?

The purpose and character of the use

To summarize the opinion in Campbell v. Acuff-Rose Music: the more transformative the new content is, by "altering the first with new expression, meaning, or message," the less significant the other factors will be in determining whether the use is fair use. The transformative nature of AI is an interesting debate in its own right. There are many valid observations of AI models memorizing their training data (take OpenAI's GPT as one example). Conversely, the ability to generate new content sounds entirely different in terms of the value that it delivers. Regardless, a transformative use is not excluded from fair use merely because it happens to be profitable.

The nature of the copyrighted work

Whether a copyrighted work is factual or creative, and whether it is published or unpublished, both bear on whether usage of that work qualifies as fair use. Usage of factual content that has been published is more likely to fall under fair use. In the context of training AI models, this factor appears to be relevant mainly when models are trained on unpublished work.

The amount and substantiality of the portion used in relation to the copyrighted work

While there are no guidelines for determining what amount of usage is substantial, it seems clear that usage in the context of training AI models is substantial, given that the entire work is consumed during training.

The effect of the use upon the potential market for or value of the copyrighted work

To summarize the opinion in Nunez v. Caribbean Int'l News Corp.: usage that monetizes a copyrighted work, or otherwise harms the copyright holder's ability to monetize that work, is less likely to qualify as fair use. This is likely the most significant factor in the context of training AI models. Companies such as OpenAI and Midjourney expect to monetize their models, and the usage of these models may drastically reduce a creator's ability to monetize their own content.

It is an interesting idea that a use which monetizes a copyright holder's work while also increasing demand for it can be deemed fair use. The trouble with AI content generation tools, though, is that they do not provide attribution to the sources that were used to train the model. Demand for the copyright holder's work is therefore likely only ever diminished.

Conclusion

Based on the factors in Section 107, the usage of copyrighted content to train the models behind AI-powered content generation tools is substantial and highly monetizable, and it also reduces a copyright holder's ability to monetize their work directly. The transformative nature of AI-generated content does not seem significant enough to qualify this use as fair use, at least in an ethical sense.

iidBlog is written by me, Josh Howard, a software engineer currently working as a Senior Engineering Manager at Starburst Data in Atlanta, Georgia, US.

Josh Howard, Software Engineer