AI Copyright Violations

Wednesday, September 14, 2022

This week I have spent a good amount of time thinking about the impact of AI on creators. This was initially prompted by the AI Illustrators discussion on the Dithering podcast, and then by Ben Thompson's Monday article, The AI Unbundling. I recommend both, but I will summarize them and share my thoughts below.

Both pieces discuss Charlie Warzel's use of Midjourney, an AI-powered image generation tool, to illustrate his article Where Does Alex Jones Go From Here? There was a bit of a mob reaction to the use of AI-generated content, and Charlie later issued an apology in I Went Viral in the Bad Way. In Charlie's words:

I was caught up in my own work and life responsibilities and trying to get my newsletter published in a timely fashion. I went to Getty and saw the same handful of photos of Alex Jones, a man who I know enjoys when his photo is plastered everywhere. I didn’t want to use the same photos again, nor did I want to use his exact likeness at all. I also, selfishly, wanted the piece to look different from the 30 pieces that had been published that day about Alex Jones and the Sandy Hook defamation trial. All of that subconsciously overrode all the complicated ethical issues around AI art that I was well apprised of.

What are the complicated ethical issues around AI art, you ask? A salient point from Charlie's apology is as follows:

DALL-E is trained on the creative work of countless artists, and so there’s a legitimate argument to be made that it is essentially laundering human creativity in some way for commercial product.

This is a very valid concern, and the billions of people whose data is used to build commercial products should be very sympathetic to it. Legislation and licensing to protect a creator's content should be discussed as widely as protections for a user's privacy. Charlie continues:

Like others, I also have questions about the corpus used to train these art tools and the possibility that they are using a great deal of art from both big-name and lesser-known artists without any compensation or disclosure to those artists. (I reached out to Midjourney to ask some clarifying questions as to how they choose the corpus of data to train the tool, and they didn’t respond.)

While declining to respond is not an admission of copyright infringement, my understanding of how these things are usually done makes me very suspicious of companies creating AI-powered content generation tools. It is all too easy to scrape Google Images indiscriminately for content to train AI models, with no regard for whether an image is subject to copyright.

I see a clear parallel between the use of a user's data and the use of a creator's content to power commercial AI products. It seems to be a significant ethical (and probably legal) problem. It is also a hard technical problem, because it is very challenging to work out after the fact which data points were used to train a model. I plan on noodling on this in the future.