Training AI models on Synthetic Data: No silver bullet for IP infringement risk in the context of training AI systems

February 5, 2024

Part 1

Recent rapid advances in Artificial Intelligence (“AI”) have revolutionized the way we create and learn.

Generative AI (“GenAI”) systems have unveiled unprecedented capabilities, pushing the boundaries of what we thought possible. Yet beneath AI’s transformative potential lies a complex legal web of intellectual property (“IP”) risks. These risks particularly concern the use of “real-world” training data, which may expose AI developers to claims of infringement of third-party IP rights if the data is not appropriately sourced.

Continue reading part 1 on the Cleary IP and Technology Watch blog.

Part 2

Using synthetic data to train AI models has the potential to help AI developers overcome several legal hurdles. This is mainly because, as the law stands today, synthetic data would likely not itself be eligible for copyright protection in the EU, so reproducing it to train a model would not, in itself, infringe copyright in that data (although the law is still evolving on this point and the position may eventually vary under national law). Under EU law, a work must be the “author’s own intellectual creation” for copyright to subsist in it. AI-generated works such as synthetic data may be found not to meet this standard because they arguably have no human author.

Continue reading part 2 on the Cleary IP and Technology Watch blog.

Part 3

One hurdle that synthetic data may help AI developers overcome arises under the EU Copyright Directive (Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market, the “Copyright Directive”). More specifically, training an AI model on synthetic data may allow developers to side-step certain uncertainties and complexities raised by Articles 3 and 4 of the Copyright Directive. These provisions set out exceptions to certain copyright and related rights for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining (the so-called “text and data mining” or “TDM” exceptions).

Continue reading part 3 on the Cleary IP and Technology Watch blog.

Part 4

Beyond IP infringement risks, the use of synthetic data to train AI models raises plenty of other interesting legal questions. This series does not cover them in detail, but two are worth noting. On the one hand, using synthetic data can mitigate certain specific risks under applicable data protection and privacy laws. On the other hand, it could increase product liability risk, including under the proposed AI Product Liability Directive.

Continue reading part 4 on the Cleary IP and Technology Watch blog.