Balancing Innovation and Intellectual Property: Copyright Challenges in Large Language Models
DALL'AVA, LAURA
2023/2024
Abstract
This thesis examines the copyright challenges posed by Large Language Models (LLMs) and generative AI. Its central concern is how to balance fostering innovation in the artificial intelligence sector against safeguarding the intellectual property (IP) rights of content creators. Because LLMs are trained on vast datasets that often include copyrighted materials, they raise contested questions of fair use, ethics, and potential infringement. The work analyzes the legal arguments surrounding the unauthorized use of copyrighted content in AI training, with particular attention to the New York Times lawsuit against OpenAI and Microsoft. It explores the perspectives of content creators, AI developers, and legal scholars, contributing to the discussion on how copyright frameworks can adapt to generative AI while still protecting creative works. In particular, the thesis addresses the legal responsibility of AI developers whose models use copyrighted material without authorization, a question made more pressing by the growing ability of LLMs to produce outputs that directly compete with or substitute for the original works. To that end, it investigates how copyright law should be interpreted and applied to artificial intelligence, focusing on key legal doctrines such as fair use in the United States and the text and data mining (TDM) exceptions in the European Union. The ethical dimensions of using copyrighted content without permission are also explored. Content owners and creators rightfully assert that the unpaid use of their works undermines the financial incentives that copyright law is designed to provide. Conversely, restricting access to publicly available data may stifle innovation, especially for researchers and smaller organizations that lack the resources to negotiate extensive licensing agreements.
To illustrate these issues, the thesis focuses on the 2023 lawsuit brought by The New York Times against OpenAI and Microsoft, which encapsulates the conflict between media organizations and AI companies. The lawsuit alleges that OpenAI's and Microsoft's LLMs were trained on millions of the Times' copyrighted articles without permission or compensation. The thesis analyzes the arguments on both sides, the specific instances of alleged infringement, and the defense strategies of OpenAI and Microsoft. Through a close examination of this case, it seeks to offer insights into how copyright law can be adapted to generative AI and what this means for content creation and distribution. Ultimately, the work aims to contribute to a more nuanced understanding of the interplay between AI, copyright, and innovation, and it proposes policy recommendations.

File | Description | Size | Format | Access
---|---|---|---|---
Balancing Innovation and Intellectual Property.pdf | Master's thesis | 1.22 MB | Adobe PDF | Not available
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14240/165839