Can copyright works be used to train AI systems?

United Kingdom

The UK government has recently been asked to provide clarity on the relationship between intellectual property law and AI. In particular, there have been many discussions (including consultations and parliamentary committee reports) over the last three years about the use of works protected by copyright and/or database right for training AI systems. However, the position under UK intellectual property law has not yet changed.

In this article, we provide an overview of the discussions to date and the current approach proposed by the government, which seeks to balance the interests of rightsholders against the desire to support AI-driven innovation.

What is the current position under UK law?

Most machine learning models need to be trained on large volumes of data. Training data may include works protected by copyright or database rights.

In the UK, copyright is infringed if a substantial part of a protected work is used without the copyright owner’s permission, unless an exception applies. Similarly, database right is infringed where a person extracts or re-utilises a substantial amount of data from a protected database without the rightsholder’s permission, unless an exception applies.

UK copyright legislation already contains an exception for copying for text and data analysis, often referred to as text and data mining (“TDM”). The existing exception allows a person with lawful access to a copyright work to copy it to carry out a computational analysis of anything recorded in it, but the exception is not available unless the analysis is “for the sole purpose of research for a non-commercial purpose”. In addition, the exception only applies to copyright and does not extend to database right.

Where TDM is to be carried out in the UK for commercial purposes (or for a mixture of commercial and non-commercial purposes), rightsholders therefore argue that a licence is likely to be required for the use of any works protected by copyright or database right.

What’s happened in the UK?

While rightsholders currently expect to be able to charge a fee for licensing use of their works for TDM, concerns were raised that this did not strike the right balance between rightsholders and AI developers.

IPO proposes new exception

In September 2020, the UK Intellectual Property Office (“IPO”) published a call for views on how the UK’s intellectual property (“IP”) regime could encourage the use and development of AI, covering, among other things, the use of copyright works and data by AI systems. In response to the call for views, the government stated that it would consult on measures, such as improved licensing or copyright exceptions, to make it easier for copyright works to be used with AI (including for TDM), in order to support innovation and research.

The IPO then published a consultation in October 2021, calling for views on licensing or exceptions to copyright for text and data mining. The consultation considered the following main options:

  • keep the existing TDM exception, and do not make any legal changes;
  • make it easier to obtain a licence to use copyright works for TDM, including through model licences or codes of practice;
  • extend the existing TDM exception to allow commercial scientific research and TDM of databases;
  • add a new exception to allow TDM of copyright works and databases for any use by anyone (whether commercial or non-commercial), but let rightsholders opt out; or
  • add a new exception to allow TDM of copyright works and databases for any use by anyone (whether commercial or non-commercial), without letting rightsholders opt out.

In response to the consultation, the government decided to introduce a new copyright and database exception allowing TDM for any purpose (i.e. along the lines of the final option). While rightsholders would no longer be able to charge for UK licences for TDM, the government considered that rightsholders would still be able to protect their content, including by requiring lawful access.

Reaction to IPO’s proposal

However, representatives from across the creative sector continued to voice significant concerns about the proposed exception, including concerns about potential loss of revenue. In July 2022, the Publishers Content Forum (a group of cross-sector companies, trade bodies and collective management organisations focused on publishing) wrote to the Secretary of State for Business, Energy and Industrial Strategy, with concerns that the proposed exception would “seriously undermine” the UK’s IP framework, would have a “severe negative impact” on UK rightsholders and would “create an unfairness” benefitting those using content for TDM. The Publishers Content Forum warned that if businesses were no longer able to license and receive payment for the use of their data and content, certain businesses would “have no choice but to exit the UK market or apply paywalls where access to content is currently free”.

In January, the House of Lords Communications and Digital Committee published a report (titled At risk: our creative future) on the likely impacts of technology on the creative sector over the next 5–10 years. The report acknowledged the tension between developing new technology and supporting rightsholders in the creative sector but concluded that the IPO’s proposal failed to take sufficient account of the potential harm to the creative sector. Although the development of AI is important, the Committee commented that it should not be pursued “at all costs”. The Committee recommended that the IPO:

  • pause its proposed introduction of the new TDM exception;
  • conduct and publish an impact assessment on the implications for the creative sector; and
  • if the impact assessment identified negative effects on businesses in the creative sector, pursue alternative approaches.

UK government abandons proposed exception

The Minister for Science, Research and Innovation subsequently stated in February that the introduction of the exception would not be proceeding, and this was confirmed in the government’s response to the report in April.

Proposal for code of practice

Separately, Sir Patrick Vallance published a report in March (titled Pro-innovation Regulation of Technologies Review: Digital Technologies) on how regulation can support emerging digital technologies. The report highlighted the need for regulatory certainty and recommended that the government announce a clear policy position on the relationship between IP law and generative AI.

In response to the report, the government confirmed that the IPO would:

  • produce a code of practice by summer 2023 to provide guidance to support AI firms to access copyright works as an input to their models, and ensure there are protections (such as labelling) on generated output to support copyright owners; and
  • provide guidance on enforcement to AI firms by summer 2023, coordinate intelligence on any systematic copyright infringement by AI, and encourage the development of AI tools to help with the enforcement of IP rights.

According to the government’s response, both the AI and creative sectors would be involved in informing the code of practice, helping to ensure a balanced and pragmatic approach. The intention is that an AI firm which commits to the code of practice can expect to be offered a reasonable licence by a rightsholder in return. However, the government acknowledged that, if the code of practice is not agreed or adopted, legislation may be needed instead.

What progress has been made on the code of practice?

The IPO formed a working group consisting of industry representatives from the technology, creative and research sectors, with working group meetings starting this June. According to the terms of reference, the working group is tasked with:

  • identifying any concerns of rightsholders in relation to the use of protected material (i.e. copyright works, performances and databases) by AI systems and users, and suggesting how to address those concerns;
  • identifying any barriers to access to protected material by AI systems and users, and suggesting how to address those barriers; and
  • setting out commitments and expectations in relation to AI firms’ use of protected material and the rightsholders who own protected material.

The IPO has not yet published a code of practice on the use of copyright works by AI firms. The IPO’s webpage on the code of practice does not contain any updates on the progress of the working group’s discussions and does not indicate if and when the code of practice will be available.

In its evidence for the House of Lords Communications and Digital Committee inquiry on large language models, the Alliance for Intellectual Property stated that, during the working group’s first meeting, “it became evident that while many AI developers were seeking to license content from rightsholders, certain large language models maintained that they did not need permission to ingest content from rightsholders and therefore would not seek licences”. It therefore appears unlikely that AI firms will be able to reach agreement with the creative sector on a code of practice.

What else has happened over the summer?

The House of Commons Culture, Media and Sport Committee published a report in August (titled Connected tech: AI and creative technology) in which the Committee examined the implications of the proposed TDM exception and stated that the existing position under English law provides an “appropriate balance between innovation and creator rights”. The report contained a number of recommendations, including that the government should:

  • not introduce a broad TDM exception to copyright, commenting that the government’s initial handling of the TDM exception showed a “clear lack of understanding” of the needs of the UK’s creative sector and that the government needed to work to “regain the trust” of the creative sector as a result;
  • take action to ensure that creators are “well rewarded” in the copyright regime;
  • consider how creators can ensure transparency as well as appropriate recourse and redress for wrongful use of their works in AI development;
  • support small AI developers who may find it difficult to acquire licences, by considering the introduction of licensing schemes for technical material and mutually-beneficial arrangements between rights management organisations and creative sector trade bodies; and
  • provide an update, by the end of this year, on its direction in managing the impact of AI on the creative sector.

The House of Commons Science, Innovation and Technology Select Committee (“SIT Committee”) published an interim report in August (titled The Governance of Artificial Intelligence), identifying a series of challenges for policymakers. These included the so-called “Intellectual Property and Copyright Challenge”: where AI models and tools make use of other people’s content, policy should establish the rights of the originators of this content and these rights must be enforced. The SIT Committee considers that these challenges should form the basis for discussions at the global summit on AI safety, due to take place at the start of November.

Echoing the SIT Committee, the CEO of the Publishers Association wrote to the Prime Minister in August on behalf of the UK publishing industry to ask that the government makes clear either as part of, or in parallel with, the AI safety summit that UK IP law “should be respected when any content is ingested by AI systems” and that “the training of AI systems should be done transparently, with the consent of, and in a manner that credits and fairly compensates the creator or IP rightsholder i.e. under licence”.

Although the programme for the summit refers to licensing, it does not mention IP or copyright expressly, and so it remains to be seen whether discussions at the AI safety summit will cover the interplay between AI and IP.

What’s next?

The debate over how to balance support for AI innovation against the interests of rightsholders is still ongoing. For now, organisations will need to wait to see:

  • if the IPO publishes a code of practice on copyright and AI;
  • the extent to which the interplay between AI and IP is discussed at the AI safety summit, due to take place on 1 and 2 November; and
  • if relevant legislation is mentioned in the King’s Speech, scheduled to take place on 7 November, or is otherwise introduced in the upcoming parliamentary session.

The authors would like to thank Bronagh Miller, Trainee Solicitor at CMS, for her assistance in writing this article.