Using AI and responsibility for data privacy


AI is often used to process personal data. Who is responsible for data privacy depends on the specific use case.

Everyone is talking about artificial intelligence (AI), and it is revolutionising various areas of life at breathtaking speed. Unlike traditional computer programmes, which are based on clearly defined processes, AI attempts to replicate human-like intelligence, with skills such as logical reasoning and learning.

Naturally, these opportunities are accompanied by numerous questions relating to data protection and IT security. Even in the training phase, AI manufacturers and providers are confronted with tricky data protection issues, such as the relevant legal basis for processing personal data as training data, or the proper handling of information obligations and data subjects' rights. AI users also have to deal with a number of data privacy issues when using AI at a later stage. One of the fundamental questions here is who is the controller within the meaning of Art. 4 (7) GDPR when an AI system processes personal data in use.

What is certain is that, according to the principles of the GDPR, there must be at least one controller if personal data is processed by AI. What is also certain is that the AI itself, no matter how advanced and intelligent it may seem, cannot be a controller within the meaning of the GDPR, as under current law only a "natural or legal person, public authority, agency or other body" can take on this role. In this respect, there has been speculation for several years now (beyond data protection law) about the possibility of creating a so-called "e-person" in the future. However, the current prevailing view is that Siri, Alexa & co. do not (yet) have their own legal personality within the meaning of the GDPR.

Even though the core GDPR standards for assessing data protection responsibility when using AI are thus established, detailed issues remain that must be assessed case by case, in light of the design and capabilities of the specific AI system, its purpose and the operators involved.

Responsibility within the meaning of the GDPR

The controller plays a central role under the GDPR and is subject to numerous obligations. Not only must the controller ensure, for example, that data processing is carried out lawfully (Art. 5 ff. GDPR) and that data subjects' rights are honoured (Art. 12 ff. GDPR), it is also accountable for compliance with data protection requirements (see Art. 5 (2) GDPR). The controller is also the addressee of the dreaded GDPR fines and of possible third-party claims for damages.

The GDPR defines the controller in Art. 4 (7) GDPR as "the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data". Determining responsibility therefore depends on who has the actual and legal ability to influence the "purposes and means" of the data processing, i.e. in particular on whether an operator can decide on the "why" and "how" of the processing.

The controller must be distinguished from the other operators who may also be involved in data processing: for example, the data subject, whose personal data is being processed, or the processor, who processes personal data on behalf of the controller as its extended arm, bound by instructions.

Responsibility for the productive use of AI systems

If AI systems are used productively (after completion of the initial training phase), the potential controllers are, in particular, the provider of the AI system and the respective user who wishes to use the AI system for their own purposes.

A user in this sense can be either a natural person or a legal entity. For example, companies that use AI for internal company processes or for products or services provided on the market should be categorised as "users", and their employees and customers may in turn be data subjects under data protection law. Only where the user is a natural person, e.g. where the AI system is used privately, does the user usually coincide with the data subject. The assessment of responsibility must be based on the individual processing steps and the specific circumstances surrounding the use of the AI system.

Responsibility for the selection and provision of data input

Most AI systems, especially the text and image generators that everyone has been talking about since 2022, offer a chat interface for entering information or an upload tool for providing other data. Depending on the respective AI, this may include whole documents, graphics, spreadsheets, etc., which may contain personal data. The so-called prompt, i.e. the specific task to be solved by the AI, is also set at this stage.

The pre-selection of the data to be provided to the AI and the subsequent upload of this data to the AI already constitute data processing within the meaning of the GDPR. Since it is generally the user alone who decides which information is passed on to the AI for processing, the user is also primarily responsible for this data: the user retains control over the data input, decides which data is to be processed, and determines the specific objectives and purposes of the processing.
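
By way of illustration, the following Python sketch shows the data input stage. It is a minimal example under stated assumptions: redact_personal_data and the commented-out send_to_ai call are hypothetical stand-ins for the user's own pre-selection logic and for whatever chat or upload interface the chosen AI system exposes.

    # Minimal sketch of the user's pre-selection of the data input.
    # redact_personal_data() and send_to_ai() are hypothetical helpers,
    # not part of any real AI provider's API.
    import re

    def redact_personal_data(text: str) -> str:
        """Strip obvious personal data (here: e-mail addresses) before upload.

        Real pre-selection would be far broader; this only illustrates that
        the user alone decides which data reaches the AI system.
        """
        return re.sub(r"[\w.+-]+@[\w-]+\.\w{2,}", "[REDACTED]", text)

    document = "Please summarise the complaint filed by max.mustermann@example.com."
    prompt = redact_personal_data(document)

    # send_to_ai(prompt)  # hypothetical upload; everything still contained
    # in `prompt` at this point is data input within the meaning of the
    # GDPR, selected and provided solely by the user.
    print(prompt)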

Responsibility for processing the data input by the AI and generating the data output

The data input is then used by the AI to resolve the specific prompt and to provide a response in the form of a data output. In particular, the data input is analysed using so-called Large Language Models (LLMs) and decoded into concrete work instructions for the AI. Using LLMs, the AI is able to understand and process natural language and to return its results in natural language. If the data input/prompt contains personal data, this alone may constitute relevant data processing.
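
The round trip from data input to data output can be pictured as follows. This is a deliberately simplified sketch: generate() is a placeholder for any hosted or self-hosted LLM backend, not a real library call.

    # Simplified sketch of the input/output round trip; generate() is a
    # placeholder for the actual LLM call of whichever AI system is used.
    def generate(prompt: str) -> str:
        """Stand-in for the LLM resolving the prompt into a data output."""
        return f"Draft reply regarding: {prompt}"

    data_input = "Complaint from Erika Musterfrau about a delayed delivery."
    data_output = generate(data_input)

    # If the data input names a natural person, processing the prompt is
    # already processing of personal data; the same applies to the data
    # output if that person reappears (or is added) in the result.
    print(data_output)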

The data output can then be processed and generated by comparing and evaluating the input data against other data from a database and/or publicly accessible data from the internet (depending on the specific AI application). This direct data comparison constitutes processing of personal data if the database being consulted contains personal data that is analysed in the process.

It is currently disputed, however, to what extent personal data is also being processed where the data to be compared is stored in the form of vectors, i.e. as mere data links. Even in these cases, a personal reference cannot be ruled out per se if conclusions about at least identifiable natural persons can still be drawn from the overall picture of the data connections. In that case, the information fragments stored in the model may merely be pseudonymised data (insofar as they must first be reassembled into usable data), but not completely anonymous data.
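
The following toy example illustrates why vector storage is argued to be pseudonymisation rather than anonymisation. The three-dimensional vectors are invented for readability (real embeddings are high-dimensional and produced by the model itself): the store holds only numbers, yet a sufficiently similar query re-links the fragments to their source.

    # Toy illustration of vector storage: the store holds only numerical
    # vectors ("mere data links"), yet similarity search can re-link them.
    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    vector_store = {
        "fragment_1": [0.9, 0.1, 0.3],  # derived from text mentioning a person
        "fragment_2": [0.2, 0.8, 0.5],  # derived from unrelated text
    }

    # A sufficiently similar query vector retrieves the matching fragment,
    # so the overall picture of the data connections may still point to an
    # identifiable natural person: arguably pseudonymised, not anonymous.
    query = [0.88, 0.12, 0.28]
    best = max(vector_store, key=lambda k: cosine_similarity(vector_store[k], query))
    print(best)  # -> fragment_1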

From data input to data output: responsibility of the user

The user also qualifies as the controller under data protection law for this processing step, as they have provided the input data precisely for the purpose of having the AI process it and generate the data output, and have determined the purposes and means for this (at least by selecting the data input and the specific AI used).

From data input to data output: responsibility of the AI provider

If the AI is a self-hosted solution without a connection to application programming interfaces (APIs) or any other data flow to the developer/provider or other third parties, the user is likely to remain solely responsible under data protection law. The fact that the AI provider initially programmed and provided the AI system and determined its technical functionality and the algorithms it uses can hardly be sufficient for the AI provider to be held (co-)responsible. It is true that, through the programming, the AI provider pre-determines the data processing (the means) later initiated by the user, which the user adopts in the context of the subsequent concrete processing. However, this is the case with all software and therefore cannot be decisive for the role of controller.

If, on the other hand, the AI is offered as Software-as-a-Service (SaaS) or AI-as-a-Service and the AI provider remains involved in the data processing initiated by the user, the AI provider is at least a potential additional operator in the circle of possible controllers. However, this does not automatically make the AI provider the controller, within the meaning of the GDPR, of the data processing carried out by the AI. If the AI provider has no interest of its own in the processed data beyond providing the AI system for a fee (i.e. does not pursue its own purposes within the meaning of data protection law) and regards itself as bound by the user's instructions, the status of processor is more likely. This is the rule for SaaS and also appears to correspond to the current view of the relevant AI providers, who usually provide data processing agreements for these purposes on their websites.

The situation is different, however, if the AI provider reserves comprehensive usage rights to the input and output data and intends, for example, to use the data to further optimise its AI model. In that case, the AI provider is responsible under data protection law at least for this use for its own purposes.

Whether this automatically makes the AI provider responsible for all of the data processing is disputed. Some argue that it does, since a controller can never be a processor at the same time. Others hold that a distinction must be made according to the individual processing purposes: the AI provider would be the controller for the use of the data for its own purposes, but would remain the processor for the general processing of the input data for the (third-party) purpose of presenting the user with a processing result. On this view, the AI provider only "leaves" its status as a processor with respect to the processing of data for its own purposes.

Joint controllership for the user and the AI provider

Even if the user and the AI provider are both controllers within the meaning of the GDPR, this does not necessarily mean that they are joint controllers within the meaning of Art. 26 GDPR. According to the wording of Art. 26 GDPR, this would require both parties to jointly determine the purposes and means of the processing; both controllers would have to pull together, so to speak, and want or aim for the same thing.

However, it can be assumed that this is only rarely the case. Rather, the primary interests of the parties are likely to differ significantly. The user's only interest in using the AI is normally to have data input processed into data output that they can then use for their own purposes. The AI provider's purpose, by contrast, is primarily monetary (performance of the contract with the user), which is why it will usually charge a licence fee for generating the data output (e.g. for API use); in addition, the AI provider will typically want to further optimise its model and train it for future customers.

This interest in the continuous further development of the AI model is usually attributable only to the AI provider, not the user. Without more, the continuous improvement of the model does not benefit the user. However, this can be different if the AI provider and the user cooperate extensively with one another or if the AI provider develops or customises the AI specifically for the user.

This may be the case, in particular, where the user and the AI provider jointly develop an AI for the user's specific purposes from the outset and pursue a common purpose that goes beyond the provision of the AI provider's services in return for payment.

Responsibility under data protection law for further use of the data output

As a processing result, the AI outputs certain information (the data output), which may contain personal data of third parties, either because the corresponding data was already contained in the input or because it was added to the data output during processing by the AI. As a rule, only the respective user decides on the (further) use of this data output. Consequently, the user alone is responsible as controller under data protection law for the further use of this data.

Special features of "strong" AI systems

So-called "strong" AI systems have a high degree of autonomy. Self-learning systems, for example, (further) develop their capabilities independently, which means that their decision-making and processing steps are no longer transparent or predictable.

This raises the question of whether a controller can still be determined for "strong" AI applications at all: as the autonomy of the AI increases, the influence of the AI provider and the user on the processing procedures decreases.

Against the background of the increasing autonomy of "strong" AI applications, it is therefore sometimes argued that detailed knowledge of the processing operations is crucial and, conversely, that a lack of such knowledge excludes controller status, since central controller obligations, such as maintaining a record of processing activities in accordance with Art. 30 GDPR or providing information to data subjects in accordance with Art. 13 ff. GDPR, would require precise knowledge of the data processing operations taking place.

The opposite view counters that the lack of explainability and determinability of "strong" AI does not automatically "liberate" the central operators from responsibility, but "only" creates difficulties and challenges in implementing the controller's central duties. This, in turn, is an inherent problem of complex technologies, which the GDPR has, at least to some extent, fundamentally taken into account. For example, the GDPR subjects numerous controller obligations and data subject rights to a discretionary assessment: it is sufficient under Art. 14 (1) lit. e) GDPR to name only the categories of data recipients instead of specific recipients, and information pursuant to Art. 14 (5) lit. b) GDPR does not have to be provided if this proves impossible or disproportionate in individual cases. The remaining "rigid" controller obligations of the GDPR, on the other hand, must be addressed by taking them into account already in the development of "strong" AI, so that the rights and obligations under the GDPR (as well as other fundamental values) are "laid in the AI's cradle" (also in the sense of privacy by design).

Data protection law is technology-neutral and also applies to AI 

AI arrived years ago and is here to stay. The new AI models emerging almost every week highlight its incredible possibilities. As with all new technologies, however, data protection law must be taken into account as a fundamental compliance requirement, and in individual cases this can give rise to complex data protection issues. The assessment of the fundamental question of the responsibility of the operators involved should be based on the individual processing steps and the specific circumstances of the use of the AI system. Depending on the application and functionality of the specific AI system, it may also be necessary to enter into data processing agreements, standard contractual clauses or (in exceptional cases) joint controller agreements in order to fulfil GDPR requirements.