The European Data Protection Board (EDPB) has issued a non-binding opinion clarifying the circumstances under which personal data can be used for training Artificial Intelligence (AI) models without violating the General Data Protection Regulation (GDPR). This opinion, prompted by a request from the Irish Data Protection Authority, offers a three-step test for establishing “legitimate interest,” a legal basis for processing personal data without explicit consent. The EDPB also addressed the concept of model anonymity, defining it as a state where the likelihood of identifying individuals from the data used is “insignificant.” While providing this framework, the EDPB emphasized the role of national data protection authorities in assessing compliance on a case-by-case basis. This decentralized approach, while offering flexibility, raises concerns about potential inconsistencies in enforcement across the EU.
The EDPB’s opinion sets out a three-pronged test for assessing “legitimate interest.” First, companies must identify the specific interest they are pursuing through the use of personal data for AI development. Second, they must demonstrate that processing this data is necessary to achieve the stated objective. Third, they must ensure that this interest does not override the fundamental rights and freedoms of the individuals whose data is being used. The framework aims to balance the benefits of AI development against the protection of individual privacy rights. The EDPB also stressed transparency: companies must inform individuals about how their data is collected and used, which is crucial for maintaining user trust and ensuring accountability.
Model anonymity, a key concept in the EDPB’s opinion, requires that the risk of re-identifying individuals from the data used to train AI models be negligible. This presents a significant challenge, as AI models often require vast datasets for effective training. Achieving true anonymity necessitates careful data processing techniques, including anonymization and pseudonymization, to minimize the possibility of linking data back to specific individuals. The EDPB’s emphasis on an “insignificant” likelihood of identification sets a high bar for developers and raises practical questions about how this will be measured and enforced. Moreover, achieving anonymity does not obviate the need to comply with GDPR principles: other obligations, such as data minimization and purpose limitation, still apply.
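As a rough illustration of the distinction at stake (not a technique prescribed by the EDPB), pseudonymization might replace direct identifiers with salted hashes so that records remain linkable within a dataset but cannot be traced back to a person without the separately held salt. A minimal Python sketch, using hypothetical field names; note that pseudonymized data still counts as personal data under the GDPR, which is why it alone does not meet the “insignificant” re-identification bar:

```python
import hashlib
import secrets

def pseudonymize(record: dict, id_fields: tuple, salt: bytes) -> dict:
    """Replace direct identifiers with truncated salted SHA-256 tokens.

    Records stay linkable (same value + same salt -> same token), but
    re-identification requires access to the salt. Under the GDPR this is
    pseudonymization, not anonymization: the data remains personal data.
    """
    out = dict(record)
    for field in id_fields:
        if field in out:
            digest = hashlib.sha256(salt + str(out[field]).encode()).hexdigest()
            out[field] = digest[:16]  # truncated token for readability
    return out

# The salt must be stored separately from the dataset and access-controlled.
salt = secrets.token_bytes(16)
record = {"email": "jane@example.com", "age_band": "30-39"}  # hypothetical record
print(pseudonymize(record, ("email",), salt))
```

True anonymization, by contrast, would require that no party, salt or no salt, can re-identify the individual, which is the standard the opinion’s “insignificant” likelihood test gestures toward.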
The opinion has generated mixed reactions from stakeholders. Industry groups, such as the Computer &amp; Communications Industry Association (CCIA), welcomed the clarification, emphasizing the importance of access to data for training accurate and unbiased AI models. They argue that restricting access to data would hinder innovation and limit the potential benefits of AI. Digital rights advocates, however, expressed concerns, particularly about the practical application of the anonymity criteria and the risk of inconsistent enforcement across member states. The absence of specific guidance on how to achieve true anonymity casts doubt on the effectiveness of this safeguard. Furthermore, the discretionary power granted to national DPAs could lead to a fragmented regulatory landscape within the EU, hindering a unified approach to AI regulation.
The EDPB’s opinion also highlights the critical issue of data provenance, explicitly stating that models trained on illegally obtained data cannot be deployed. This underlines that all data used for AI development must be acquired through lawful means, respecting privacy rights and complying with data protection regulations. It also raises complex questions about the use of publicly available data scraped from the internet, a common practice in AI development.
Looking ahead, the EDPB acknowledges the need for ongoing guidance on emerging issues in AI development. Its planned guidelines on web scraping, a common method of gathering training data, are eagerly anticipated. These guidelines will need to address the complexities of scraping publicly available data, including individuals’ rights to control the use of their online data even when it is publicly posted. Balancing innovation against fundamental rights will be a central challenge for future regulatory developments, and the EDPB’s work in this area will be crucial in shaping AI development in the EU in line with the principles of privacy and data protection.