Artificial Intelligence stands at the forefront of autonomous driving technology. Self-driving vehicles rely on sophisticated algorithms and an array of sensors to navigate and make real-time driving decisions without human intervention. Advanced driver-assistance systems enhance safety and convenience with features such as automated emergency braking, lane-keeping assistance, and traffic sign recognition. These systems not only improve driving safety but also pave the way for fully autonomous vehicles.
However, the integration of AI in the automotive industry involves extensive collection and analysis of personal data, raising significant data privacy concerns. In April 2024, the French data protection authority (CNIL) issued practical guidelines regarding personal data processing activities during the development stages of AI systems. These guidelines aim to support AI developers in addressing relevant questions to ensure compliance with GDPR when processing personal data as part of machine learning operations.
Here, we highlight four key issues AI-based system providers (including those servicing the automated vehicle sector) must address to ensure GDPR compliance during the development stage.
Data controller or data processor?
An AI-based system provider acts as a data controller if it initiates the system development and builds the relevant database with data selected according to parameters it has independently set. When reusing data originally collected and supplied by a third party, the AI system provider and the third party will usually each operate as independent controllers, with all corresponding liabilities.
The AI system provider acts as a processor when developing AI systems as part of a service for a car manufacturer that provides specific instructions regarding data sources and categories. If the provider and the customer pursue common objectives (for example, where the provider improves its own system while also servicing the customer), they will likely be joint controllers.
When the AI system provider, on its own initiative, builds and selects a dataset for the purpose of developing AI systems, it will most likely be a controller, even where those systems are tailored to the needs of each of its customers. Using the same database for different customers tends to indicate that the provider is a data controller in relation to the database's development. This does not exclude the possibility of the provider acting as a processor for certain processing activities on behalf of its customers.
Essentially, a case-by-case assessment will be required to determine whether, and when, the AI system provider acts as a controller or a processor.
Specified, explicit and legitimate purpose
Under the GDPR's purpose limitation principle, personal data should only be collected for specified, explicit and legitimate purposes and not further processed in a manner which is incompatible with those purposes. The legitimacy of processing activities therefore depends in part on whether the ultimate purpose of the AI system is known at the time it is developed.
When the ultimate purpose is known, the situation is fairly simple. However, this will not always be the case, particularly where the AI system can be used for a variety of purposes which may not be known at the time of development, as with large language models. In this situation, the CNIL indicates that the purpose of the processing activities carried out at the development stage may be considered sufficiently identified, specific and legitimate provided it relates, cumulatively, to a specific type of system and to technical functionalities and capacities that can be reasonably anticipated. If the AI system provider reuses data that it initially collected for another purpose, it must assess whether this further processing is compatible with the original purpose of the data collection.
As an illustration, the CNIL considers that the development of a computer vision model capable of detecting various objects, such as vehicles, pedestrians, street equipment or road signs, may qualify as having an identified and specific purpose.
Lawfulness of processing activities
Under the GDPR, personal data may only be processed where the processing satisfies one of the Article 6 lawful bases. In addition, processing of special category (sensitive) data is prohibited unless one of the Article 9 conditions is met. Valid consent can be difficult to obtain in relation to training data, although the more closed the system, the more achievable consent may be. Without it, an alternative lawful basis is required. This may be the case where the processing is necessary for the performance of a contract with the data subjects or relates to a public interest mission, but more often than not, legitimate interest will be the most likely lawful basis for the processing. The data controller will, however, not only need to identify its legitimate interest but also carry out a balancing test to ensure that its legitimate interests are not outweighed by the rights and interests of the data subjects.
When the AI system provider reuses a database obtained from a third party to create its own database, it must ensure that the creation or sharing of this database is not manifestly unlawful. The CNIL recommends contractually obtaining specific information from the third party, including information on the source of the data, the context of data collection, the lawfulness of the initial processing, any relevant impact analysis, and the information provided to data subjects.
Data management and monitoring
When designing an AI system, it is crucial to adhere to the GDPR data protection principles, including the principle of data minimisation.
It is essential to clearly define the AI system's objective. This includes determining the use of the AI system, the expected outcomes, and the performance indicators. The system's usage context must be identified to determine priority information and exclude irrelevant contexts and data.
During collection, only relevant data should be selected for machine learning; this applies to both raw data and associated metadata. The principle of minimisation requires careful consideration of the necessary personal data, taking into account the volume, categories, types, and sources of the data. Data should be selected strictly based on its usefulness and the potential impact on the rights and freedoms of the data subjects, including by carrying out a Data Protection Impact Assessment prior to the processing where required.
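To illustrate the minimisation principle in practice, the following is a minimal sketch in Python, with hypothetical field names, of an allow-list filter that strips fields not needed for the training purpose before a record enters the training set.

```python
# Minimal data-minimisation sketch; the field names and allow-list are
# assumptions for illustration, not requirements from the CNIL guidance.
ALLOWED_FIELDS = {"frame_id", "timestamp", "camera_image", "lidar_scan"}  # assumed allow-list

def minimise(record: dict) -> dict:
    """Keep only the fields on the allow-list; drop everything else."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}

raw_record = {
    "frame_id": 42,
    "timestamp": "2024-04-08T10:15:00Z",
    "camera_image": "frame_000042.png",
    "lidar_scan": "scan_000042.bin",
    "vin": "WVWZZZ1JZXW000001",   # identifies the vehicle owner: not needed for training
    "driver_id": "u-1138",        # directly identifies a person: not needed for training
}

print(minimise(raw_record))
```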
Privacy by design should be implemented to incorporate data protection from the outset. It is important to select the technique that best respects individuals' rights and freedoms, prioritising those requiring less personal data while achieving the desired objectives. Data cleaning should also be implemented to enhance data integrity and relevance. This involves correcting missing values, detecting outliers, correcting errors, eliminating duplicates, and removing unnecessary fields.
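As a purely illustrative example of such data cleaning, the sketch below uses pandas on a hypothetical table of annotated detections; the column names and thresholds are assumptions, not values drawn from the CNIL guidance.

```python
import pandas as pd

# Hypothetical table of annotated detections used for training.
df = pd.DataFrame({
    "frame_id":   [1, 2, 2, 3, 4],
    "object":     ["car", "pedestrian", "pedestrian", "sign", "car"],
    "distance_m": [12.0, 8.5, 8.5, None, 950.0],   # None = missing value, 950 m = implausible
    "notes":      ["", "", "", "", ""],            # free-text field not needed for training
})

df = df.drop_duplicates()                                              # eliminate duplicates
df = df.drop(columns=["notes"])                                        # remove unnecessary fields
df["distance_m"] = df["distance_m"].fillna(df["distance_m"].median())  # correct missing values
df = df[df["distance_m"] <= 300.0]                                     # drop outliers beyond the assumed sensor range

print(df)
```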
Anonymisation should also be considered to the extent the activity does not require personal data. The machine learning for an AI-based system for autonomous vehicles may involve the collection of a large volume of data to enable the vehicles to correctly visualise their environment and make decisions based on the different scenarios the vehicle may encounter. In this context, personal data such as pedestrian faces and vehicle number plates may be collected incidentally. As this data will be irrelevant to the contemplated purpose, the provider should consider automating the anonymisation of non-essential personal data in the images captured, for example by blurring faces and number plates.
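A minimal sketch of such an automated step, assuming OpenCV is available and that blurring detected regions is an acceptable anonymisation technique for the use case (the file name and detector are illustrative; number plates could be handled the same way with a dedicated detector):

```python
import cv2

# Load a stock face detector shipped with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("frame_000042.png")            # hypothetical captured frame
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces and replace each region with a heavy Gaussian blur.
for (x, y, w, h) in face_detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5):
    image[y:y + h, x:x + w] = cv2.GaussianBlur(image[y:y + h, x:x + w], (51, 51), 0)

cv2.imwrite("frame_000042_anonymised.png", image)
```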
Regular database monitoring is crucial, as initially implemented measures may become obsolete. This involves comparing the data with the source data and having it reviewed by trained staff.
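As an illustration of the automated part of such monitoring, the sketch below assumes the source snapshot and the working training database are stored as directories of files (the directory names are hypothetical); it compares the two sets by name and SHA-256 checksum so that missing or altered records can be flagged for review by trained staff.

```python
import hashlib
from pathlib import Path

def checksums(directory: str) -> dict[str, str]:
    """Map each file name to the SHA-256 digest of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(directory).glob("*")
        if path.is_file()
    }

source = checksums("source_data")       # hypothetical source snapshot
working = checksums("training_data")    # database actually used for training

missing = source.keys() - working.keys()
altered = {name for name in source.keys() & working.keys() if source[name] != working[name]}

print(f"{len(missing)} records missing, {len(altered)} records altered since collection")
```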
Appropriate data retention periods must be set, with criteria defined based on past experience, IT development durations, and available resources. Data no longer needed for day-to-day tasks should be deleted unless necessary for performance verification or improvement.
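As an illustration only, a retention rule of this kind could be automated as follows; the directory and the 24-month period are assumptions, not figures from the CNIL guidelines.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION = timedelta(days=730)   # assumed 24-month retention period
cutoff = datetime.now(timezone.utc) - RETENTION

for path in Path("training_data").glob("*.png"):
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    if modified < cutoff:
        path.unlink()             # delete data no longer needed for day-to-day tasks
        print(f"deleted {path}")
```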
Lastly, appropriate technical and organisational measures must be implemented to ensure an appropriate level of security. Robust encryption and authentication methods, protecting collected data with backups and logging, and securing the information system and training staff are some of the measures to be considered.
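By way of illustration, the following is a minimal sketch of encrypting collected data at rest using the `cryptography` package's Fernet symmetric encryption; key management, backups and logging would sit around such a step in a real deployment, and the file names are hypothetical.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()    # in practice, stored and rotated in a key management system
cipher = Fernet(key)

with open("frame_000042.png", "rb") as f:   # hypothetical captured frame
    token = cipher.encrypt(f.read())        # encrypt before writing to shared storage

with open("frame_000042.png.enc", "wb") as f:
    f.write(token)

# Decryption for authorised processing:
original = cipher.decrypt(token)
```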
Privacy is essential
AI is revolutionising the automotive industry by enhancing vehicle performance, safety, user experience, and efficiency. However, addressing associated data privacy risks is crucial. Balancing innovation with privacy will be key to the sustainable use of AI in the automotive sector.