Towards big data and AI in health: the lawful basis for using patient data to build digital health products

In-depth analysis

A significant ingredient for the future of innovation is data, specifically big data. Furthermore, as machine learning and artificial intelligence (AI) techniques continue to improve, gathering and analysing data and applying information learned from such analysis will become increasingly important for innovating companies and organisations.

Healthcare is a sector that is already seeing application of these technologies. Technology companies are increasingly turning to healthcare as a sector into which they expect to expand using big data and AI as powerful tools to deliver products that are more accurate than any human physician, particularly in diagnostics, as well as to discover new drugs that would otherwise take years to find. Both governments and healthcare companies are realising the importance of these technologies in developing healthcare products for the future.

The creation of a European Data Space, which includes the healthcare sector, is set as one of the priorities of the European Commission for 2019-2025. The ambition is to enable a decentralised digital health infrastructure of data across Member States within the European Union, and to facilitate innovation and research through a cloud regime that allows for equal data sharing and analysis. As a part of this objective, several Member States have also signed a declaration to gather one million sequenced genomes across Europe by the end of 2022, which has become known as the Beyond 1 Million Genomes project. Healthcare companies are similarly turning towards accumulating data through private and public channels to improve their research and development. For example, some companies have started collaborations with third parties, such as data aggregators, to collect more data, or to contribute with their own sourced data.

These and related topics were recently in the spotlight in the UK as the government announced the General Practice Data for Planning and Research (GPDPR) programme in the summer of 2021. As part of the GPDPR programme, NHS Digital planned to collect and centralise patient data from GP practices. The collected data was intended to be transferred to third parties, including research organisations and healthcare companies, to be used for improving health and care, or research to develop diagnostic tools and treatments for serious illnesses. The programme was ultimately "deferred" after a public outcry about the commercialisation of individuals' personal data.

Nevertheless, the use of patient data in the UK for research purposes is not a new phenomenon. In 2015 the NHS provided the Google-owned tech company DeepMind with medical records of 1.6 million patients in the UK to help develop AI technology that could detect acute kidney disease. However, the data sharing agreement between the NHS and DeepMind has been the subject of public controversy. The Information Commissioner's Office (ICO) found on 3 July 2017, in a non-binding opinion concluding its investigation, that the responsible NHS organisation had failed to comply with the UK's Data Protection Act 1998 when the data was shared with DeepMind. More recently, a representative action was filed, also against DeepMind, for allegedly breaching data protection laws. Thus, while there are significant potential benefits to be gained from sourcing and analysing big health data, there are also significant legal implications which need to be considered, including data privacy, patients' rights and transparency obligations.

There have been other, more successful uses of big health data in the UK. Notably, as part of a collaboration between NHS England and Genomics England in 2013, the so-called 100,000 Genomes Project was initiated to analyse whole-genome sequences from around 85,000 NHS patients affected by a rare disease or cancer. The project relied on broadly obtaining consent from participating patients for use of their data, which was subsequently shared with researchers (including for-profit organisations) to improve knowledge of the causes, treatment, and care of diseases. The project shows that it is possible for governments to partner with patients and companies for use of big health data within the bounds of privacy laws. While consent may not be a practical approach for all big health data uses, there is no doubt that complying with the law has to be the way forward.

Data privacy: the legal basis for collection and processing

While the UK is no longer a member of the EU, it has adopted the substance of the EU's General Data Protection Regulation 2016/679 (GDPR) provisions into its national regime, the UK Data Protection Act 2018 (UK GDPR), only adapting certain aspects to reflect the UK's new legal status as outside the EU. This article therefore focuses on the GDPR, with references to divergences in the UK GDPR.

The GDPR provides that collection and processing of personal data, which includes health data, requires a lawful basis. Article 6(1) of the GDPR provides six different bases for processing personal data, including:

consent
processing where necessary for the performance of a task carried out in the public interest
processing where necessary for the purpose of a legitimate interest pursued by the controller or a third party.

Processing special categories of personal data, including more sensitive data such as genetic data, biometric data or patient data, is generally prohibited pursuant to Article 9(1) of the GDPR, unless one of the additional and stricter forms of legal basis set out in Article 9(2) is satisfied. These include:

explicit consent of the data subject to the processing of their personal data for one or more specified purposes, except where Union or Member State law prohibits such consent to be given
processing of data where necessary for the purpose of scientific research, which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.
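The two-layer structure described above (an Article 6(1) lawful basis for all personal data, plus an Article 9(2) condition where the data falls into a special category) can be sketched as a simple record check. This is a minimal illustration only and not legal advice: the `ProcessingRecord` type, the `missing_legal_grounds` function and the string labels are hypothetical names invented for this sketch, and the sets contain only the bases listed in this article rather than every basis available under the GDPR.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: labels are illustrative and cover only the bases named in this article.
ARTICLE_6_BASES = {"consent", "public_interest_task", "legitimate_interest"}
ARTICLE_9_CONDITIONS = {"explicit_consent", "scientific_research"}


@dataclass
class ProcessingRecord:
    """Hypothetical record describing one processing activity."""
    purpose: str
    special_category: bool  # e.g. genetic, biometric or patient data
    article_6_basis: Optional[str] = None
    article_9_condition: Optional[str] = None


def missing_legal_grounds(record: ProcessingRecord) -> List[str]:
    """Return the gaps in the documented legal grounds (empty list if none)."""
    gaps = []
    # Every processing activity needs an Article 6(1) lawful basis.
    if record.article_6_basis not in ARTICLE_6_BASES:
        gaps.append("no Article 6(1) lawful basis recorded")
    # Special category data additionally needs an Article 9(2) condition.
    if record.special_category and record.article_9_condition not in ARTICLE_9_CONDITIONS:
        gaps.append("no Article 9(2) condition recorded for special category data")
    return gaps
```

A record for patient data that documents only an Article 6(1) basis would be flagged as incomplete, which reflects the point that the Article 9(2) condition applies in addition to, not instead of, the Article 6(1) basis.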

Where patient data is gathered from sources originally intended solely for use in treating the patient and not for use in big data analyses or developing new technologies, finding the patient and obtaining their explicit consent to those future uses is not straightforward. This might explain why the UK Government chose to use an opt-out rather than an opt-in system for its GPDPR programme, but also why public opinion was not behind the project.

Where explicit consent is sought for these projects in advance, the precise future use for the data gathered may not be known at the time when consent is given. Recital 33 to the GDPR recognises these difficulties, and states that individuals should be allowed to give their consent to certain areas of scientific research in keeping with recognised ethical standards for scientific research. This is also reflected in Article 9(2)(j) of the GDPR, which offers "scientific research" as a separate legal basis for processing patient data. There is no precise definition of "scientific research" in the GDPR Articles. However, Recital 159 to the GDPR, and similarly the UK GDPR, lists the following as "scientific research purposes":

technological development and demonstration
fundamental research
applied research
privately funded research
studies conducted in the public interest in the area of public health.

Schedule 1, Section 4(c) of the UK GDPR goes one step further in requiring that the scientific research must always be carried out "in the public interest", but without defining what research conducted "in the public interest" means. The UK Government, in its proposals to reform UK data protection law, has also stated that it is considering introducing a clearer definition of "scientific research", although it has not specified how this will differ from Recital 159 as set out above.

While the (non-legally binding) Recital 159 encourages a broad interpretation of scientific research, which extends to private research, the European Data Protection Supervisor (EDPS), the independent authority responsible for data protection regulatory oversight within EU institutions, has taken a more conservative line. The EDPS considers that:

for-profit commercial entities can engage in scientific research under the GDPR framework
processing carried out in the "public interest" (which is relevant for Article 6(1)(e) of the GDPR, and Schedule 1, Section 4(c) of the UK GDPR) implies that there is a "pressing social need", as opposed to largely private or commercial advantages.

Virtually all research projects pursued by private companies have commercial gain as an aim, but does this mean that such research projects are not considered to be in the "public interest"? Recital 159 indicates that "privately funded research" still counts as "scientific research", as set out above, but that is not the same as saying that such research is in the "public interest" per se. If it is not, private companies are at a disadvantage, which increases the potential value in the EU of the European Health Data Space. This could also compel more companies to turn to "broad consent" based processing, where individuals are asked to consent to use of their personal data for future (unspecified) scientific research purposes.

The European Data Protection Board (EDPB) has recently provided comments on the consistent application of the GDPR in health research. The EDPB was asked to comment on whether the concept of "broad consent" could apply to the processing of special categories of personal data for scientific research purposes. The Board declined to provide detailed guidance at this stage, but emphasised that adequate safeguards should be in place to ensure transparency of the processing during the research project and to ensure that the requirements on specificity of consent are met as soon as reasonably possible.

In its comments, the EDPB also highlighted that further processing of previously collected health data for scientific research purposes must comply with, in addition to Article 89 of the GDPR, the presumption of compatibility under Article 5(1)(b) of the GDPR. The original purpose of the collected data matters in this regard. As is stated in Recital 50, there should be a "link" between the purposes of the original processing and the purposes of the intended further processing. For example, if the controller relies on an individual's consent for particular research purposes, other and future research purposes must also be compatible with those original purposes. The EDPB again refrained from providing any concrete guidance on this point, which is instead to be expected in its upcoming guidelines.

The EDPB was due to publish its guidelines on processing personal data for scientific research purposes in 2021. These have not yet been published, but are expected in 2022.

Other requirements

Whatever definition is applied to scientific research, appropriate safeguards under Article 89 of the GDPR, such as pseudonymisation, must also be considered when processing data on this basis. Even after patient data has been sourced in compliance with applicable privacy regulations, there are many other complex laws to keep in mind. These include the rights that patients continue to maintain with respect to their data, requirements on duration of storage and security measures.
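As an illustration of the pseudonymisation safeguard mentioned above, the following is a minimal sketch that uses a keyed hash (HMAC-SHA256) to replace a direct identifier with a stable pseudonym. The choice of HMAC is an assumption made for illustration, since the GDPR does not prescribe a particular technique, and the field names `patient_id` and `name` and both function names are hypothetical. Pseudonymised data generally remains personal data under the GDPR, because whoever holds the key can re-link the records, so the key would need to be stored separately from the dataset.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers.

    Assumption: 'patient_id' and 'name' are the only direct identifiers;
    a real dataset would need a fuller review of identifying fields.
    """
    cleaned = {k: v for k, v in record.items() if k not in {"patient_id", "name"}}
    cleaned["pseudonym"] = pseudonymise(record["patient_id"], secret_key)
    return cleaned
```

Because the same identifier and key always produce the same pseudonym, records belonging to one patient can still be linked for research without exposing the original identifier to the researcher.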

Conclusion

As data is becoming an increasingly valuable resource in healthcare research and development, industry participants will need to think through the legal justifications for collecting, holding and using patient data in large quantities to develop new digital health technologies based on big data and AI. The recent events in the UK, with both private and public projects running into trouble, illustrate that it is necessary to properly consider the patients' perspective on the use of their data. These projects must be set up in a way that respects the privacy concerns and rights of patients.

The 100,000 Genomes Project has shown that there is a willingness of individuals to provide their data for the public good where this is supported by a legal basis that is sufficient for the processing and which is communicated to patients in a transparent fashion. Although the project relied on obtaining individual consent, which may not be feasible for other projects at an even larger scale, it has been looked at as an example of good practice for future research and work that relies on use of patient data.

It is to be expected that there will be an increasing degree of collaboration between public and private data channels because of the potential for public gain from data, but also the economic value derived from building the sophisticated products that it enables. Personal data is sometimes said to be "the new gold" for this reason. Where that new gold drives healthcare products that use big data and AI in decisions, the legal basis for the use of that data should be the first of the many data privacy requirements to be considered.
