
1 March 2022

March - Health data – getting the right balance between innovation and data protection – 2 of 7 Insights

Diagnosed by big data

Victoria Hordern looks at the use of big data and AI in medical diagnostics in the context of data protection and AI regulation.


Victoria Hordern



In the last month or so, the UK government has announced that it will invest in new technologies and equipment to diagnose cancer more quickly. The experience of the pandemic in the UK has led to a significant drop in cancer diagnoses and the government wants to place a renewed focus on innovative treatment and early diagnosis. Part of the government's announcement included a reference to the use of artificial intelligence (AI) and machine learning (ML) technologies to improve the assessment of cancer risk.

Technologies that assist with diagnosis

Using big data tools to assist with disease diagnosis is not new and for several years there have been discussions and investigations into how such techniques can improve patient outcomes. Much big data processing today deploys AI and ML tools to help with analysis, given the greater efficiencies these tools can deliver. A number of organisations have been pioneering the use of these technologies to help healthcare professionals make accurate diagnoses. In doing so, big health data has been collected and processed in bulk as tools have been trained to recognise patterns and to flag correlations which can indicate a propensity for disease.

One of the significant advantages of AI systems is flagging disease risk early. For instance, currently in the UK, every mammogram taken to diagnose breast cancer is typically double-checked by radiologists, which is time-intensive and can lead to diagnosis delays. AI systems can help assess mammograms quickly and provide reports back to radiologists – providing the second pair of eyes which speeds up the process. Assessing big datasets in a health context (and obtaining data from populations around the world) is also particularly helpful with diagnosing rare or hard-to-diagnose diseases, potentially transforming patient life chances.

However, like any technology, the use of AI and ML has advantages and disadvantages. For instance, there is scope for bias in the data analysis and there are concerns around opacity. There are also different levels of reliance on AI and ML technologies – from computer-aided detection methods where the technology supports healthcare professionals, to tools that operate without direct clinical supervision.

Whatever the pros and cons of using AI and ML for diagnosis, in most instances, the analysis of datasets for medical diagnostic purposes will involve personal (health) data. While arguments may be made that the data is anonymous (including that synthetic datasets are anonymous data), the bar to proving indisputably that data which was once personal data is now effectively anonymised remains high. In the context of datasets, the more detail there is about a medical condition and the more other identifiers there are within the dataset (eg geographic location, age range), the harder it will be to argue the data is not personal data.
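The effect of adding identifiers to a dataset can be illustrated with a short, hedged sketch. It is not drawn from any regulator's guidance: it uses k-anonymity, a common statistical heuristic that the discussion above does not itself mention, and the function name, field names and records are all hypothetical. A small group size points to re-identification risk, but a large one would not by itself show that data is anonymous for GDPR purposes.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of values for the given fields (the 'k' in k-anonymity)."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Entirely made-up records, for illustration only.
records = [
    {"region": "North", "age_range": "40-49", "condition": "A"},
    {"region": "North", "age_range": "40-49", "condition": "A"},
    {"region": "North", "age_range": "40-49", "condition": "B"},
    {"region": "North", "age_range": "50-59", "condition": "A"},
    {"region": "North", "age_range": "50-59", "condition": "A"},
    {"region": "South", "age_range": "40-49", "condition": "A"},
    {"region": "South", "age_range": "40-49", "condition": "A"},
    {"region": "South", "age_range": "40-49", "condition": "A"},
]

# Each extra field splits the records into smaller, more identifying groups.
k_region_only = smallest_group_size(records, ["region"])
k_region_age = smallest_group_size(records, ["region", "age_range"])
k_all = smallest_group_size(records, ["region", "age_range", "condition"])
```

In this made-up dataset the smallest group shrinks from three records to one as region, age range and condition are combined, which is the pattern described above: the more detail held, the easier it becomes to single out an individual.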

Complying with the data protection principles

Any use of health datasets for diagnostic purposes needs to meet the requirements of data protection law. In the UK, as in the EU, data protection law is underpinned by principles. So any use of big health data for diagnostic purposes must still meet the principles of lawfulness, fairness, purpose limitation, transparency etc, set out under the GDPR (which, unless otherwise specified, we also take to include the UK GDPR).

In particular, any organisation seeking to use big health data needs to check how the data was collected originally, what individuals were told and whether there are any barriers to further use for big data diagnostic purposes. Originally, for instance, a medical image would usually have been collected about a patient's condition in order to treat that individual patient. Using that image for diagnostic purposes to benefit other patients is therefore a separate purpose. However, the GDPR recognises that further processing of personal data for scientific research purposes (so long as certain safeguards are in place) is not incompatible with the purpose limitation principle. Consequently, if a business is able to argue that the use of big health data analysis is for scientific research purposes and that purpose is to create a medical diagnostic model, then it should not fall foul of the purpose limitation principle.

Interestingly, when in 2017 the UK ICO published an Undertaking following its investigation into the use of 1.6 million patient records shared by the Royal Free Hospital with DeepMind to support the development of the Streams application (a tool to aid with Acute Kidney Injury diagnosis), the ICO did not find this processing to contravene the purpose limitation principle under the (then applicable) Data Protection Act 1998. These 1.6 million records represented individuals who were existing patients of the Royal Free along with those who had presented for treatment in the previous five-year period – in other words, these records were not limited only to patients requiring immediate healthcare.

The ICO did, though, consider that the processing of the 1.6 million records failed the data minimisation principle in this particular instance because the focus of the ICO's assessment was on the clinical safety testing of the Streams application. The ICO wasn't persuaded that this amount of data was necessary and proportionate to test the application, suggesting lower volumes of data could have been used. While the ICO also expressed concerns over whether it was necessary and proportionate for 1.6 million records to be used in the live and ongoing use of the Streams app, there does not appear to have been any further regulatory action on this point (although this data sharing has been back in the news recently due to the reported representative action filed against DeepMind).

One way to comply with the data minimisation principle is to use smaller datasets if they can produce equally effective AI models (although, given the risk of bias created by small datasets, this won't always be possible). In some cases, however, rather than using thousands of medical images to create a composite image, fifty images may be sufficient to create an image to train an algorithm to a high enough level of efficiency and accuracy. Certainly, as technology develops there should be space for alternative approaches to big data processing where it's possible to achieve the same result, building a resilient and effective model without using significant volumes of data.

Even if an organisation can get comfortable on the purpose limitation and data minimisation principles, it must still ensure its use of health data in this context is fair and transparent to affected individuals as well as lawful under Article 6 and, importantly, Article 9 GDPR.

The GDPR distinguishes between processing of data for preventive or occupational medicine and medical diagnosis (referred to in Article 9(2)(h)), and processing data for scientific research (referred to in Article 9(2)(j)). However, the two lawful bases are closely connected. For instance, data can be used firstly for scientific research where the data is processed to create and train an ML model that can help diagnose diseases in other individuals. It should then be possible for this ML model built from the original health data to be used for medical diagnostic purposes for the benefit of other individuals.

Patient surveys broadly indicate that patients do not object to their health data being used for the purposes of scientific research to help improve our understanding of disease and to help others. However, organisations need to provide clear messaging to individuals about such use and ensure an accountable framework governs use of the data in a balanced and fair way.

Automated processing

Could reliance on big health data techniques for medical diagnosis engage Article 22 of the GDPR? Article 22 restricts the ability of organisations to make decisions based solely on the automated processing of personal data where that decision produces a legal effect or similarly significantly affects an individual. There are additional safeguards where such decisions are based on special category data including health data. Any such decisions involving health data are only lawful if the explicit consent of the individual is obtained or if the processing is necessary for reasons of substantial public interest. Article 22 therefore appears to limit the circumstances where solely automated decision-making using health data for medical diagnosis is lawful, given that a medical diagnosis is likely to be considered to have a significant effect on an individual.

The focus then turns to whether the medical diagnosis provided through a big health data tool is a decision based 'solely' on automated processing. The more evidence that can be advanced that a medical professional analyses the output and weighs it with their own professional view before the decision is made, the easier it will be to argue that the decision is not based solely on automated processing. In other words, the input from human decision-making has to be meaningful and not simply a 'rubber-stamping' of a decision made by AI. Otherwise, if the decision is solely based on automated processing, it's highly likely the organisation will need explicit consent from the individual (in the UK there's no substantial public interest condition which would easily fit here).

Big health data diagnostic systems should include safety measures that guard against automation bias where healthcare professionals stop using their own expertise and judgment to interpret the results from an AI model. Moreover, where ML models become increasingly sophisticated, there is a danger that a human may not be able to properly review and interpret the output. Where this is the case, the system begins to move towards producing decisions which are solely automated and therefore within the remit of Article 22. Consequently, it's important that any big health data model used for medical diagnosis is structured so that it enhances rather than replaces human decision-making.
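One hypothetical way of structuring a tool so that it supports rather than replaces the clinician is sketched below. Every class, field and function name is an assumption made for illustration, and the sketch does not describe any real product. It simply refuses to release a diagnosis until a clinician has recorded their own finding and reasoning, and it keeps a note of any disagreement with the model. Whether a given review is 'meaningful' for Article 22 purposes remains a legal and factual question that a code check cannot settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOutput:
    finding: str       # hypothetical label, eg "suspicious" or "clear"
    confidence: float  # the model's own score between 0 and 1


@dataclass
class ClinicianReview:
    reviewer_id: str
    finding: str    # the clinician's own conclusion
    rationale: str  # free-text reasoning recorded by the clinician


def finalise_diagnosis(output: ModelOutput,
                       review: Optional[ClinicianReview]) -> dict:
    """Release a diagnosis only once a clinician has recorded a reasoned view."""
    if review is None:
        raise ValueError("no clinician review recorded")
    if not review.rationale.strip():
        raise ValueError("review has no rationale, so it may be a rubber stamp")
    return {
        "diagnosis": review.finding,      # the clinician's view prevails
        "model_finding": output.finding,  # retained for audit
        "clinician_disagreed": review.finding != output.finding,
        "reviewed_by": review.reviewer_id,
    }


# Made-up example: the clinician overrides the model's suggestion.
output = ModelOutput(finding="suspicious", confidence=0.91)
review = ClinicianReview(
    reviewer_id="dr-001",
    finding="clear",
    rationale="Density consistent with prior benign imaging.",
)
result = finalise_diagnosis(output, review)
```

Recording disagreements as well as agreements gives the organisation some evidence that clinicians are exercising their own judgment, which may help when assessing whether automation bias is creeping in.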

It's not just about data protection legislation

Any discussion on big health data must also consider the developments concerning AI legislation emerging from the EU as well as the UK's National AI Strategy. The European Commission produced a draft AI Regulation in April 2021 which is still being debated. In the April 2021 draft, certain AI systems are classified as high-risk and therefore subject to stricter requirements.  These include establishing a risk management system, data governance and management practices plus technical documentation and record keeping.

As currently drafted, medical devices and in vitro diagnostic medical devices (as those terms are defined under EU Regulation 2017/745 and EU Regulation 2017/746 respectively) used for diagnostic purposes are likely to be considered high-risk AI systems under the draft AI Regulation where (a) an AI system associated with the device is intended to be used as a safety component of the device or is itself the device, and (b) the device is required to undergo a third-party conformity assessment. Businesses involved in designing AI tools for medical diagnosis should follow the development of the AI Regulation closely since such devices are likely to be impacted.

So far, there's no clear signal from the UK government that it will introduce specific AI legislation although the National AI Strategy does include a focus on exploiting AI in healthcare and research. For instance, £250m has been pledged to create the NHS AI Lab to provide help and guidance with, among other priorities, earlier cancer detection.

In the absence of imminent UK legislation, there are a number of other resources available. For instance, the Care Quality Commission (CQC) report of March 2020 'Using machine learning in diagnostic services' highlighted the need for greater guidance and infrastructure to support clinical validation of algorithms as well as clarity on how hospitals should implement ML devices within clinical pathways. Understandably, given the critical nature of health data to these developments, the CQC also encouraged the ICO to produce specific guidance to help manufacturers and care providers understand how data can be used in these diagnostic devices, in addition to the more general guidance the ICO has published on AI and data protection.  

Starting as you mean to go on

Given the challenging public health environment we are living with, technology offers a multitude of opportunities to support clinicians and help with earlier and accurate diagnoses to improve patient life chances. Any use of big health data techniques is likely to involve personal data triggering the requirements of the GDPR.

At the least, any organisation developing or using an AI diagnostic tool should carry out a data protection impact assessment – a process set out under the GDPR. Since the potential for further regulation in this area (particularly from an EU perspective) is high, any big health data tools developed for diagnosis should bake in safeguards and controls, focusing on data protection by design and default as the building blocks for GDPR compliance.
