Author

Edward Spencer

Senior Counsel


11 July 2018

eDiscovery Innovations: Continuous Active Learning

Keep up or lose out

The world of eDiscovery is ever growing, changing and innovating. It can be difficult to keep up. But making the most of innovations is essential, not optional. The use of technology in eDiscovery is vital for successful case management, development of settlement strategies and reducing costs.

Technology assisted review, or predictive coding as it is often called, is becoming increasingly advanced. In the near future it will, or should, become part of every document review exercise in some capacity.

Continuous Active Learning

One of the most talked about innovations in technology assisted review is continuous active learning (CAL). Understanding how this technology can be applied to document review exercises in disputes or investigations is fundamental to getting to the most relevant documents at the earliest stage whilst also keeping costs down.

"Applying approaches like Continuous Active Learning means you can get to the answers faster and define your case strategy earlier on every project." - David Nichols, OpenText

Put simply, CAL identifies and prioritises the documents it believes are most likely to be relevant to the matter being investigated. The system then creates a prioritised queue for review. As that manual review exercise proceeds, the review queue is continuously reshuffled and reprioritised throughout the process as the system learns more about what is relevant and adjusts its predictions accordingly.

As a result, by using CAL those documents reviewed first are the ones which are most likely to be relevant. The documents remaining in the review pool are those most likely to be irrelevant and can be reviewed by a more junior reviewer or excluded from the review altogether.
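The reprioritisation loop described above can be sketched in a few lines of Python. This is a toy illustration only, not any vendor's implementation: a simple keyword-overlap score stands in for the machine-learning classifier, and the `reviewer` callback stands in for the human review decision.

```python
# Toy sketch of a CAL-style review loop. A keyword-overlap score stands
# in for a real classifier; "reviewer" stands in for the human decision.

def score(doc, relevant_terms):
    """Score a document by how many learned 'relevant' terms it contains."""
    return len(set(doc.lower().split()) & relevant_terms)

def cal_review(documents, reviewer, batch_size=2):
    """Review in batches, reprioritising the queue after each batch."""
    relevant_terms = set()
    queue = list(documents)
    reviewed = []
    while queue:
        # Reshuffle: documents most likely to be relevant go first.
        queue.sort(key=lambda d: score(d, relevant_terms), reverse=True)
        batch, queue = queue[:batch_size], queue[batch_size:]
        for doc in batch:
            tag = reviewer(doc)          # human decision: True = relevant
            reviewed.append((doc, tag))
            if tag:
                # The system learns from documents tagged relevant.
                relevant_terms |= set(doc.lower().split())
    return reviewed

docs = [
    "invoice payment breach of contract",
    "office party invitation",
    "contract termination notice",
    "weekly cafeteria menu",
]
# Hypothetical reviewer: anything mentioning "contract" is relevant.
tags = cal_review(docs, reviewer=lambda d: "contract" in d)
```

After the first relevant document is tagged, the remaining queue is re-sorted so the second contract document jumps ahead of the cafeteria menu, which is the essence of continuous active learning.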

How to use Continuous Active Learning – size doesn’t matter

What's your priority?

The least controversial use of CAL is solely prioritising documents for manual review. The system, based on previous decisions made by human reviewers, identifies the documents it considers most likely to be relevant and pushes them to the front of the review queue. The system continues to learn throughout the manual review and reprioritises documents enabling the identification of documents most relevant to the case at the earliest opportunity. Prioritising can be used on document review pools of any size and there is no reason not to prioritise, unless you like surprises.

To err is human

CAL can also be used to quality check the decisions made by human reviewers. Often teams of paralegals spend weeks on repetitive document review exercises. Could they have missed a hot document or neglected to redact privileged information? Human error is unavoidable. CAL can be used to highlight documents where it predicts the tagging has been incorrectly applied, allowing for re-review and the chance to correct any errors before disclosing or producing documents.
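A quality-check pass of this kind amounts to comparing the model's prediction with the human tag and flagging disagreements for second-pass review. The sketch below is purely illustrative, with a one-keyword stand-in "model" rather than a real prediction engine:

```python
# Hypothetical QC pass: flag documents where a model's prediction
# disagrees with the human reviewer's tag, for re-review.

def flag_for_rereview(reviewed, predict, threshold=0.5):
    """Return documents whose human tag contradicts the model prediction."""
    flagged = []
    for doc, human_tag in reviewed:
        model_says_relevant = predict(doc) >= threshold
        if model_says_relevant != human_tag:
            flagged.append(doc)
    return flagged

# Illustrative stand-in model: probability driven by a single keyword.
predict = lambda doc: 0.9 if "privileged" in doc else 0.1

reviewed = [
    ("privileged legal advice memo", False),   # likely a missed tag
    ("quarterly sales figures", False),
    ("privileged settlement draft", True),
]
print(flag_for_rereview(reviewed, predict))  # flags only the first memo
```

Only the first document is flagged: the model considers it likely privileged, but the reviewer tagged it as not relevant, so it is queued for a second look before anything is disclosed.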

Above all, discard the irrelevant

Perhaps the most controversial use of this technology is to exclude documents from the manual review altogether such that they are never seen by human eyes. This is most often used when the document set is so large that a manual review would be disproportionate or impossible in the given timescales. Understandably, naturally risk-averse lawyers may shy away from the thought of not reviewing all potentially relevant documents. But if used appropriately, this approach can be defensible and proportionate to satisfy the obligation of conducting a reasonable search for documents as required by the Civil Procedure Rules. It is not necessary to identify 100% of all relevant documents within the party's control. Some stones can be left unturned.

Ideally, before the manual human review begins, one subject matter expert (SME) should review a control set of documents, for example, a random statistical sample of 2,000 documents. Based on the SME's decisions, the system then estimates the total number of documents which are likely to be relevant. This estimate can be used to manage resources and also as a reference point later in the review to establish when it might be appropriate to stop reviewing documents.

The system learns from the tagging completed during the control set review by the SME and prioritises documents to the human reviewers on this basis. Once the number of relevant documents identified by human reviewers falls to a justifiably low percentage, a decision can be taken about whether to cease the manual review exercise. At that point a quality control exercise should always be conducted for defensibility, alongside testing of the system.
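The arithmetic behind the control-set estimate and the stopping decision is straightforward. The sketch below assumes an illustrative pool of 100,000 documents and 150 relevant documents in the 2,000-document sample, and a hypothetical 5% cut-off for the "justifiably low" relevance rate; real matters would set these figures with an eDiscovery expert.

```python
# Control-set arithmetic: extrapolate sample richness to the full pool,
# then test whether the recent relevance rate is low enough to stop.

def estimate_relevant(population_size, sample_size, sample_relevant):
    """Extrapolate the sample's relevance rate to the full population."""
    richness = sample_relevant / sample_size
    return round(population_size * richness)

def should_stop(recent_tags, cutoff=0.05):
    """Stop when the share of relevant docs in recent batches falls below the cut-off."""
    if not recent_tags:
        return False
    return sum(recent_tags) / len(recent_tags) < cutoff

# SME finds 150 relevant in a 2,000-document control set; pool of 100,000:
print(estimate_relevant(100_000, 2_000, 150))  # -> 7500 estimated relevant
# The last 200 review decisions yielded only 4 relevant documents (2%):
print(should_stop([1] * 4 + [0] * 196))        # -> True
```

A 7.5% richness estimate gives the team a budget figure up front, and the falling batch rate gives an objective trigger for the stop decision, which would then be validated by the quality control and testing exercise mentioned above.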

This strategy reduces the time and associated cost of reviewing irrelevant documents and has been shown to be at least as accurate as (if not more accurate than) a full manual review exercise.

One important point to note is that the use of this method must be explained to, and preferably agreed with, the other side at an early stage. Furthermore, it is very important to have an eDiscovery expert on board to provide guidance and to assist with explaining and justifying the strategy to the other side and potentially to the court.

Technology should be assisting your review, so don’t be afraid to use it!
