Future Lawyer: eDiscovery and Generative AI Today – Part 3

We previously discussed the use of technology in eDiscovery and investigations and the potential for generative AI and Large Language Models (LLMs) to enhance this process. Despite the promising technical capabilities of AI, lawyers must consider other factors before deciding whether to incorporate certain technology into their work. The four key principles of using innovative technology in eDiscovery are defensibility, proportionality, cost, and usability.

Is AI defensible?

The use of machine learning for document review has been approved by courts in the UK, in Pyrrho Investments Ltd v MWB Property Ltd [2016] EWHC 256, and in the Federal Court of Australia (Murphy J, VID 513/2015). While court approval has been a crucial factor in the growth of technology-assisted review (TAR), the development of best practice guidelines and increasingly sophisticated models has also ensured that results can be validated effectively. Most TAR software incorporates a validation step that estimates the likelihood that the model has missed relevant documents, and the results of that step are often shared among all parties involved for transparency.

In the UK, the use of technology is also explicitly encouraged in Practice Direction 57AD, Paragraph 9.6, which instructs parties to seek to agree on the use of "software or analytical tools, including technology-assisted review software and techniques". So, how could we validate LLMs for document review? The good news is that the statistical validation techniques used for existing machine learning approaches apply equally well to any method of separating relevant from irrelevant documents, whether keyword search, machine learning algorithms, or purely manual review. The principle is the same: can you show statistically that the chance of relevant documents being left in the unreviewed population is negligible?
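To make that principle concrete, here is a minimal sketch of one common way to run the check, often called an elusion test: draw a random sample from the documents that were not put forward for review, have a human code the sample, and calculate an upper bound on the proportion of relevant documents left behind. This is an illustration under stated assumptions, not how any particular TAR product calculates its figures. The function names, the 95% confidence level, and the sample and population sizes in the usage example are all hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def elusion_upper_bound(sample_size: int, relevant_found: int,
                        confidence: float = 0.95) -> float:
    """One-sided exact (Clopper-Pearson) upper bound on the elusion rate.

    sample_size    -- documents randomly sampled from the unreviewed set
    relevant_found -- how many of those a human reviewer coded as relevant
    """
    if not 0 <= relevant_found <= sample_size or sample_size == 0:
        raise ValueError("need 0 <= relevant_found <= sample_size and sample_size > 0")
    if relevant_found == sample_size:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # The CDF falls as p rises, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(relevant_found, sample_size, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    unreviewed_population = 200_000
    sample_size, relevant_found = 1_500, 3
    bound = elusion_upper_bound(sample_size, relevant_found)
    print(f"Observed elusion rate: {relevant_found / sample_size:.4%}")
    print(f"95% upper bound on elusion rate: {bound:.4%}")
    print(f"Upper bound on relevant documents left behind: "
          f"{math.ceil(bound * unreviewed_population):,}")
```

The same calculation works whether the unreviewed set was produced by keyword search, a TAR model, or an LLM, because it depends only on the random sample and the human coding of that sample. Whether the resulting bound counts as "negligible" is a judgement for the parties and the court, not something the arithmetic decides.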

Validating the output of LLMs for other tasks may be a more manual process. If an AI tool creates, for example, a chronology of key events surrounding the issues in dispute, then the documentation and evidence around those events need to be checked: first, to ensure that each event did happen, and second, to confirm that the evidence supports the version of the event supplied by the AI tool. Further checks are needed to ensure that key events haven’t been omitted from the chronology because the AI tool didn’t realise their importance in context. Legal teams are used to carrying out these kinds of checks on draft work product, and involving AI would speed up this type of work rather than replace the human effort entirely.

Is AI proportionate?

Technology is often used in eDiscovery to make the whole exercise proportionate in the context of the claim, either by speeding up the review of documents or by reducing the number of documents that require human review in the first place. Current implementations of AI in Continuous Active Learning (CAL) or “predictive coding” can make an eDiscovery exercise more proportionate by de-prioritising and ultimately discarding documents that are unlikely to be relevant, which reduces the human time and effort needed to review them. If new AI models can do the same task more effectively or accurately, the impact should be the same: a reduction in the cost and time burdens of the disclosure exercise. This makes the use of AI, where available, a key tool in achieving proportionality. However, if setting up, training, and interpreting an AI tool is overly difficult or time-consuming, the extra investment in the technology could erode those cost and time savings. It will be important for AI tools in eDiscovery to be quick and easy to set up and configure, and for eDiscovery providers to have the right expertise to manage and advise on the process.

Is AI usable and cost-effective?

Whilst there are many ways to access LLM technology for free (ChatGPT, Google Bard etc.), the open-access and free versions of AI are not typically suitable for legal applications. There are several troubling issues, starting with outdated training data: a model that has not been trained on current data may produce output that is out of date or incorrect. There are also concerns that these open-access platforms have been trained without due consideration for potential bias in the model or for the privacy of personal data.

One of the most pressing issues for implementation in legal workflows is the protection of client and internal data. This caused problems for Samsung, whose proprietary and confidential data was leaked after being used with ChatGPT. Privacy and security are critical elements of eDiscovery, as confidential and privileged data is routinely handled throughout the disclosure process. To protect sensitive and personal data, organisations and eDiscovery service providers may need either to develop their own in-house LLMs trained on their own data or to subscribe to enterprise solutions that safeguard and ringfence data. Lawyers working on eDiscovery must understand the risks of using open-access tools on their own initiative. Investment in information security and governance will also be needed to track what data is fed into AI solutions and how it’s used across clients and matters. Transparency between parties may also be necessary: a disclosing party may require that their confidential data is only used with AI technology under certain security restrictions, to prevent that data from being disseminated more widely or incorporated into AI training.

It is important to recognise that some expertise is still required to use AI tools effectively. The skill of providing prompts to LLMs is in its infancy, but even a few minutes of practising with open-access tools reveals that the way questions and tasks are framed can significantly affect the tool's output. Lawyers will need to be able to write effective prompts so that the output they receive is in the correct format and style, and they’ll need training and experience to learn how to interact with the AI. It may be that the prompts used to identify documents become disclosable themselves.

Additionally, national and international efforts to regulate AI, including generative AI and LLMs, may mean that AI can only be used in certain ways, and may impose rules on how transparent parties need to be about their use of AI.

For now, at least, lawyers may not be able to fully embrace new forms of AI for eDiscovery. But that doesn’t mean generative AI is far off. eDiscovery software solutions like Relativity have already built out ethical and secure implementations of LLMs. This may lead to more effective use of AI technology during document review and add new avenues for understanding and collating data sets across different data sources and types.

Technology in eDiscovery has enabled us to automate or alleviate the burden of manual, repetitive tasks like document review. Still, each advance in technology often requires an industry-wide acceptance of best practices, standards for validation, and data security measures. Work product that uses AI — in all industries — will be under high scrutiny for the foreseeable future, as it should be. Our aim at Sky Discovery will always be to strike the all-important balance between people and technology.

We would appreciate hearing your thoughts on this series. Please share relevant links or ask questions to keep the conversation going.

To learn more about how Sky Discovery uses technology, AI, and machine learning to help lawyers focus on the law and their clients, visit our website, or contact our team:

UK & Europe: solutions@skydiscovery.co.uk
Australia & Asia Pacific: solutions@skydiscovery.com.au


Author: Rachel McAdams is a Senior Consultant at Sky Discovery in London.
