Future Lawyer: eDiscovery and Generative AI Today – Part 1
In eDiscovery, we’re used to working with technology to manage and interrogate large volumes of data efficiently, defensibly, and cost-effectively. When technology like ChatGPT gains traction, our considerations are two-fold: first, how it might affect the types of data we collect and review for disclosure; and second, how it might change, and hopefully improve, the tools we currently use to carry out those tasks.
The public debate about Artificial Intelligence continues to swing between two extremes – either AI will replace everyone’s jobs, or AI is overrated, and nothing will ever change. The reality will likely rest between these extremes and vary by industry. In particular, the legal sector — a regulated profession traditionally seen as resistant to change — may seem safe from AI disruption. However, with such a significant leap forward in AI capabilities, it’s very likely that even the legal profession will need to adapt and work with the technology, not against it. In this series of posts, we’ll explore why this new breed of AI differs from what has come before it, how it could be applied to the eDiscovery space, and what the roadblocks to wider adoption might be.
Let’s start by defining some terms we’ll need before examining how AI technology can impact eDiscovery. Whilst there is plenty of jargon associated with AI (something it has in common with eDiscovery!), it’s important to be able to talk accurately about these new types of AI models and their capabilities.
Large Language Model (LLM)
LLMs are the next generation of AI technology. Built on neural networks that mimic the layers of neurons in the human brain, LLMs are large and powerful computer models initially trained to handle general natural language processing tasks. An LLM can then be adapted to perform specific tasks more efficiently and accurately, either through additional training and fine-tuning of the model itself, or through “prompt engineering”: crafting the context and instructions you give the model to get the answer you need.
An LLM is trained on a vast dataset and typically has billions of parameters. You’ve probably already heard that many of these models have been trained on data freely available on the internet. The computing power, data, and resources needed to develop and maintain an LLM can be enormous.
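To make “prompt engineering” a little more concrete, here is a minimal, hypothetical sketch in Python. The document text, the analyst role, and the constraints are all invented for illustration, and the finished prompt would ultimately be sent to whichever provider’s API you use; the point is simply that the same general-purpose model tends to give more useful answers when the prompt spells out context, role, and output format.

```python
# A minimal sketch of prompt engineering (illustrative only).
# The same general-purpose LLM could receive either prompt; the engineered
# one spells out role, context, and constraints so the model's general
# language ability is steered towards the specific task.

# A bare prompt: the model has to guess what kind of answer you want.
bare_prompt = "Summarise this email chain."

# An engineered prompt: role, scope, and output format are explicit.
engineered_prompt = (
    "You are a litigation support analyst preparing a disclosure review.\n"
    "Summarise the email chain below in no more than five bullet points,\n"
    "flagging any dates, monetary amounts, or named individuals.\n\n"
    "EMAIL CHAIN:\n{email_text}"
)

if __name__ == "__main__":
    # Placeholder document; in practice this would be the text under review,
    # and the completed prompt would be sent to your LLM provider's API.
    email_text = "From: A. Smith  To: B. Jones  Subject: Settlement terms ..."
    print(engineered_prompt.format(email_text=email_text))
```

Either prompt could be sent to the same underlying model; the engineered version simply trades a little up-front effort for a far more predictable and useful answer.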
Generative AI
Generative AI is often mentioned in the same breath as LLMs, and while they aren’t the same, they can overlap. Generative AI refers to AI which can produce new content — whether this content is text, audio, video, or images. The technology predicts what words, phrases, pixels or sounds typically appear next in the sequence based on examples already learned. This “prediction” is the source of some (but not all) ethical concerns with generative AI, such as bias and accuracy levels. For example, generative AI trained on different data sets may give different answers and vary in accuracy depending on the training data. If a generative AI model has been trained on false data, then the model's responses will also be inaccurate because it doesn’t know any better. If the data set that a generative AI model is trained on is prejudiced or biased in some way, then the responses the model gives will perpetuate that bias.
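As a toy illustration of that “predict what comes next” idea (a deliberately crude sketch, nowhere near a real LLM in scale or sophistication), the Python snippet below counts which word most often follows each word in a tiny made-up corpus, then “generates” text by repeatedly picking the most frequent continuation. It also hints at how bias creeps in: the model can only ever recombine patterns present in its training data.

```python
from collections import Counter, defaultdict

# Toy "training data": whatever the model generates can only ever
# reflect the patterns (and biases) present in this text.
corpus = (
    "the claimant sent the email . "
    "the defendant sent the letter . "
    "the claimant sent the letter ."
).split()

# For each word, count which word follows it and how often (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start, length=8):
    """'Generate' text by always choosing the most frequent next word."""
    words = [start]
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# The output only ever recombines sequences seen in the toy corpus above;
# skew the corpus and you skew what gets generated.
print(generate("the"))
```

Real generative models replace these raw word counts with probabilities learned by a neural network over billions of examples, but the underlying principle, predicting the next element in a sequence from what has been seen before, is the same.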
So, what’s new?
We’ve had AI capabilities for many years in the guise of machine learning and natural language processing, and it has become a part of everyday life. However, as computing power grows and the amount of data we generate and store expands, so do the capabilities of AI. In particular, models can increasingly learn from data in an unsupervised way (i.e., without humans explicitly labelling the data), opening up more varied use cases, most notably the move from classifying existing data to generating new data.
Utilising AI, Netflix has already made a name for itself in successfully profiling its users and suggesting content to viewers. AI has also helped Netflix create engrossing highlights, recaps, and trailers to increase viewership. They are now experimenting with a new machine-learning-based compositing technique for filmmaking, in which actors are sandwiched between garish magenta lighting and a green screen. The jump from predicting your next Netflix show to creating a brand-new show based on the content you’ve previously watched is a conceptual leap that changes how AI can impact entire industries.
In addition, LLMs have created opportunities for everyone to interact with AI, not just data scientists. You don’t need any experience in AI or machine learning to ask ChatGPT a question, and whilst your ability to interact with AI improves with training and practice, you don’t need to understand how the underlying LLM works to see benefits. And because LLMs are general purpose, you don’t need to build a new model every time you have a new task.
ChatGPT, from OpenAI, is currently the best-known LLM-based tool in the world. When it burst onto the scene in late 2022, it revolutionised how AI was perceived, inspired a huge number of think pieces, and demonstrated the potential to impact many industries. ChatGPT is a specific implementation, designed for conversation, of a model from OpenAI’s GPT-3.5 series; however, the model behind the scenes is evolving rapidly, and many of the errors reported on ChatGPT’s release (such as hallucinations, where the model makes up facts) have since been mitigated, though not necessarily resolved.
Other tech companies have also invested in LLMs. Google has a family of models called LaMDA, specialised for conversation, which powers its chatbot Bard. Another example is PaLM, which can handle complex learning and reasoning tasks. Meta has also invested in LLM development, and its LLaMA model hit the news in March 2023 when it was leaked online; it is now an open model whose code is freely available to the public. Many other proprietary models exist too, developed by companies and universities for commercial and research purposes.
With these advances in AI capabilities, it’s inevitable that the legal industry, among others, will face some impact. The rest of this series will look at the possibilities of this technology when applied to eDiscovery; however, many other aspects of legal work may be transformed by AI. Legal research, drafting and templating memos and reports, admin tasks such as formatting and adherence to house styles, summarising regulations or legislation, and many more tasks all spark possibilities for innovation.
ChatGPT revolutionised the public’s perception of AI by giving direct access to powerful new technology, crucially without requiring any prior expertise in AI. Because LLMs deal in natural language and take instruction from the user, rather than requiring delicate parameter-tuning by an expert behind the scenes, a lawyer can theoretically already experiment with ChatGPT for free online. However, issues such as data privacy, data security, and the protection of privileged information mean that you shouldn’t start uploading client data into ChatGPT for testing! It’s also worth keeping in mind that ChatGPT’s training data is not fully up to date, so you may get answers that were accurate in the past but are no longer valid today. Online courses on legal prompt design (or how lawyers can interact more effectively with LLMs) are already popping up; however, there is no substitute for practice and experimentation.
In our next articles, we look at how technology is used in eDiscovery today and where LLMs and generative AI could take us into the future.
Read Part 2 ›
Read Part 3 ›
Author: Rachel McAdams is a Senior Consultant at Sky Discovery in London.