This article was published as part of Equivio's "Predictive Coding Minus the Hype" educational series.
“What is really astonishing is how quickly predictive coding is gaining acceptance. Only a couple of years ago, everyone was saying ‘it’s great, but it will never be broadly used.’ How things have changed.”
— Bennett B. Borden, Chair, Information Governance and eDiscovery Group, Drinker Biddle & Reath LLP
When I was a boy growing up on a prairie farm, my father taught me that cutting the first round of a hay field was the most important because every pass after that would follow the same path, and imperfections would only grow with each successive round. These errant wobbles and warps would make it harder to later bale the hay, wasting time and fuel.
Besides, the neighbors were watching.
I suppose my dad had learned these techniques from his father and then perfected them on mile after mile around our fields. As a practitioner of farming, he had put in the time and learned how to use the tools and technology at his disposal most effectively.
Technology, on its own, does nothing. It is only when technology is put to work in a context that its value is realized. Technology truly comes alive in the hands of practitioners who have a specific problem to solve. People with a problem to solve – especially problems that have deadlines and budgets – refine the technology in the foundry of real-world struggle. Practitioners develop techniques, shortcuts and workarounds that take advantage of the technology’s strengths and that sidestep its weaknesses.
This is exactly what is happening in the world of predictive coding today. Practitioners of predictive coding, i.e., the lawyers, technologists, paralegals and others who use this technology every day, are developing a book of practical expertise on putting predictive coding to work.
And these practitioners have been on an accelerated path. Although the science behind predictive coding is decades old, the first federal case in the U.S. to substantively address predictive coding appeared only in 2012. In an incredibly short period of time for the legal world, predictive coding has gone from being cautiously viewed as something we will do in some uncertain future (like flying in our cars) to something that most e-discovery practitioners are either using or evaluating. It also appears that e-discovery service providers and law firms are accelerating their use of the technology. A 2012 survey by analyst firm eDJ Group found, for example, that only 33 percent of practitioners had used predictive coding (Q1 2013 Predictive Coding Survey). One year later, the survey found that only about 20 percent were not using or planning to use predictive coding at all.
Will 2014 be the tipping point for predictive coding and its mainstream, systematic use by most firms? It seems likely. The same survey found that nearly 50 percent of practitioners already consider predictive coding to be a core part of their e-discovery workflow for certain kinds of matters.
So, what have practitioners learned about using predictive coding in this relatively short time? A lot. In this paper we share what we have learned from working with our clients, conducting research into predictive coding, and talking with other subject matter experts.
Although there is no universal "right" size or kind of case for predictive coding, the pure economic benefit of reducing review costs is more pronounced in cases with greater volumes of information. Indeed, very large cases are often what drive firms to look at predictive coding in the first place. But overwhelming volume is not what keeps them coming back. More often, it is the strategic benefits that cement predictive coding as part of their e-discovery toolbox.
In theory, predictive coding can be used to support any case that involves discovery of a lot of text-based information. In reality, like other tools in a toolbox, predictive coding technology excels at solving certain kinds of problems. For example, traditional, manual linear review processes may be more cost effective for cases with smaller pools of potentially responsive information. Some practitioners use 10,000 documents as a rule of thumb for the threshold at which predictive coding becomes more cost effective and efficient, but the absolute number is less important than the overall context of the case, which includes schedule, resources, budgets and the richness of the corpus.
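The idea of a volume threshold can be made concrete with a simple break-even calculation. The sketch below is only an illustration under stated assumptions: the cost model, the function names and every parameter (per-document review cost, a fixed setup cost, the size of the training set, and the fraction of the corpus that still needs human review) are hypothetical and are not drawn from any survey or matter discussed here. Real thresholds depend on schedule, resources, budgets and the richness of the corpus.

```python
def linear_review_cost(n_docs: float, cost_per_doc: float) -> float:
    """Cost of having reviewers read every document."""
    return n_docs * cost_per_doc


def predictive_coding_cost(
    n_docs: float,
    cost_per_doc: float,
    setup_cost: float,
    training_docs: float,
    review_fraction: float,
) -> float:
    """Hypothetical cost model: a fixed setup cost, plus human review of the
    training set, plus human review of the fraction of the corpus that the
    software flags for review."""
    reviewed = training_docs + review_fraction * n_docs
    return setup_cost + reviewed * cost_per_doc


def break_even_docs(
    cost_per_doc: float,
    setup_cost: float,
    training_docs: float,
    review_fraction: float,
) -> float:
    """Corpus size at which the two hypothetical cost models are equal.

    Returns infinity when predictive coding saves nothing per document
    (review_fraction >= 1), because it then never catches up.
    """
    saved_per_doc = cost_per_doc * (1.0 - review_fraction)
    if saved_per_doc <= 0:
        return float("inf")
    return (setup_cost + training_docs * cost_per_doc) / saved_per_doc


# Illustrative inputs only; substitute figures from your own matters.
threshold = break_even_docs(
    cost_per_doc=1.0, setup_cost=3000.0, training_docs=1000.0, review_fraction=0.5
)
print(f"Break-even corpus size under these assumptions: {threshold:,.0f} documents")
```

In this model the threshold moves a great deal as the inputs change, which is one way to see why the absolute number matters less than the context of the case.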
Carpenters do not start out building houses – they start by building birdhouses and stepstools, then work their way up to more complicated structures that require a blend of professional skill, experience, and creativity. Similarly, predictive coding practitioners are not born – they must learn their trade by practicing their trade. As such, practitioners getting started with predictive coding should start with a matter that enables that learning process to happen productively.
For example, we have seen practitioners use predictive coding on matters that had already been resolved, which provided the incredibly valuable benefit of enabling them to validate and cross-check their process and results with the results of their traditional linear review process. This also helped them build support for predictive coding with clients and with attorneys inside the firm.
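One way to carry out this kind of cross-check is to treat the coding decisions from the completed linear review as the reference and measure how well the predictive coding results agree with them. The sketch below assumes each review can be exported as a mapping from a document identifier to a responsive/not-responsive call. The names `ReviewComparison` and `compare_to_linear_review` are hypothetical, and precision and recall are standard information-retrieval measures used here for illustration, not a method prescribed by any particular product.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class ReviewComparison:
    true_positives: int   # responsive in both reviews
    false_positives: int  # flagged by the software, not responsive to humans
    false_negatives: int  # responsive to humans, missed by the software

    @property
    def precision(self) -> float:
        """Share of software-flagged documents that humans also called responsive."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Share of human-responsive documents that the software also flagged."""
        responsive = self.true_positives + self.false_negatives
        return self.true_positives / responsive if responsive else 0.0


def compare_to_linear_review(
    linear_calls: Mapping[str, bool],
    predicted_calls: Mapping[str, bool],
) -> ReviewComparison:
    """Compare predictive coding calls against a completed linear review.

    Only documents that have a human call are counted. A document with no
    predicted call is treated as not responsive (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for doc_id, human_responsive in linear_calls.items():
        predicted_responsive = predicted_calls.get(doc_id, False)
        if predicted_responsive and human_responsive:
            tp += 1
        elif predicted_responsive and not human_responsive:
            fp += 1
        elif human_responsive and not predicted_responsive:
            fn += 1
    return ReviewComparison(tp, fp, fn)


# Made-up document IDs and calls, for illustration only.
linear = {"DOC-1": True, "DOC-2": True, "DOC-3": False, "DOC-4": False}
predicted = {"DOC-1": True, "DOC-2": False, "DOC-3": True, "DOC-4": False}
result = compare_to_linear_review(linear, predicted)
print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
```

Disagreements between the two reviews are worth reading rather than simply counting, since human reviewers also make inconsistent calls and the linear review is a reference point, not a perfect standard.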
Other factors to consider when getting started with predictive coding include:
“There is an art to this. This is not rocket science. Predictive analytics has been around for years, but it is relatively new to the science of information retrieval. There is an art to interpreting the results, just like a doctor interpreting the science.”
— Tom Groom, VP and Sr. Discovery Engineer, D4 LLC
When we are sitting in the doctor's office wearing only a backless green gown and our socks, we like to believe that our doctor is some kind of supercomputer who gathers the data on our symptoms and then calculates a perfectly logical, binary diagnosis – the only possible diagnosis. This is not reality. As with diagnosing and treating medical conditions, there is both an art and a science involved in predictive coding.
The science part has been around for decades, but it has been encoded and productized for e-discovery relatively recently. It is the practitioners – the predictive coding "doctors" in the emergency room of law firms and e-discovery providers across the country – who now need to develop and apply their knowledge and experience to this well-established science and make it art.
The art of predictive coding lies in several critical areas, including:
In large cases it used to be customary for e-discovery teams to work for months before substantially involving the lawyer in charge. In a world of predictive coding, this approach is no longer desirable. Predictive coding changes e-discovery strategy.
Predictive coding can help attorneys establish the relative merits of the other side's claims – as well as the strength of their own position – nearer the beginning of the e-discovery process, rather than after a months-long document review cycle. This capability should change the way that firms tackle e-discovery and change the makeup of the teams involved in the e-discovery process. Specifically, it means that predictive coding teams should have much greater – and earlier – participation from a senior attorney who shares responsibility for litigation strategy and who understands the legal issues.
Predictive coding teams often find that, during the process of training the software for a specific matter, the software trains the team as much as the team trains the software. In other words, as the software starts to “understand” the types of information the team is looking for, it will begin to suggest topically relevant but unanticipated documents. With well-designed software and workflow, this happens before document review has even begun. Having a senior attorney involved in the training process makes that process faster and more organic: the attorney can react and respond to the responsive documents and issues, focus the team and generate strategic insight weeks earlier than with traditional approaches.
A representative from the client should also be on the team. This should be someone who is familiar with the organization and its relevant business activities and practices – someone who can guide the team on what the client views as relevant. Practitioners who are actually doing the coding and working with the system cannot work in a vacuum. Rather, they need to understand what the experts are looking for and what they care about.
This team should be in place right from the beginning of the predictive coding process, especially given how critical the training process is to a supervised learning system. Spending the time upfront to get the training process right will pay off exponentially downstream in the form of better results gained more quickly. Building consensus and understanding across the team right from the beginning is essential.
Experienced practitioners bring predictive coding technology into the e-discovery process at different stages, depending on the needs of the case. Learning how and when to employ the technology is a critical part of making predictive coding work. Insights that practitioners have developed in this area include:
In the Sturm und Drang of litigation, it is easy to miss the big picture. Litigation is like firefighting in that the goal (if not the method) is simple: put out the fire. Bring all your tools, your energy and your resources to bear on the problem because the fire will continue to burn until you put it out. However, your client is not in the firefighting business. They produce auto parts, market medical devices or manage money. As a business, if they even think about fires at all, they are mostly thinking about how to prevent them and how to contain and limit the damage if and when the next fire starts.
At the end of the e-discovery process, e-discovery practitioners often understand the organization better than it understands itself. E-discovery reveals the strengths and weaknesses of a client’s existing information governance (IG) program. It can reveal, for example, that the organization has little idea what information it has, where that information is stored or even if that information has business value.
Human-based classification and management methods alone do not solve the information governance problem for most organizations. There is simply too much information.
Today, we have the opportunity to apply predictive coding software to the information governance problem. In the e-discovery context, predictive coding helps us find the right documents, to separate wheat from chaff, in a highly automated and efficient manner. Information governance needs this capability. In fact, in large organizations it is difficult to see how information governance can be made real without this capability. As such, there is a tremendous opportunity for predictive coding practitioners to bring their knowledge of predictive coding to bear on the information governance problem. Some examples of how predictive coding can support information governance include:
Predictive coding is a powerful tool, especially in the hands of experienced practitioners who understand the art and science of predictive coding. This entails both a practical understanding that enables efficient and defensible workflows, as well as a strategic understanding that informs case strategy. Key insights that practitioners have gained in putting predictive coding to work include:
©2013 ViaLumina LLC. (“the authors”). All rights reserved. This publication may not be reproduced or distributed without the authors’ prior permission. The information contained in this publication has been obtained from sources the authors believe to be reliable. The authors disclaim all warranties as to the completeness, adequacy, or accuracy of such information and shall have no liability for errors, omissions, or inadequacies herein. The opinions expressed herein are subject to change without notice. Although the authors may include a discussion of legal issues, the authors do not provide legal advice or services, and their research should not be used or construed as such. This work should be cited as: Barclay T. Blair, “Predictive Coding: Making It Work,” December 2013, ViaLumina LLC.
Barclay T. Blair is an advisor to Fortune 500 companies, software and hardware vendors, and government institutions, and is an author, speaker, and internationally recognized authority on information governance. He has led several high-profile consulting engagements at the world’s leading institutions to help them globally transform the way they manage information. He is the president and founder of ViaLumina.
For more information, please email the author at firstname.lastname@example.org or Equivio at email@example.com.