Editor: Let’s start off with a brief description of your professional backgrounds.
Volkert: I’m the executive director of Robert Half Legal and executive managing director for our eDiscovery Services practice. I graduated from the University of Miami School of Law and was a litigator before joining Robert Half in 1999, when I opened up operations in Miami. In 2005, I took over the global operations for Robert Half Legal as executive director: we have 25 offices in North America, seven international offices and more than 250 specialized legal recruiters in our various specialized staffing centers. I also oversee our e-discovery consulting practice, which has over 1,000 full-time consultants in 24 countries and subject-matter experts, one of whom is Joel, who heads up our subject-matter expert practice.
Wuesthoff: I’m the director of the day-to-day management of our e-discovery and records management solution in New York. I attended college in Montreal at McGill University, law school at Vermont Law School, and practiced law for about seven years before getting involved in information security, privacy, and data governance consulting. I’m an adjunct e-discovery professor at the University of Maine School of Law, and I teach an e-discovery project management course at Bryan University.
Editor: Can you provide some background information on predictive coding and perhaps shed some light on what the fuss is all about?
Volkert: Document review continues to be the costliest and most time-consuming aspect of e-discovery. As a result, more organizations are looking for creative approaches and innovative technologies to reduce the costs of handling enormous amounts of business records and electronic data. The question that comes to the forefront is this: can artificial intelligence (AI) replace first- or even second-tier review in the largest litigations in the near future? Whether AI has achieved the same or a higher level of reasoning and legal capability to stand in for lawyers and contract attorneys in a meaningful and defensible manner is a question we need to be asking, and the answer today is no. Computers and technology are not replacing lawyers; rather, as in many other professions, they’re beginning to perform certain tasks that lawyers have customarily handled manually.
While the proper application of predictive coding has been positive, making the process faster and more efficient, it certainly doesn’t replace eyeballs to documents and human review. What’s all the fuss about? The courts, defendants, plaintiffs and their counsel are simply demanding alternative methods to perform reasonable, efficient and defensible data searches in the face of inordinate and increasing volumes of electronic data, and predictive coding can assist in this quest. Predictive coding is commonly defined as a technology process using software that is applied to a data set to identify responsive documents. It utilizes a learning process that includes mathematical algorithms to find identical or similar data within a larger data population. Attorneys examine the results to identify the responsiveness in the subset of the data population.
There has been success in this area, but there have been a number of arguments against predictive coding. It’s not yet a standard methodology commonly used by counsel in legal or regulatory matters. Concerns over its acceptability and its defensibility within national and global judicial systems continue; there’s a significant challenge in having nonhuman decision-making accepted in legal or regulatory proceedings, a lack of understanding about how the technology works and also a lack of clarity about how predictive coding fits into the discovery process. Finally, the legal field has not been an early adopter of technology given the lack of case law or guidelines and simply the nature of the practice, so the widespread acceptance of any alternative approach to traditional review methodologies will take time. Few attorneys want to be the first to champion a new technology like predictive coding to the courts.
Editor: What are the risks and rewards of predictive coding?
Volkert: Some known risks include inconsistent interpretations of data reviewed for the benchmark subset, over-production of non-relevant data, and, most important, revealing confidential or potentially privileged data. Nevertheless, tests show the use of predictive coding technologies with the appropriate process and human oversight can increase the speed of identifying relevant documents and substantially reduce many risks associated with first-tier review, one area attorneys prefer to delegate. We don’t see it replacing more substantive, second-tier review.
We’re seeing increased demand for highly skilled doc-review attorneys to handle second-tier review when predictive coding has been used. To maximize the results of applying predictive coding to e-discovery, the litigation team should be actively participating. Many prominent law firms are identifying senior lawyers as e-discovery or information lawyers. These individuals and third-party professionals, like Joel, who are highly experienced with the technology process, are being designated as champions, and many law firms and corporate clients are reaching out to them. In partnership, you have in-house counsel, law firms and subject-matter expert providers who can help work through this process to make sure that the best solution is put in place for a particular review – one size certainly doesn’t fit all when you look at document review, predictive coding and the human element.
Editor: How do you know when predictive coding is right for a particular matter?
Wuesthoff: There are two things that drive that decision: The nature of the data, both the volume and the type of data, and the particular engagement you’re involved in. Typically, large volumes of data that are text searchable are appropriate for predictive coding – images and spreadsheets not so much. So you’re looking at volumes of emails and text-based documents that supply a sufficient population of data to allow the algorithm and the machine learning to have enough information in the training process to develop appropriate models. The second piece is a term called “richness.” There should be enough not only in volume but in substantive or “hot” documents to allow the algorithm to set up a model. In other words, if you only have one document out of a million that is responsive, the partners have very little to work with in terms of training the machine. Volume plus richness becomes a very good argument for using predictive coding.
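[Editor’s note: As a rough illustration of the “richness” idea described above, the following Python sketch estimates richness from a random sample of documents that reviewers have already coded. The function names, the sample figures and the collection size are all hypothetical, and the margin of error uses a simple normal approximation; it is a sketch under those assumptions, not a description of how any particular review platform works.]

```python
import math

def estimate_richness(sample_labels):
    """Fraction of a random sample that reviewers coded as responsive."""
    if not sample_labels:
        raise ValueError("sample must not be empty")
    responsive = sum(1 for is_responsive in sample_labels if is_responsive)
    return responsive / len(sample_labels)

def margin_of_error(richness, sample_size, z=1.96):
    """Approximate 95% margin of error (normal approximation, an assumption)."""
    return z * math.sqrt(richness * (1 - richness) / sample_size)

# Hypothetical numbers: 1,500 randomly sampled documents, 45 coded responsive.
sample = [True] * 45 + [False] * 1455
richness = estimate_richness(sample)
moe = margin_of_error(richness, len(sample))

# Hypothetical collection of one million documents.
collection_size = 1_000_000
print(f"Estimated richness: {richness:.1%} +/- {moe:.1%}")
print(f"Responsive documents expected: about {richness * collection_size:,.0f}")
```

[A very low estimate, such as the one-in-a-million case mentioned above, would suggest there are too few responsive examples to train on.]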
There are several questions lawyers should ask: To what extent can and does predictive coding solve the problem in front of you? You don’t want to force feed the solution into something that doesn’t deserve it – such as a case involving a small number of documents. Second, and this is a common theme raised by the courts in a recent decision, is there an auditable process that documents precisely what decisions were made, with supporting justifications, by parties overseeing the process so they can support that later on – not only internally but with respect to the courts and opposing party? Third, is there a good workflow to build around the engine? Again, the courts have not focused so much on the actual black box inner workings as on the process built around it. Fourth, will the technology require implementation throughout an enterprise or be used more selectively on a specific set of data? That’s a complicated undertaking requiring someone with a computer science background and integration experience, which is generally beyond most attorneys. Deploying the technology within a corporation or enterprise, which is what more of our clients are doing, is a complicated installation across different business units. What’s more likely is that people are using it for specific purposes and engagements. Finally, project management is a critical piece. If you don’t have the proper execution and monitoring to use these tools, you might as well go back to the old school way of doing it, which is much less efficient but at least has a workflow.
Editor: Can you provide insight into the various predictive coding technologies and the data collection process?
Wuesthoff: Let me focus on support vector machines, or SVMs, a technology that’s been used in multiple adjacent industries like bioinformatics and healthcare – where you’ve got large amounts of data you’re trying to parse through to identify trends or patterns, such as identifying prescription models for treating cancer patients. An SVM does a great job of classifying data and creating a clear line between what is relevant and what is not, bringing to the fore documents that are more relevant than others. SVMs are syntactic, focusing on the structure of the sentence, number of words per document, or number of times the words appear in one document versus another. Recently, a similar technology was used to examine J.K. Rowling’s new book, The Cuckoo’s Calling, which she wrote under the pseudonym Robert Galbraith. The technology confirmed that Rowling was the likely author by looking at her past writings. It observed that the writing style and word formats in the new work were very similar to those in the Harry Potter books.
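[Editor’s note: To make the idea of “a clear line between what is relevant and what is not” more concrete, here is a toy Python sketch of a linear SVM trained on word counts. It uses hinge loss with plain sub-gradient descent and no bias term; the training documents, function names and parameter values are invented for illustration, and commercial predictive coding tools will differ substantially in features, scale and tuning.]

```python
from collections import Counter

def featurize(text):
    """Word counts: how many times each word appears in a document."""
    return Counter(text.lower().split())

def score(weights, text):
    """Signed distance-like score; positive leans responsive, negative not."""
    return sum(weights.get(word, 0.0) * count
               for word, count in featurize(text).items())

def train_linear_svm(examples, epochs=20, learning_rate=0.1, reg=0.01):
    """Fit word weights by sub-gradient descent on the L2-regularized hinge loss.

    examples: list of (text, label) pairs, label +1 (responsive) or -1 (not).
    """
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            margin = label * score(weights, text)
            # Regularization step: shrink every weight slightly.
            for word in weights:
                weights[word] *= 1.0 - learning_rate * reg
            # Hinge-loss step: only update when the margin is violated.
            if margin < 1:
                for word, count in featurize(text).items():
                    weights[word] = (weights.get(word, 0.0)
                                     + learning_rate * label * count)
    return weights

# Hypothetical seed set coded by attorneys.
training = [
    ("merger pricing agreement confidential", +1),
    ("pricing agreement with competitor", +1),
    ("confidential merger terms", +1),
    ("lunch menu friday", -1),
    ("office party friday", -1),
    ("parking lot closed", -1),
]
weights = train_linear_svm(training)

for doc in ["confidential pricing agreement", "friday lunch party"]:
    label = "responsive" if score(weights, doc) > 0 else "not responsive"
    print(f"{doc!r}: {label}")
```

[In practice the attorneys’ coding of the seed documents is what drives the model, which is why consistent human review of the training set matters.]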
Editor: Are there recent developments related to predictive coding that our readers should be aware of?
Wuesthoff: 2012 was the big year: Da Silva Moore was the first decision that came out, Global Aerospace was second, and then others followed. What was unique about these particular court rulings was that the marketplace and the legal industry had been waiting for something that would give judicial approval for the use of these technologies. It’s important to note that the judiciary has never really approved keyword searching, the de facto gold standard of search. There was some concern that these machines would take over the review from the humans.
The rulings were interesting because they were extraordinarily transparent. In order to get the judge and the opposing party to agree to them, the defense parties had to literally share the documents they were using to teach the machine to look for other hot documents similar to those they considered relevant. So, in predictive coding cases, the consensus has been, yes, we approve it, but it requires seasoned oversight and doesn’t remove the need for eyes on documents, and it needs to be carefully tested, sampled and monitored in order to be considered valid.
Editor: What impact has predictive coding had on the practice of law?
Wuesthoff: At this stage, only five to ten percent of mid-to-large litigation matters are using advanced analytics or machine learning, but it certainly is forcing outside counsel and their clients to consider better ways to search through documents. Certain types of computer modeling and predictive analytics as a broader category have been used since World War II in identifying different ways to target planes and missiles. These techniques were used in the Manhattan Project and for weather forecasting in the 1950s and are now used by Pandora and Amazon to personalize recommendations and shopping, but the law has not been an early adopter.
Volkert: The ever-growing data populations have been centered around large corporations, but now we’re seeing a whole host of small, rapidly growing companies involved with data management and governance. Even small litigations involve voluminous amounts of data. By adapting to changing technologies, lawyers have an opportunity to review larger sets of client data not normally accessible due to size limitations and the excessive accumulation of irrelevant data. Predictive coding may mark the beginning of the end for reviewing every electronic document by hand, but not for human review overall; it’s simply a change in how the work will be done.
Editor: What does the future hold for predictive coding?
Wuesthoff: We don’t know what role these tools will play in the future. Remember, we’re talking about tools that have been used mostly in adjacent industries to classify data. Five to ten years from now, who knows what the particular technology will be, but from a legal perspective, what’s important is having the right balance between technology and human oversight and a process that’s consistent, reliable and reproducible. No matter what technology we use, there’s always going to be a guiding hand, and the best models involve outside counsel, the client, and experienced professionals with a track record of dealing with advanced technology.