Predictive Coding = Great E-Discovery Cost And Time Savings

Wednesday, November 16, 2011 - 17:55

The Editor interviews David J. Laing, Partner in the Washington, DC office of Baker & McKenzie LLP.

Editor: Please describe your background and practice areas. 

Laing: After graduation from law school, I joined the Antitrust Division of the Department of Justice, where I investigated, litigated and prosecuted antitrust cases. I then moved to Baker & McKenzie doing the same type of work on the other side of the table. I focus my practice on civil and criminal government antitrust investigations and competition law matters, and the related commercial litigation that often follows an antitrust investigation.

Editor: The cost of commercial litigation is increasing dramatically as a result of ballooning volumes of electronically stored information (ESI). Is this a fait accompli that corporations simply need to accept?

Laing: I don’t think it is a fait accompli. Though the amount of ESI is increasing dramatically each year, the technology to handle the discovery and management of ESI is improving at an even faster pace. The past six or seven years have seen drastic reductions in the cost of e-discovery attributable to advances in technology and competition among vendors. I find that one of the most exciting technologies for e-discovery, in terms of increased speed and reduced costs, is what is known as “predictive coding,” “computer-assisted coding” or by other similar terms.

Editor: Please share with us your experience with predictive coding technology.

Laing: Predictive coding is one of the areas that promise the greatest potential savings for e-discovery, in both government investigations and civil litigation. Predictive coding uses software to evaluate the probable responsiveness of documents. 

It uses a limited number of senior attorneys familiar with a matter to review a representative statistical sample of the documents. The predictive coding software then applies the results of that statistical sample to the entire database.  Predictive coding provides a way to prioritize documents for review.  

We have experience using that technology in different document production situations, and because we’ve been pleased with the results, we believe that we’ll be using it more extensively. The government antitrust investigations on which we work frequently involve the review of huge quantities of papers and ESI, often in the millions of documents, which must be completed within a very tight time frame. Often you’re trying to review and produce hundreds of gigabytes of data in less than two months.

For a real-life example, let’s consider an antitrust investigation by the Justice Department in a merger context on which we recently worked. In that merger investigation, we used traditional document review and predictive coding to review nearly 500 gigs of data in a little more than six weeks. The DOJ was completely satisfied with the response and raised no questions about it. The predictive coding application was Epiq Systems’ IQ Review, which is powered by Equivio technology.

Editor: How much data was involved in the example?

Laing: The total universe for review that was finally loaded after the initial scrubbing and deduplication was just under 500 gigs, and we ended up producing a little under 200 gigs of data.

Editor: What was the alternative to using predictive coding to process the documents involved in the example?

Laing: It’s the standard process of manual review with attorneys looking at every document. Even with manual review, there are some shortcuts to focus reviewers’ attention on documents more likely to be responsive. They primarily involve keyword searches. However, using this traditional approach, the only way you can provide a recommendation to a client to sign under penalty of perjury a declaration that the document production is complete is if an attorney puts an eye on each document.

Editor: What are the benefits of the Epiq IQ Review/Equivio technology over the legacy keyword approach?

Laing: Just understanding how the predictive coding technology works shows the advantages. Keep in mind that this is a technology that was originally developed by government agencies, predominantly national security agencies, to quickly go through huge amounts of electronic data, primarily email communications, to identify whether communications containing certain patterns, phrases and topics should receive individual review. This technology is now in commercial use.  

After being trained by experts in the facts of a particular case, the technology will sort documents by percentiles of probable relevance. You end up with stacks of documents: some may have been assigned a 90 percent probability of being relevant, while others will be assigned progressively lower probabilities of relevance or responsiveness.
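The "stacks" described above can be pictured with a minimal Python sketch. It is an illustration only, not the vendor's implementation: the function name `stack_by_band`, the 10-point band width and the document IDs are all assumptions made for the example.

```python
from collections import defaultdict

def stack_by_band(scores, band_width=10):
    """Group document IDs into stacks by probability band (hypothetical helper).

    scores maps a document ID to an estimated probability of relevance,
    expressed in percent (0-100).
    """
    stacks = defaultdict(list)
    for doc_id, pct in scores.items():
        # A score of exactly 100 belongs in the top band, not a band of its own.
        lower = min(int(pct // band_width) * band_width, 100 - band_width)
        stacks[lower].append(doc_id)
    # Return the highest-probability stack first.
    return dict(sorted(stacks.items(), reverse=True))

# Made-up scores: keys of the result are the lower bound of each band.
example = stack_by_band({"doc-a": 93, "doc-b": 47.5, "doc-c": 5})
```

Each key in the result is the lower edge of a band, so the stack under key 90 holds the documents scored at 90 percent or higher.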

Editor: How do you go about using predictive coding? What would be the attorneys’ role?

Laing: The process of “training” or “teaching” the technology to recognize responsive patterns starts with one or perhaps two attorneys who are very knowledgeable about the matter, the way the documents have been created, the way they were stored, and the requirements for the production. This senior attorney, often called the “expert” or the “sampler,” will start the training process by feeding into the technology samples of documents initially selected by the expert as being responsive. The expert will then review additional samples selected by the software, going through 40, 50 or 60 iterations of selected sample documents, and indicating which documents are responsive and why they are responsive. This process can take up to 15 or so hours of senior attorney time.

Once the technology is sufficiently “trained” in this manner, it goes through the remainder of the database and assigns every document a rating that expresses its percentage probability of responsiveness. Based on samples, it looks for different types of documents, different formats of documents, and different word patterns that aren’t in the documents identified as relevant.
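To make the train-then-score idea concrete, here is a minimal Python sketch under stated assumptions. The interview does not describe the algorithm inside the product, so this uses a simplified naive Bayes-style word scorer as a stand-in; the function names, the whitespace tokenizer and the sample texts are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for a toy example.
    return text.lower().split()

def train(labeled):
    """Build word counts from (text, is_responsive) pairs coded by the expert."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in labeled:
        docs[label] += 1
        # Count each word once per document (presence, not frequency).
        counts[label].update(set(tokenize(text)))
    return counts, docs

def score(model, text):
    """Return an estimated probability (0 to 1) that the text is responsive."""
    counts, docs = model
    # Start from smoothed prior log-odds, then add evidence from each word present.
    log_odds = math.log((docs[True] + 1) / (docs[False] + 1))
    for word in set(tokenize(text)):
        p_resp = (counts[True][word] + 1) / (docs[True] + 2)
        p_non = (counts[False][word] + 1) / (docs[False] + 2)
        log_odds += math.log(p_resp / p_non)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical expert-coded samples, then a score for an unreviewed document.
model = train([
    ("merger pricing plan", True),
    ("competitor pricing strategy", True),
    ("lunch menu friday", False),
    ("office party friday", False),
])
rating = score(model, "pricing merger update")
```

In the workflow described above, the expert would code further samples over many iterations and the model would be retrained each time; how the commercial software picks those samples is not stated in the interview, so that step is left out here.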

Then, working with a team of document reviewers who perform standard manual review, you establish a cutoff: for example, you will not substantively review anything with less than a 50 percent probability of responsiveness. The documents above that level of responsiveness will then be given an eyes-on manual review.
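The cutoff step can be sketched in a few lines of Python. This is only an illustration of the 50 percent example given above; the function name `split_by_cutoff` and the document IDs are assumptions, and the right cutoff for a real matter is a judgment call.

```python
def split_by_cutoff(scores, cutoff=0.50):
    """Split document IDs into a manual-review pile and a set-aside pile.

    scores maps a document ID to an estimated probability of responsiveness
    between 0 and 1. Documents at or above the cutoff go to manual review.
    """
    review, set_aside = [], []
    for doc_id, probability in scores.items():
        if probability >= cutoff:
            review.append(doc_id)
        else:
            set_aside.append(doc_id)
    return review, set_aside

# Made-up scores for three documents.
review, set_aside = split_by_cutoff({"doc-1": 0.90, "doc-2": 0.50, "doc-3": 0.49})
```

The set-aside pile is not discarded; it is the population from which the quality-control sample is drawn.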

With respect to the bottom 50 percent, we strongly recommend that there be quality control review. This might involve someone reviewing manually every thousandth document or other random sample of documents to confirm that what the technology has selected as nonresponsive is indeed nonresponsive. This means you will be reviewing only 1/1000 or less of the documents in a substantial portion of your entire document set. Documents scored at less than a 50 percent probability of relevance may constitute 70 percent or more of the documents in the data set. The review team may therefore be individually reviewing 30 percent or less of the total document set.
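The quality-control sample and the resulting workload can be worked through in a short Python sketch. The 1/1000 sampling rate and the 70 percent figure come from the discussion above; the function names, the fixed random seed and the choice of a simple random sample are assumptions for illustration.

```python
import random

def qc_sample(set_aside_ids, rate=1 / 1000, seed=0):
    """Draw a random quality-control sample from the set-aside documents."""
    ids = list(set_aside_ids)
    if not ids:
        return []
    # Always check at least one document, even for a very small pile.
    sample_size = max(1, round(len(ids) * rate))
    # A fixed seed makes the sample reproducible if it is ever questioned.
    return random.Random(seed).sample(ids, sample_size)

def manual_review_share(below_cutoff_share=0.70, qc_rate=1 / 1000):
    """Fraction of the whole data set that a person actually looks at."""
    above_cutoff_share = 1 - below_cutoff_share
    return above_cutoff_share + below_cutoff_share * qc_rate

# With 70 percent of documents below the cutoff and a 1/1000 QC sample,
# the share is 0.30 + 0.70 * 0.001, or about 30.07 percent of the data set.
share = manual_review_share()
```

The point of the arithmetic is that the quality-control sample adds very little to the reviewers' workload compared with the documents above the cutoff.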

Editor: What kind of results did you obtain in the example you are using; and how does that compare with human review of every document?

Laing: One of the reasons I’m discussing this example is that initially we decided that we were not going to apply the predictive coding technology, and part of that had to do with cost, and part of that had to do with the way the clients wanted to handle discussions with the Justice Department about search methodologies. They did not want to get into a lengthy explanation to DOJ of this relatively new technology.  

Soon after we started the review using a group of almost 70 attorneys doing standard manual review, it became clear that we could not complete the document production within the schedule required, even by doubling our staff of contract review attorneys. We applied predictive coding technology to the remainder of the data set, which was a little more than 35 percent of the documents. Using this technology, we were able to complete that 35 percent of the document review on time, whereas the initial 65 percent done with manual review took about five weeks. Also, by reducing the cost of contract attorneys, the cost of the remaining 35 percent of the document production was less than 10 percent of the total document production cost.

Editor: In your opinion, is predictive coding technology defensible?

Laing: Yes, I think it is defensible, but there has to be some manual review of the documents for quality control to assure that those documents the technology indicates are nonresponsive are indeed nonresponsive.

Editor: Is this the death of the billable hour in document review?

Laing: No. I mentioned earlier the important roles that lawyers play throughout the predictive coding technology process. Even for the documents that the technology indicates are likely nonresponsive, some quality control by manual review should occur.  And for the documents that are likely responsive, someone still needs to review the documents, understand their significance, provide notes or commentary and organize the documents to comprehend how they affect the investigation or litigation. For the initial document review to determine responsiveness, document review time may be reduced by as much as 50 percent or more.  

Editor: Recently, in Law Technology News, Judge Andrew Peck published an article in which he encouraged the use of predictive coding technology as a replacement for keyword searching, the traditional technique used for culling document collection in e-discovery. How significant is the comment from Judge Peck, and what are the ramifications for the adoption of predictive coding in the industry?

Laing: It’s significant because to my knowledge it is the first explicit written acceptance by a federal judge of the benefits of this technology. Judge Peck’s article is not a judicial decision, however, and it is likely that there will continue to be resistance, by government agencies and litigants, until there are judicial decisions that clearly endorse the use of the technology. 

Editor: What best practices do you employ when producing to the DOJ, the FTC or other federal regulators?

Laing: Our primary best practice pointer is that some manual review is still required of the documents the technology identifies as likely not responsive – the bottom 50 to 70 percent of documents. Although those appear highly unlikely to have relevance, you still must apply some quality control to them. 

Another pointer applies to the meet-and-confer requirements in Rule 26(f) or its state analogs. At that time you should alert the other side that you will be using predictive coding technology, advise them about how it will be applied and get their approval for the technology to be applied. This is no different from the current practice of discussing the use of keywords and deduplication at meet-and-confers. There have been instances of which I’m aware in civil litigation in which this technology has been applied but was not discussed at the Rule 26 meet-and-confer. This can create great problems in defending the document production if the production is ever questioned and then has to be justified to a magistrate judge or a district court judge.

Editor: How have the regulators’ attitudes toward this technology evolved in the last five years?

Laing: It was nearly unknown in the legal industry five years ago. Now many of the government agencies in Washington have an initial awareness of the technology, and some apparently still have misgivings. I am aware of this technology having been applied in some antitrust investigations in addition to its use in the example referred to in this interview. Because it still may take months for a government regulator to accept its use, we have concluded that application of this technology is a work-product decision that does not need to be disclosed to the agencies.  In the end, you must have a defensible production, including quality control, if the method of production is ever questioned. Within the last 30 days, I participated in a webinar with a representative from the General Counsel’s Office of the FTC and from DOJ’s Antitrust Division E-Discovery Technology Committee. Although they recognized the growing acceptance of the technology, the government, like many private litigants, is waiting for a clear judicial signal that this technology is both accurate and complete.
