Getting It Right: Training And Certification In Predictive Coding

Saturday, September 22, 2012 - 10:02

The following discussion is based on a webinar entitled “Getting it Right the First Time: Training and Certification in Predictive Coding,” which was presented in the spring and sponsored by Recommind, Inc. The presenters were Howard Sklar, Senior Counsel, Recommind and Michael Potters, CEO/Managing Partner, The Glenmont Group.

Sklar: It is difficult to grasp the seemingly infinite amount of data in today’s business environment. Not long ago, megabytes and then gigabytes were the primary data quantities, but volumes grew so quickly that we skipped the terabyte age and went directly to petabytes (one quadrillion bytes).

At a recent industry conference, one company reported having 400 petabytes of “wild west data,” i.e., data that was preserved without anyone knowing what it is. Google’s CEO notes that every couple of days, we generate as much information as we did during the entirety of human history until 2003.

For the legal industry, document review has become the tail that wags the dog of litigation. This presentation will address the issues and explain how counsel can use predictive coding to manage the review process accurately and efficiently.

Mike, please start the discussion by telling us about trends you are observing in the recruiting industry and what is driving them.

Potters: Based on the calls we are receiving, most recruiting within the legal technology and information governance space involves candidates with an understanding of predictive coding. Vendors may be looking to fill positions on the sales and delivery side; law firms may need qualified professionals for project management, litigation support or consulting; and corporations may want to add this expertise to their internal resources.

An organization that has expert resources to leverage predictive coding technology enjoys the following benefits:

  1. Discovery processes that involve less redundancy and, therefore, are more cost-efficient;
  2. The ability to quickly and accurately zoom in on the smoking-gun issue, particularly given litigation or investigative timelines;
  3. Efficient allocation of resources leading to better and more consistent results;
  4. Avoidance of mistakes that can result in multimillion dollar sanctions;
  5. Better synergies between IT and Legal – specifically, we are seeing an increase in recruiting calls from companies seeking people who can work in both departments; and 
  6. Better business intelligence overall and substantial advantages during litigation.

The need for digital data management skills will only increase over time. With email, mobile devices, social networking sites and thumb drives, data is everywhere, and organizations increasingly will seek to hire people who clearly understand where data resides and how to access it.

Sklar: Corporate legal departments are under great pressure to reduce litigation costs, with discovery being the most expensive piece. Traditional document review processes are not scalable; further, notwithstanding efforts to reduce labor costs, including outsourcing, today’s data volumes render human review impossible at any cost.

These issues surrounding document review and managing litigation costs involve not just data volume but also the complexity of data collection. Today, corporate servers are joined by the cloud, social media and iPhones that have 32/64 GB of data and more computing power than the original space shuttle.

Companies that employ a younger workforce report that some employees have never accessed their corporate email, but rather do all their communicating via messaging and Facebook. From an in-house collection perspective, IT has become a consumer-oriented organization within corporations.

Potters: These newer methods of communication are very different in substance and nature, with most being a collection of snippets of text. Understanding this data is challenging and requires a fresh and informed perspective; thus, companies are well advised to prepare for these developments because they represent the inevitable future.

Such preparation involves intelligent hiring strategies. For instance, you wouldn’t hire an auto mechanic to repair an airplane because even a good mechanic needs prior training to understand the vehicle. The same logic holds true for the information management industry, including discovery processes. Recruiting people who are not trained in the latest, most effective technology will waste time and money because employees must learn on the company’s dime. So there is a very real value in hiring a trained person.

Of course, selecting the right technology also is critical. Howard, can you briefly summarize Recommind’s predictive coding technology and process?

Sklar: The process starts with statistical sampling on the document corpus to determine baseline responsiveness and to generate a seed set of relevant documents. Concepts within the documents and “yes or no” relevance decisions by human reviewers of that seed set are fed back into the system, and new batches are returned. Repeating this cycle essentially teaches the system how to select highly relevant documents and prioritize them for review. When the system stops finding relevant documents, and that drop-off is usually dramatic, we can be confident in the findings. At this point, a random sample of the rejected documents may be reviewed to further ensure comprehensive results.
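
The cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Recommind's implementation: the term-weight scorer, the function names (review_loop, elusion_rate), the tiny corpus and the keyword-based stand-in for a human reviewer are all hypothetical.

```python
import random
from collections import Counter

def score(doc, weights):
    # Toy relevance score: sum of the learned weights of the document's terms.
    return sum(weights[t] for t in set(doc.lower().split()))

def update(weights, doc, relevant):
    # Feed a reviewer's yes/no decision back into the term weights.
    for t in set(doc.lower().split()):
        weights[t] += 1 if relevant else -1

def review_loop(corpus, reviewer, seed_ids, batch_size):
    unreviewed = set(range(len(corpus)))
    weights = Counter()
    found = []
    batch = list(seed_ids)  # the seed set goes to human reviewers first
    while batch:
        hits = 0
        for i in batch:
            unreviewed.discard(i)
            relevant = reviewer(corpus[i])
            update(weights, corpus[i], relevant)
            if relevant:
                found.append(i)
                hits += 1
        if hits == 0:
            break  # drop-off: the latest batch contained nothing relevant
        # Prioritize the remaining documents by their current score.
        ranked = sorted(sorted(unreviewed),
                        key=lambda i: score(corpus[i], weights), reverse=True)
        batch = ranked[:batch_size]
    return found, sorted(unreviewed)

def elusion_rate(corpus, rejected_ids, reviewer, sample_size, rng):
    # Review a random sample of the rejected documents; return the share
    # of that sample that turns out to be relevant after all.
    sample = rng.sample(rejected_ids, min(sample_size, len(rejected_ids)))
    if not sample:
        return 0.0
    return sum(1 for i in sample if reviewer(corpus[i])) / len(sample)

# Hypothetical data: a document is "relevant" if it mentions pricing.
corpus = [
    "merger pricing agreement draft", "lunch menu friday",
    "pricing agreement with competitor", "holiday party schedule",
    "competitor pricing call notes", "parking garage notice",
    "gym membership renewal", "agreement on pricing floor",
    "office supplies order", "team offsite agenda",
]
reviewer = lambda doc: "pricing" in doc  # stand-in for a human decision

found, rejected = review_loop(corpus, reviewer, seed_ids=[0, 1], batch_size=2)
rate = elusion_rate(corpus, rejected, reviewer, sample_size=2,
                    rng=random.Random(0))
print(sorted(found), rejected, rate)
```

In this sketch the loop stops as soon as a batch yields no relevant documents, and the final sample of the rejected pile plays the role of the closing statistical check. A production system would use a far richer model and a statistically sized sample.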

The development of predictive coding was a natural step in the evolution of document review, offering substantial process and efficiency improvements over linear review. When review sets reached the millions of documents, the traditional review process became less effective. It was prone to generating false positives, and it offered no guarantee of consistency. As part of the linear review process, keyword search was the presumptive gold standard for identifying relevant documents and was thought to find 75 percent of them; however, the true recall rate was closer to 20 percent.
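
Recall, the share of all relevant documents that a search actually retrieves, is simple to compute once the truly relevant set is known. The following minimal Python sketch uses made-up document IDs chosen to reproduce a 20 percent recall figure; the function name and the numbers are assumptions for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: how much of what was retrieved is relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: how much of what is relevant was retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 10 relevant documents exist (IDs 0-9); a keyword
# search returns 5 documents, only 2 of which are relevant.
relevant_ids = set(range(10))
keyword_hits = {0, 1, 100, 101, 102}
precision, recall = precision_recall(keyword_hits, relevant_ids)
print(precision, recall)
```

A search can look productive to the people running it, because most of what they see may be on point, while still missing most of the relevant material. That is why recall has to be measured against a sample rather than estimated by feel.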

Further along the evolutionary scale, predictive coding involves conceptual searches, which are keyword agnostic and can be applied across custodians to cull documents based on relevance and meaning. While the math behind the technology is complex, the idea is simple: humans teach the system to find relevant documents quickly, accurately and efficiently. We know of a situation in which a firm was able to use predictive coding to reduce linear review costs originally estimated at $1.5 million by two-thirds, and the benefits of time savings cannot be overstated, both in satisfying court timelines and in meeting regulatory deadlines during an investigation.

Potters: So it’s clear that predictive coding requires both humans and technology, and hiring people who understand the process is a real key to success. My impression is that traditional linear review depends less on a knowledge base than on sheer numbers of reviewers. Nonlinear review, with its emphasis on priorities and relevance, demands a smaller but more knowledgeable staff, which, in turn, can help reduce costs. Are these impressions accurate?

Sklar: Yes. Certainly, knowledge can create efficiencies and better results in any environment, and it is an essential element of using predictive coding. Given that review costs are largely related to labor, it stands to reason that hiring fewer well-trained people will produce better results at lower cost.

For example, a recent company investigation by the Metropolitan Police in London involved the need to review 300 million emails. Using traditional linear processes, the review would have required every staff member in Scotland Yard to work 365 days per year for four years, doing nothing but reading emails for ten hours each day. When you imagine the labor costs alone, it becomes clear that humans need technology.

However, it is also important to remember that predictive coding technology cannot function without humans. Further, predictive coding enables quality control/assurance assessments because the machine suggests relevant documents and then provides a percentage of certainty in its finding. This functionality allows an organization to assess what went wrong in a situation where the system indicated high relevance but the reviewer decided otherwise. At all points, a trained workforce is essential to the predictive coding process.
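
The quality-control check described above can be illustrated with a short Python sketch. It assumes, hypothetically, that each reviewed document carries a machine certainty score between 0 and 1 and a reviewer decision; the record layout, the 0.9 threshold and the function name are illustrative choices, not features of any particular product.

```python
def flag_disagreements(records, high=0.9):
    """Return IDs of documents the system scored as highly relevant
    but the human reviewer marked as not relevant."""
    flagged = []
    for doc_id, machine_score, reviewer_relevant in records:
        if machine_score >= high and not reviewer_relevant:
            flagged.append(doc_id)
    return flagged

# Hypothetical review records: (document ID, machine score, reviewer decision).
records = [
    ("A", 0.95, True),   # machine and reviewer agree
    ("B", 0.92, False),  # high score, reviewer rejected: worth a second look
    ("C", 0.15, False),  # low score, reviewer rejected: agreement
    ("D", 0.97, False),  # high score, reviewer rejected: worth a second look
]
to_recheck = flag_disagreements(records)
print(to_recheck)
```

The flagged documents are candidates for a second review, which can show whether the reviewer misread the document or the system was trained on inconsistent decisions.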

Potters: Yes, and that statement is borne out by my experience on the staffing side. As mentioned above, most staffing requests from corporations are for attorneys or IT professionals with an advanced knowledge of predictive coding. While these people command higher salaries, they also enable companies to leverage technology and achieve greater cost savings overall. This expertise becomes a key asset for these organizations.

Sklar: To that end, Recommind offers predictive coding training and certification seminars to educate review attorneys about predictive coding and its efficiencies. Mike, would you please wrap up the discussion by talking about the rewards of a trained and certified workforce.

Potters: Ironically for my industry, one major benefit of organically training and certifying employees in the use of predictive coding is that it eliminates the need for and cost of recruiters, particularly given that hiring outside the firm usually involves a 15 to 20 percent upcharge in salary. Internal training also adds value by producing staff members who can mentor one another, creating better synergies and more stability within a department.

Finally, to summarize the benefits discussed above, use of predictive coding offers the following rewards:

  1. Up to 75 percent cost reduction;
  2. Over 50 percent time reduction;
  3. Better intelligence from a better discovery process;
  4. Internal development of talent and more stable departments; 
  5. Minimized recruitment costs; and
  6. Better synergy between IT and Legal.

Sklar: I will highlight the time-savings benefit, because it is often overlooked in our haste to identify cost savings. The time factor is becoming more and more important, particularly in light of legislation like the Dodd-Frank whistleblower provision, with its 120-day window for reporting a whistleblower problem in-house while retaining one’s place in line for bringing the matter to the SEC. Thus, companies have exactly 119 days to complete reviews that may involve millions of documents. Amazingly, I have seen a company complete a review of two million documents in just 87 days using Recommind’s predictive coding technology.

In closing, it is worth mentioning that a universal technology law states that better input produces better output. Thus, it is critical to have people knowledgeable about the case doing the substantive legal work and people knowledgeable about the system working with the technology. When you put those two together, you can achieve the best results and greatest efficiencies in document review.

Prior to joining Recommind, Howard Sklar was global trade and anti-corruption strategist with Hewlett-Packard Co. At HP, he was in charge of the company's global anti-corruption compliance program. He also was counsel to the global trade division, giving relevant business units advice on compliance with U.S. sanctions. Before going in-house, Mr. Sklar served 12 years as a prosecutor and regulator, first as an Assistant District Attorney in Bronx County, NY, where he developed a specialty in computer crime investigation, and then as a senior enforcement attorney in the Branch of Internet Enforcement at the SEC. Mr. Sklar has lectured on the FCPA both nationally and internationally.

Michael Potters founded the Glenmont Group in 2001 as a boutique executive search firm specializing in legal technology and information governance positions across all four industry verticals: corporate, law firm, consultancy and legal vendor. He has served on the advisory boards of The Association of Litigation Support Professionals, The Masters Conference, The IQPC, AIIM and ARMA and is a frequent presenter at Georgetown, IQPC, ARMA & Computer Forensics Institute events. Mr. Potters has published articles for various legal technology publications and is a regular contributor to SC; the Infosecurity magazine and the legal technology blog eDiscoverying.
