Evidence Mounting In The Case For Predictive Coding

The Editor interviews William Tolson, Senior Product Manager, Recommind.

Please email the interviewee at bill.tolson@recommind.com with questions about this interview.


Recommind, an OpenText Company



Editor: Bill, you’ve recently joined Recommind. Please tell us about your background and why you joined.

Tolson: I’ve been in the high-tech computer storage market for over 20 years, having worked with companies like Hewlett-Packard, StorageTek and Hitachi Data Systems. I focused on computer mass storage products and on storage solution software.

In 2001, I wrote the business plan around email archiving for StorageTek, which was well known for its big tape-backup systems and hard-disk mass storage business. We wanted to create storage solutions to leverage that disk capacity; however, the main driver of this effort was regulatory compliance. At the time, the SEC and the NASD were issuing additional rules pertaining to data retention, such as SEC Rule 17a for broker-dealers. Meanwhile, newspaper headlines were reporting major troubles at companies such as Enron, Arthur Andersen and WorldCom, which highlighted the need to make records archiving a priority. Even more interesting, there were legal drivers, such as the Federal Rules of Civil Procedure, which really got people thinking about electronically stored data.

As a result, I decided to specialize in data archiving and the legal aspects of discovery. I spent four years consulting and helping very large customers with their retention and e-discovery process.

I joined Recommind because of its leading position in the e-discovery industry and, specifically, in Predictive Coding, which automates and speeds up the document review process.

Editor: Predictive Coding is indeed a hot topic. How do you see it affecting corporations?

Tolson: Today, companies of all sizes have large data sets consisting of millions of electronic documents. The traditional review process for privilege and relevance requires lawyers to read each page, which becomes prohibitively expensive at $50 - $150 per billable hour. The latest RAND Institute report suggested that 73 percent of every dollar spent on e-discovery went to review costs.

A Predictive Coding system automates the biggest part of that process by eliminating the need for attorneys to review all ten million pages in a data set. The system is trained to recognize specific types of content and make decisions that reduce data sets to 15 or 20 percent of their former volume, thereby producing targeted and relevant data sets for attorney review. Obviously, this translates to huge cost savings.

Editor: I’ve read that some organizations are afraid of Predictive Coding because it’s characterized as a “black box” technology. Is that a fair perception?

Tolson: No. Vendors use the term black box to create fear and uncertainty in the market. Black box refers to hardware or solutions that function unseen behind the scenes. Actually, black box solutions have existed for hundreds of years, such as traffic lights and spam filters. Therefore, our competitors’ use of the term black box can be misleading.

Recommind’s Predictive Coding solution uses well-known mathematical algorithms and machine-learning techniques that we developed and that are completely defensible in court. While computers are trained to recognize specific concepts and content to make relevance decisions, the process is always managed and reviewed by human case experts. When you consider some of the true subtleties within the Predictive Coding system, such as ergative or linguistic training, use of the term black box simply becomes a fear tactic.

Editor: We understand that Predictive Coding is lowering costs in real cases. Could you provide more detail on the return on investment (“ROI”) you’re seeing?

Tolson: Cost reduction is a key goal in adopting Predictive Coding, and companies use the return on investment (ROI) calculation to measure projected savings. The ROI compares what you currently spend on the discovery review process, what you spent to implement a new solution and what the savings would be over traditional solutions.

The ROI formula is a ratio. To calculate the numerator, determine the cost of a traditional e-discovery process (from start to finish) and then subtract the new cost of completing that same process using a Predictive Coding system. From this difference, subtract the investment made in implementing a Predictive Coding system. Finally, divide the resulting number by the investment made in implementing a Predictive Coding system.
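The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, parameter names and dollar figures are hypothetical and do not come from Recommind or from any actual case.

```python
def predictive_coding_roi(traditional_cost, predictive_cost, investment):
    """Return ROI as a ratio (multiply by 100 for a percentage).

    traditional_cost: cost of the traditional e-discovery process, start to finish
    predictive_cost:  cost of the same process using a Predictive Coding system
    investment:       amount spent implementing the Predictive Coding system
    """
    if investment <= 0:
        raise ValueError("investment must be greater than zero")
    # Numerator: savings over the traditional process, minus the investment.
    net_savings = (traditional_cost - predictive_cost) - investment
    # Denominator: the investment itself.
    return net_savings / investment


# Hypothetical figures, chosen only to show the arithmetic.
roi = predictive_coding_roi(
    traditional_cost=1_000_000,
    predictive_cost=300_000,
    investment=200_000,
)
print(f"ROI: {roi:.0%}")  # ROI: 250%
```

With these assumed numbers, the process cost drops by $700,000; subtracting the $200,000 investment leaves $500,000, which divided by the investment gives an ROI of 2.5, or 250 percent.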



The actual, big-dollar savings are achieved because far less time and money are spent on traditional manual document review; the usual range of cost reduction is 60-80 percent. These cost benefits are accompanied by a system that yields an accuracy rate of 90-95 percent, after completing the process of sampling, quality-control checks and all iterative training processes required to train the system and obtain the desired result.

Editor: Do you agree with statements that Predictive Coding is good only for very large cases, not for smaller matters involving a few custodians and relatively small data sets?

Tolson: I don’t agree, because the comparative 60-80 percent savings we just discussed will always be there regardless of case size. If the demand for manual document review is reduced from 100 percent to 20 percent, then cost savings and ROI will be achieved.

Naturally, these results are magnified for very large corporations, which may respond to hundreds of e-discovery requests a year. In situations where a company buys and installs a Predictive Coding solution and uses it on an ongoing basis, we’ve seen ROIs of over 1,000 percent with our Axcelerate on-premise solution.

For smaller companies with just a few e-discovery cases per year, it may make sense to use Recommind’s on-demand service, Axcelerate On-Demand, which offers the same basic capabilities in a SaaS-type product to be used on an as-needed basis.

Editor: What’s the difference between your Axcelerate on-premise and Axcelerate On-Demand solutions?

Tolson: They both offer our Predictive Coding capabilities. Companies make an investment in buying and installing our on-premise solution to maximize the benefits of ownership. This option works for a company of any size that handles a large number of litigation matters involving e-discovery per year.

Axcelerate On-Demand offers the same Predictive Coding solution but is purchased from Recommind on an as-needed basis. A company served with a discovery request can arrange to send data to us for privilege and relevance review. We receive the data, run it through our iterative process and then send back a responsive data set and a privileged data set.

Editor: Can these solutions be used in early case assessments (“ECAs”) as well?

Tolson: Yes. In the case of ECA, the goal is not to produce a set of responsive documents for opposing counsel but rather to look at company data internally. Many attorneys are not fully aware of the strategic value of this ECA-related capability. Data mining and analytics, for example, can be used in anticipation of an expected lawsuit to help attorneys determine the strength of the case and decide whether to settle a weak case or fight a strong one.

The Axcelerate eDiscovery solution allows analysis of data in very interesting ways, using what we call our CORE platform. This includes search capability that can search for concepts, phrases and, yes, keywords throughout the enterprise’s data stores and then return result sets that are conceptually relevant. For example, the system conceptually understands that a Camaro is an automobile and that the concept of “automobile” also relates to Fiats and Ford trucks, so the resulting data sets are more comprehensive. This idea extends to more complex matters, such as investigations of misconduct or fraud. Thus, the benefits of Predictive Coding extend beyond external discovery requests to assisting with internal strategy decisions.

Editor: So why is adoption still lagging?

Tolson: Predictive Coding is a relatively new technology that is still on an education and adoption curve for many in the legal industry, especially with attorneys who are less enamored with technology and need more education and proof before adopting it. In recent years, we’ve seen several very large cases with high-profile judges who accepted Predictive Coding and suggested it as the review technology. Attorneys are beginning to see that this technology is defensible in court – there are now hundreds of cases of precedent – and to acknowledge the obvious time and cost benefits plus the strategic uses we just discussed.

Finally, competitors who don’t have Predictive Coding solutions are working hard to perpetuate fear, uncertainty and doubt (FUD) in spite of the technology’s five-year track record of success in court and in producing dramatic time and cost savings.

Editor: What else can corporate counsel do to help lower costs of e-discovery?

Tolson: Ten years ago, the whole corporate records management industry centered on managing paper and determining what qualifies as a record for a given organization (unselected documents simply were discarded).

Today, the vast majority of corporate information is electronic. It’s created and consumed in microseconds, and volumes now reach into the terabytes, as opposed to mere megabytes and gigabytes not long ago. I advise corporate counsel to start managing all data in their enterprise – not just what is internally deemed to be a record – because the court system, for example, considers all correspondence, including instant messages, email, and drafts of work documents, to be potentially responsive in an e-discovery case.

It’s really a concrete wall, and corporations need to start thinking in terms of information governance rather than its subset: records management. Parties in litigation don’t care whether a document is categorized as a “record”; they simply want all responsive information. Companies need to manage all electronic data because it all relates directly to e-discovery costs.

Editor: Please talk about how getting rid of information in a defensible manner can help with e-discovery.

Tolson: The industry is starting to use the term “defensible disposal,” which refers to managing all electronic information and having automated processes in place so you are not keeping 10 years’ worth of emails just because you might refer to them someday. If it doesn’t have value to the company and it’s not related to regulatory compliance or litigation, then you don’t have to keep it. If you don’t keep it, then it’s not discoverable and will never require costly review. Right or wrong, the courts do not expect organizations to keep everything forever.

The next step toward defensible disposal is to put in place automated processes that discard information in a defensible manner that doesn’t raise suspicion in court with a judge or opposing counsel, who might think that the data was deleted to get rid of a “smoking gun.” This is one of the biggest issues for our industry, and automated categorization using predictive technology can help companies manage and delete information when it’s no longer useful.

There’s a great example of this involving DuPont, which did a study in the late ’90s of nine legal cases in which the company had been involved. They reviewed over 75 million pages of content, found that 11 million of those were responsive, and completed their discovery obligations by turning that subset over to the other side. Upon further investigation, DuPont determined that 50 percent of the original 75 million pages were past their retention period, which translated to unnecessary review costs of $12 million. This example illustrates how information governance practices can really affect e-discovery costs.

So it really comes down to companies needing to get serious about information governance – especially corporate counsel – because it affects the bottom line and annual budgets.
