Moving Beyond The Linear Review Gap: Predictive Analytics And Predictive Coding Examined

Monday, August 30, 2010 - 01:00

Editor: Can you tell us a little about yourself and Recommind?

Carpenter: I started my career as a litigation attorney at a midsize firm in California, but within a couple of years I moved over to the vendor side. For about the last decade, I have worked for software and hardware companies, and I've been with Recommind for about four years. The company, which is ten years old, sprang out of academia. Its core technology analyzed text, discerned the topics and concepts it contained and then related them to topics and concepts in other pieces of content, irrespective of keywords. We call this technology CORE, which stands for Context Optimized Relevancy Engine. Initially, the technology was used in knowledge management, primarily by attorneys, accountants and researchers - people whose time was at a premium. In the last four years, however, we've gone headlong into eDiscovery and regulatory compliance. These areas still rely on paper-based approaches, using paper workflows and outdated technology to address digital problems. They are essentially the same tools I was using in a law firm twelve years ago, which is clearly a mismatch. Our technology allows information to be handled in an automated fashion, very accurately and very quickly.

Editor: Recommind conducted an interesting survey this summer that showed a disconnect between IT and legal departments. Why is this disconnect happening?

Carpenter: We're talking primarily about eDiscovery here, and almost exclusively, though not entirely, about the in-house side. To begin with, IT and legal speak different languages, and they have completely different mandates. IT's job is to make information available and keep the systems running. IT is also loath to delete anything. Legal's job is to mitigate or minimize risk to the entity. Often, those mandates are very much at odds. eDiscovery is essentially the pressure cooker under which both groups must operate: it forces them to work together, often without much additional budget or any liaisons. So it's a situation that lends itself to a disconnect, which is what we found in our survey.

Editor: What are corporations doing about this disconnect?

Carpenter: To address this disconnect, corporations have to increase communication between these two groups. They also need to recognize that eDiscovery is not a flash-in-the-pan issue. It is here to stay, and it is dominated by digital information, not paper. In many cases, where they can, corporations are hiring people who are fluent in both worlds and both languages, and who bring both sides to the table as projects are rolled out.

Editor: It sounds like things are changing rapidly in eDiscovery technology. For those companies who are struggling to keep up, where would you suggest they start?

Carpenter: There are a lot of things companies can do to begin incorporating eDiscovery technology. Consider this analogy. If you have a dam that is breaking, you can address the problem in the short term by patching the dam. However, it might also behoove you to figure out what broke the dam in the first place, perhaps even going to the source of the water to address that part of the problem. What these corporations need to do is go to the root of the problem and determine when information is created, what information is created, who creates it, how long it lasts, what they know about it, and whether they track, tag or categorize it. They need to do all of these things to get a better handle on their information. The more organized that information is, the easier it will be to handle once an event occurs - and an event will occur. It is not an overnight fix, and these companies will fall further and further behind unless they start recognizing this today.

Editor: So better information management will help reduce costs further down the line in litigation. What other ways are smart companies using technology to reduce costs?

Carpenter: Today, waiting weeks to get more insight into a situation is typically too late. Forward-looking companies are using technology not only to reduce costs but also to gain more control over the process. Any time ESI (electronically stored information) leaves the company, the company loses custody, and if it loses custody, it loses control. Every separate step in the process injects risk, a point at which all sorts of bad things can happen. So leading companies are reducing cost and risk by bringing more eDiscovery capabilities in-house - for example, handling data preservation, ECA (early case assessment), collection and processing internally rather than sending ESI to outside vendors. Two big types of in-house projects are being undertaken in this context. The first is the capability to do preservation and collection. Building this technology in-house allows enterprises to quickly figure out what information they need to preserve and collect. They can conduct an assessment early on, at the time of collection, rather than waiting until the information has been preserved, collected, processed and culled. They are then able to do all of the processing, culling and analytics in-house. This is the biggest area of growth in the industry because it pushes the work so far upstream. The better insight people get up front, and the more they reduce the volume of information that needs to be dealt with down the road, the lower their costs will be - exponentially. The second type of project that in-house counsel are pursuing is very sophisticated analysis of information from the very outset of a matter. To assist our in-house clients, we offer something called Predictive Analytics. This can be done immediately after collection, or concurrently with collection and preservation.
We upload the data into sophisticated software that automatically identifies which information belongs to which topic groups. The data can be organized by topic, by people, by dates, by file type or by whatever else the client needs. It is an automated process that prioritizes this information and tells in-house counsel what is in the data. The software doesn't know what is important, but once you tell it what is important, it will prioritize all of the information based on that input. You don't even need to look at the vast majority of the data; you can immediately home in on what is most important. In other words, you can do an automated first-pass review, so that instead of sending 500 gigabytes or even a terabyte to outside counsel for review, you send only the five gigabytes that are most relevant. If you need to send out more data as you go, that is fine, because you are already on top of the substantive issues and evidence in the case. You start from the core nugget of what the case is about and work outward only as necessary. Hence, you can control costs and gain far more predictability. This process is being used by large, forward-looking companies. It's not yet typical, but it's quickly gaining acceptance and will soon be the de facto way eDiscovery is handled in-house.
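The idea of "tell it what is important and it prioritizes everything else" can be illustrated with a toy sketch. This is not Recommind's CORE, whose internals the interview does not describe; it assumes a deliberately simple stand-in technique. Each document becomes a bag of word counts, the documents counsel marks as important are summed into a profile, and every other document is ranked by cosine similarity to that profile. Unlike the concept-based approach described above, plain keyword overlap cannot match documents that discuss the same topic in different words. The helper names `prioritize` and `select_top` and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens (a bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def prioritize(documents: dict[str, str], important_ids: set[str]) -> list[tuple[str, float]]:
    """Hypothetical helper: rank unreviewed documents by similarity to the
    ones counsel marked important (a keyword-overlap stand-in for concept ranking)."""
    profile = Counter()
    for doc_id in important_ids:
        profile.update(term_counts(documents[doc_id]))
    ranked = [
        (doc_id, cosine(term_counts(text), profile))
        for doc_id, text in documents.items()
        if doc_id not in important_ids
    ]
    # Highest similarity first; ties keep their input order because sorting is stable.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def select_top(ranked: list[tuple[str, float]], fraction: float) -> list[str]:
    """Hypothetical helper: keep only the highest-ranked slice for first-pass review."""
    keep = max(1, math.ceil(len(ranked) * fraction))
    return [doc_id for doc_id, _ in ranked[:keep]]


if __name__ == "__main__":
    docs = {
        "d1": "merger pricing agreement with Acme",
        "d2": "holiday party schedule",
        "d3": "Acme pricing discussion before the merger",
        "d4": "cafeteria menu for next week",
    }
    ranked = prioritize(docs, {"d1"})
    for doc_id, score in ranked:
        print(f"{doc_id}: {score:.3f}")
    print("send for review:", select_top(ranked, 0.3))
```

With `d1` marked as important, `d3` shares three of its terms and rises to the top, while the unrelated documents score zero. The `fraction` passed to `select_top` plays the role of sending out "the five gigabytes that are most relevant" rather than the whole collection; a real system would also let reviewers feed their calls back in and re-rank.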

Editor: Isn't it risky to automate part of the eDiscovery review? Don't lawyers need to review everything?

Carpenter: The simple answer is no, but you need the ability to do both.

Humans simply don't need to review everything. It's not necessary, and it doesn't address the root cause of the challenge, which is the inability to find the documents that are most probative of whatever you are looking for. If you look at the various approaches, you could take the default approach, which is what most people are doing today: contract reviewers. However, contract reviewers aren't particularly accurate; they are accurate anywhere from 50 to 70 percent of the time. As an industry, we've already shown that technology is more accurate. Offshoring is not particularly helpful either. In the first place, you are still using contract reviewers. Secondly, even if you get the cost as close to zero as possible, software is always cheaper than people in this context. In addition, you take on the added risk of sending data offshore, which has not always been the best approach. If you factor in how fast data is growing these days, automation is going to be the key to meeting this challenge. Whether you automate one percent or 99 percent is up to you, but companies absolutely need that capability.

Some companies want to automate everything. Some want to automate only some things, and others don't want to automate any part of document review, either because they are not comfortable with it or because it's not the way they're used to working. The vast majority of corporations are saying, in effect, that even if they don't want to use this technology today, they want the ability to use it two, three, maybe five years from now. So the buying decisions they're making are a little more forward-looking. The IT people working with legal recognize that eDiscovery is not a flash-in-the-pan issue. They are addressing it at the strategic level, as a three-to-five-year solution rather than a six-to-12-month one. So they want to be more forward-looking, even if they don't necessarily want to use the full breadth of the technology's capabilities today.

Editor: So traditional-style review is being replaced by Predictive Coding. Is Predictive Coding defensible?

Carpenter: The default today is linear review, but Predictive Coding is actually more defensible than linear review. People who use linear review have the impression that it is infallible. But studies have shown, and law firms and outside counsel will reluctantly acknowledge, that linear review is not the most accurate way of approaching the challenge. It happens to be the default, which means that for a lot of people it's the devil they know. In most cases, however, linear review tends to be 50-70 percent accurate.

When it comes to the coding calls made by individual contract reviewers, software already does better. In fact, in our installations with clients, Predictive Coding accuracy rates are well north of 90 percent. Defensibility is relative. Linear review has been assumed to be perfect, but it never has been and it never will be. Predictive Coding is far more reasonable in cost and far more defensible than traditional linear review. The year 2010 has been a watershed year for Predictive Coding because people have started to recognize that fact and, to minimize cost, risk and time, have begun to embrace it much more.
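Percentages like "50-70 percent" and "well north of 90 percent" depend on how accuracy is measured, and the interview does not say how these figures were computed. The sketch below assumes one common way to quantify it: compare each relevance call against a reference set of documents whose correct call has already been settled, then report overall accuracy alongside precision and recall. The helper `score_review`, the `ReviewScore` class and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewScore:
    accuracy: float   # share of all calls that match the reference
    precision: float  # of documents called relevant, the share that truly are
    recall: float     # of truly relevant documents, the share that were found


def score_review(calls: dict[str, bool], reference: dict[str, bool]) -> ReviewScore:
    """Hypothetical helper: compare relevance calls (True = relevant) with a
    settled reference set. Every document in `reference` must have a call."""
    tp = fp = fn = tn = 0
    for doc_id, truly_relevant in reference.items():
        called_relevant = calls[doc_id]
        if called_relevant and truly_relevant:
            tp += 1  # relevant and found
        elif called_relevant:
            fp += 1  # called relevant, but is not
        elif truly_relevant:
            fn += 1  # relevant, but missed
        else:
            tn += 1  # correctly set aside
    total = tp + fp + fn + tn
    return ReviewScore(
        accuracy=(tp + tn) / total if total else 0.0,
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
    )


if __name__ == "__main__":
    # Ten sample documents, of which only doc0 and doc1 are truly relevant.
    reference = {f"doc{i}": i < 2 for i in range(10)}
    # A "reviewer" that marks nothing as relevant.
    nothing_relevant = {doc_id: False for doc_id in reference}
    print(score_review(nothing_relevant, reference))
```

The usage example shows why a single accuracy number can mislead: when relevant documents are rare, a review that marks nothing as relevant still scores 80 percent accuracy on this sample while finding none of the relevant documents. Reporting precision and recall next to accuracy makes that failure visible.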

Editor: Business seems to dislike uncertainty more than it dislikes regulation.

Carpenter: You're right, and that's why there is so much pressure on firms to embrace new approaches. Inside counsel are saying that if you want their business, you must explore alternative pricing models. This is where the recession, in a perverse way, has really helped in-house counsel attain their objective of more accurate computer-assisted analysis with more predictable eDiscovery costs, because they've been able to demand that their outside counsel pursue Predictive Coding.
