Document Review 2.0: Leverage Technology For Faster And More Accurate Review

Friday, February 1, 2008

Craig Carpenter

Recommind, Inc.


With the "double whammy" of dramatic growth in electronic forms of data and increased pressure wrought by the revised Federal Rules of Civil Procedure, today's businesses face unprecedented cost and complexity in simply defending themselves in litigation, regardless of the merit of the underlying suit. From litigation preparedness to presentation at trial, and everywhere in between, solutions to every business's problems abound - or so it would seem. But with eDiscovery exacting a financial toll upwards of $2-$4 per document, costs can quickly get out of hand for even the wealthiest enterprise.

However, the simple fact is that the majority of costs along the eDiscovery spectrum - estimated to be 60-70% of the entire eDiscovery bill - are incurred at one step: the document review phase. Some of this cost is unavoidable: attorneys, especially those with particularly specialized expertise, are expensive, so any amount of their time spent reviewing key documents can quickly add up. And in a typical document review scenario, lawyers (and everyone else involved) are forced to wade through countless irrelevant, unimportant or simply off-base documents in search of the ones they do care about. As with many other things, the solution to this incredibly costly problem lies in automation: the more lawyers can utilize software to help them automatically categorize documents along key parameters (like responsiveness, privilege, priority and relationship to an issue), the more efficient and accurate the entire review process will become. And this will benefit not only the client - with better service, lower legal bills and a better chance of winning the case - but the lawyer as well, with more efficiency, greater capacity and better work product.
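To put those figures in concrete terms, the short sketch below works through the arithmetic. The per-document cost ($2-$4) and the review share (60-70%) are the estimates cited above; the collection size of 100,000 documents and the function name are purely illustrative assumptions, not figures from any actual matter.

```python
def ediscovery_cost_range(num_documents,
                          per_doc_low=2.0, per_doc_high=4.0,
                          review_share_low=0.60, review_share_high=0.70):
    """Return (total_low, total_high, review_low, review_high) in dollars.

    Per-document cost and review share default to the estimates quoted
    in the article; num_documents is supplied by the caller.
    """
    total_low = num_documents * per_doc_low
    total_high = num_documents * per_doc_high
    # The review phase is estimated at 60-70% of the whole bill.
    review_low = total_low * review_share_low
    review_high = total_high * review_share_high
    return total_low, total_high, review_low, review_high


if __name__ == "__main__":
    # Hypothetical collection size, chosen only for illustration.
    total_low, total_high, review_low, review_high = ediscovery_cost_range(100_000)
    print(f"Total eDiscovery cost: ${total_low:,.0f} to ${total_high:,.0f}")
    print(f"Document review share: ${review_low:,.0f} to ${review_high:,.0f}")
```

Under these assumptions, a 100,000-document matter would cost roughly $200,000 to $400,000 in total, of which roughly $120,000 to $280,000 would be spent on review alone.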

The Digitization Of The Enterprise

In the "old days" of a largely paper-based working environment, the amount of information that could be created was limited - by space (e.g. file cabinets), time (e.g. to draft and edit a memo) and the relative paucity of duplicative documents, at least by today's standards. Then a little thing called the digital revolution came on the scene; information that had previously been created, edited, shared and stored on paper media was now created, edited, shared and stored on digital media. The key difference between these media lay in the variable cost of creating/editing/sharing/storing each piece of information: the digital revolution effectively reduced this variable cost to zero. The effect? The volume of information being created skyrocketed, and continues to skyrocket to this day. A discovery process which had once required review of perhaps 5 boxes of information now involved 100, 1,000, or even 10,000 times more information.

And The FRCP

After much debate and more than a few high-profile court cases (e.g. Zubulake and Morgan Stanley), on December 1, 2006 the Federal Rules of Civil Procedure (FRCP) were revised to reflect the impact the advent and proliferation of electronic information had had on the discovery process. The changes were numerous, and at the risk of grossly oversimplifying their import they essentially dictated that parties to litigation must understand their data situation from the outset of the case, and would not be allowed to drag their feet or abuse the eDiscovery process.

Not surprisingly, the impact of this double dose of bad news was huge, and hit even the wealthiest, most sophisticated enterprises like a punch to the gut: not only would lawyers need to churn through more data (lots more data) to find the relevant information they needed, but going forward they would often have less time in which to do so. To make matters worse, these changes happened so fast that the discovery process of old really hadn't changed to meet the new eDiscovery world. Document review was - and largely still is - organized by custodian, with individual reviewers or review teams typically looking at the documents produced by a single custodian. But whereas the inefficiencies of the discovery system were minor in the paper-based world, these same inefficiencies were multiplied many times over in the world of eDiscovery and manifested themselves in several ways. First, because the number of irrelevant, unimportant documents was far higher, the cost of the document review process itself went up dramatically - even though clients were essentially getting the exact same service they had received a decade before. Second, the time it would take to conduct a document review process would be significantly longer, but the revised FRCP would not allow the eDiscovery process to be extended accordingly. Third, with such huge volumes of data being reviewed under such tight timeframes, the quality of the review began to suffer, with key documents being missed. And last, as they had to wade through mountains of irrelevant or unimportant data, it was now far harder for attorneys to build their case (e.g. by locating corroborating documentation) quickly, a problem which was exacerbated by the revised FRCP's increased emphasis on the collaboration of the parties on eDiscovery matters from the very outset of the case. In sum, a new document review model and workflow were desperately needed to address the new reality of eDiscovery.

What Today's Document Review Process Should Look Like

The most important factors with any document review process are accuracy (finding the key documents), relevancy (putting the right documents in front of the right people) and efficiency (minimizing redundant work). And while review workflows will differ from firm to firm and case to case - some may involve only one or two "passes" at the majority of a document collection while other reviews may make numerous passes at all documents - these factors will consistently dictate whether a document review process will be effective or not. By homing in on the right documents, an accurate document review process gives lawyers and clients alike a much better chance of winning the case. Similarly, a relevant review process will have the key review team members (experts, partners, associates, paralegals) looking at the right documents in a timely fashion; this will help the attorneys understand their case more quickly, giving them a better chance of settling or winning the case, while also allowing for a highly efficient review process. And an efficient document review process benefits clients as their eDiscovery bills will be lower, and benefits lawyers as they are able to free up time for more work or clients, which translates into more billables and revenue.

How Technology Can Make Document Review Faster and More Accurate

In spite of its tremendous benefits and promise, today's technology does not and cannot take the place of review attorneys. What it can do, however, is automate much of the "heavy lifting" attorneys might otherwise need to do, give them deep insight into documents before they have ever been read and allow attorneys to "manipulate" even massive document collections quickly and with great accuracy. While these tools would have been quite helpful under the "old" document review process, in today's digital world with tens, hundreds and even thousands of gigabytes of information to churn through they are an absolute necessity.

The following technologies represent cutting-edge tools that are helping lawyers implement a highly effective document review process, regardless of the amount of data which must be reviewed and analyzed.

Automatic identification and extraction of key people, phrases and concepts

Highly sophisticated "entity extraction" software can provide key insight into a case - and documents that support or undermine it - before any documents have even been reviewed. Using patented machine learning technology, this software "reads" each document and pulls out the most relevant and important people (whether or not they are a custodian), phrases and concepts, and makes them visible in the review interface. From these key insights attorneys are able to accomplish several critical things, including learning exactly what a document collection (and therefore, often, a case) is all about, sampling or assigning documents based on key people, phrases or concepts, and removing huge swaths of documents due to their lack of relevance or responsiveness or their likely privileged status. Under the extremely tight timelines promulgated by the revised FRCP, knowing where one's data is before the first Meet and Confer Conference is important; knowing the strength or weakness of one's case based on that data right out of the gate is a critical strategic advantage.


Automatic categorization technology groups or "clusters" documents according to myriad parameters, including document type, custodian, defendant, party, email sender/recipient/cc/bcc, attachment, document ID, and batch. This functionality is quite helpful in small cases and absolutely indispensable in larger cases, as it allows all documents to be organized along numerous useful parameters before review has even begun. For example, if one knows that all .exe files are not responsive, one can automatically segregate these documents before review has begun and simply ignore them in the review - saving countless hours of review time and dollars of client money. In addition, this technology allows review organizers to assign documents for review based on one or (more likely) several of these parameters en masse, thereby dramatically expediting the organization, setup and implementation of the review.
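The .exe example and the idea of assigning documents en masse by several parameters can be sketched in a few lines of code. This is only a minimal illustration, not any vendor's implementation: the Document record, its field names and both helper functions are hypothetical, and a real review platform would carry far richer metadata.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata fields, chosen for illustration only.
    doc_id: str
    filename: str
    custodian: str
    batch: str


def segregate_by_extension(documents, excluded_extensions):
    """Split documents into (to_review, set_aside) by file extension."""
    excluded = {ext.lower() for ext in excluded_extensions}
    to_review, set_aside = [], []
    for doc in documents:
        ext = PurePath(doc.filename).suffix.lower()
        (set_aside if ext in excluded else to_review).append(doc)
    return to_review, set_aside


def group_by(documents, *fields):
    """Group documents by one or more metadata fields for batch assignment."""
    groups = defaultdict(list)
    for doc in documents:
        key = tuple(getattr(doc, field) for field in fields)
        groups[key].append(doc)
    return dict(groups)


if __name__ == "__main__":
    docs = [
        Document("D1", "memo.doc", "Smith", "B1"),
        Document("D2", "setup.exe", "Smith", "B1"),
        Document("D3", "plan.xls", "Jones", "B1"),
    ]
    to_review, set_aside = segregate_by_extension(docs, [".exe"])
    for key, group in group_by(to_review, "custodian", "batch").items():
        print(key, [d.doc_id for d in group])
```

Each resulting group could then be handed to a reviewer or review team in one step, which is the kind of bulk assignment described above.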

First Pass Review/Retraining

Every attorney enjoys finding a highly probative document; spending countless hours finding all the others like it is not as much fun. In a perfect world, once an initial document is found which is highly representative of something - whether that something is an issue, an element of an issue, responsiveness, relevance, priority or privilege - the attorney could simply hit a "find all like this" button and all similar documents would instantly appear. Believe it or not, such functionality is not only available but working, today, on live cases. All one has to do is find a few documents (5-10 will do) that are representative of an issue, privilege, priority, etc., tag them to the particular category, and hit the "retrain" button to begin the retraining process. At this point the software uses the representative documents as guidelines as it sifts through each document in the collection, returning all similar documents in a matter of a few seconds or minutes. At the same time, the system also tags every document in the entire collection with respect to each category that has been so retrained. This has two primary - and incredibly powerful - benefits: first, attorneys can find all key documents quickly, which has innumerable benefits throughout the review process, production and even trial; and second, the system organizes every document in the collection on all retrained parameters, each of which can then be batch assigned to the appropriate reviewer or review team, supporting a highly relevant, efficient and accurate review process.
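For readers curious how a "find all like this" function might work in principle, the sketch below shows one very simple approach: build a word-count profile from a handful of tagged example documents, then score every document in the collection against that profile using cosine similarity. This is a deliberately simplified stand-in and should not be taken as the method used by any commercial product, which would rely on far more sophisticated machine learning. The function names, the sample texts and the 0.3 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(seed_texts, collection, threshold=0.3):
    """Return (doc_id, score) pairs for documents resembling the seeds.

    seed_texts: the few representative documents an attorney has tagged.
    collection: dict mapping doc_id to document text.
    threshold:  illustrative cut-off; a real system would tune this.
    """
    profile = Counter()
    for text in seed_texts:
        profile.update(term_counts(text))
    scored = [(doc_id, cosine(profile, term_counts(text)))
              for doc_id, text in collection.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    seeds = [
        "attorney client privileged legal advice regarding merger",
        "privileged and confidential legal advice from counsel",
    ]
    collection = {
        "A": "confidential legal advice from our attorney regarding the merger",
        "B": "lunch menu for friday pizza and salad",
    }
    for doc_id, score in find_similar(seeds, collection):
        print(doc_id, round(score, 2))
```

Running the same scoring once per category (privilege, responsiveness, a given issue) would yield a tag for every document in the collection on each category, which mirrors the batch-assignment benefit described above.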

Craig Carpenter is Vice President, eDiscovery Solutions at Recommind, Inc.
