Machines Versus Humans In E-Discovery - Strengths And Weaknesses

Monday, February 28, 2011 - 01:00
Erik Laykin


Editor: You make the point that combinations of machines can reduce the reliance on human review of electronic documents and offer promise in a variety of situations.

Laykin: The word "reduce" is important. Users of e-discovery services need to keep their expectations in line with reality. Machines, computers, software, networks and the cloud can all assist in the e-discovery process and reduce human interaction with individual documents, but they cannot eliminate it altogether. The question is how far you can reasonably go in leveraging technology to arrive at a defensible, reasonable and appropriate e-discovery examination of electronic documents.

There are numerous technologies in play at present, including technologies that have yet to make it into the commercial space. This is an evolving area, and as processing power increases, as storage increases, as network availability increases we have seen that users of e-discovery services now have access to a far larger array of applications, systems and technologies than they ever had in the past.

If you are called before a judge or a magistrate and required to explain why your review of a document population reached a particular result, it is incumbent upon you to be able to do so. As a result, I'm a fan of technologies that are either open source or clearly defined in their documentation in terms of what they can and cannot do.

Editor: There seems to be a race between what machines can do and the explosion in the number and types of documents they must cope with.

Laykin: Yes, the traditional notion of a document has evolved and now includes multiple sources and types including scanned paper, computer "office" documents, email, instant messaging, voice mail, structured data from databases and transactional systems, web content, cell phone and PDA content, social media postings, log files from routers and other tracking systems etc.

Editor: What can be done in the litigation arena to contain costs?

Laykin: There is movement within the judiciary to recognize the importance of the use of special masters and independent neutrals in shepherding the electronic discovery process along for the plaintiff, the defendant and the court. An important role for these special masters and independents will be to help establish between the parties a binding protocol for electronic discovery. As this space evolves, these protocols may increasingly allow for computer-assisted review for the purpose of speeding up review time and reducing costs while also maintaining a statistically similar risk profile to that of a human review.

This is a newly emerging area which has caught the attention and imagination of both federal and state courts. It runs parallel with the recommendations of the Sedona Conference Cooperation Proclamation and is now being tested in the Seventh Circuit's pilot program.

An organization that has been very active in this space is the Academy of Court-Appointed Masters (ACAM), which is composed of judges, attorneys and a few non-attorney subject matter experts who are often called in to serve as special masters.

Using special masters for e-discovery is gaining traction because judges are fed up with back-and-forth squabbling between the parties over the issue of e-discovery and, just as relevant, defendants and plaintiffs are fed up with the significant costs of managing that process.

The idea is that the judge appoints a special master to sit as the independent third-party neutral working at the direction of the court to assist in crafting the protocol, language and direction of the e-discovery process, bringing together both sides, and perhaps even helping to choose the right technologies and consultants to use. This puts the special master in a position to make certain judgment calls on what is allowable and what is not allowable and also to play the role of referee between the two sides so that the e-discovery process is more streamlined, more effective.

This approach results in the management of the e-discovery process in a more transparent, rapid, low-cost manner while lowering the risks for all parties and providing clarity for the bench. Ultimately a major benefit to litigants will be that instead of getting bogged down in e-discovery disputes or worse, the actual issues of the dispute can be heard. The unfortunate reality of the last 10 years has been that there is a growing number of cases that simply settle or are unresolved as a result of the costs and complexity of the e-discovery process. In addition, it is important for judges to know that the issues of discovery are being handled in a meaningful and fair way to both parties, yet without having the fireworks of the motion practice that can otherwise take place in an e-discovery world characterized by fights over keywords, privacy, format, production and many other issues.

Editor: Have machine-assisted translations reduced the human cost?

Laykin: Machine-assisted translations have evolved. They are now capable of relieving the massive workload of human translators for first-pass translations and early case assessment. Many cases today have an international component as the world has continued to globalize.

As a result, American litigation is more often than ever faced with the issue of foreign language management. This is one area where machine processing power and translation software can be very useful in cutting down the costs of review because you can run your foreign language data through automated translation processes that will at least give you a cursory understanding of the data. In some cases, depending on how the data is structured and what type of data it is, these tools can give you a very granular understanding of the data in English so that you can then review those documents, whereas in the past, having to translate large volumes of foreign language data, particularly Asian languages, has been very costly and time consuming.

Editor: Can machines also reduce the cost of early case assessment?

Laykin: Early case assessment technology will continue to be enhanced by tools that scan data for keywords or concepts in real time, allowing for proactive monitoring and segregation of potentially relevant documents behind the firewall. Ultimately one of the best techniques that one can use to reduce costs in both e-discovery and in data management is to get ahead of the data itself from a structural perspective.

If you are able to integrate records and data management programs and records retention policies within your organization, you will be ahead of the curve. Similarly, if you are able to integrate some form of early case assessment technology or process within your enterprise or your environment, you can then leverage that ability to collect data, segregate it, index it and define it based on various criteria. This will enable you to make more informed decisions about either an investigative path or an e-discovery response, as opposed to casting an overly wide net and collecting data from dozens or hundreds of custodians that is irrelevant to the matter. Thus, you may be able to focus your effort more narrowly and collect only data that truly is responsive, relevant and necessary, so the net result is cost savings for the organization.

Editor: What is the role of machines and software in the early classification and categorization process?

Laykin: While computers may aid in initial first-pass review, they will also serve a very important role in the early document categorization/classification process, helping to establish document population topologies. This will allow the legal department and management to better understand their own data through active metrics.

I should note, though, that organizations cannot rely solely on computers. The process that the machine is going to follow is only as good as the policy and implementation that was designed by the records and information management organization. However, in simple terms, computers, and by extension e-discovery systems, review systems, early case assessment systems, and even forensic systems, are leveraged every day to categorize, catalog and segregate document populations.

In the last eight to twelve years, machines and software have been capable of ingesting larger and larger populations of various types of documents and segregating the documents based on file types, date ranges, custodian names and other criteria (even including concepts within that content). This technology, while it seems to be second nature today, was just a short while ago cutting edge and certainly out of reach for most litigants.
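The kind of segregation described here can be pictured with a minimal Python sketch. The `Document` record, its field names and the `segregate` function are hypothetical and chosen only for illustration; they do not represent any particular e-discovery product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical metadata record for one item in a document population."""
    doc_id: str
    custodian: str
    file_type: str
    created: date


def segregate(docs, custodians, file_types, start, end):
    """Return the documents that match every criterion (date range is inclusive)."""
    return [
        d for d in docs
        if d.custodian in custodians
        and d.file_type in file_types
        and start <= d.created <= end
    ]


# Illustrative data only.
docs = [
    Document("D1", "alice", "email", date(2009, 3, 1)),
    Document("D2", "bob", "spreadsheet", date(2010, 6, 15)),
    Document("D3", "alice", "email", date(2011, 1, 10)),
]

selected = segregate(
    docs,
    custodians={"alice"},
    file_types={"email"},
    start=date(2009, 1, 1),
    end=date(2010, 12, 31),
)
print([d.doc_id for d in selected])  # ['D1']
```

Only D1 is selected: D2 fails the custodian and file type criteria, and D3 falls outside the date range.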

If we go out another eight to twelve years, we're going to see significant developments in the ability of software and hardware to recognize concepts, intent, ideas, words, language and sentence structure in new ways that will allow us to further refine the categorization and classification of document populations.

Editor: You mentioned an impressive number of ways in which computers can reduce reliance on human beings, but can they audit their results and refine their processes?

Laykin: An additional benefit of managing a review with the assistance of predictive coding or computer-aided analysis is that you can replicate and/or refine your results through the modification of system settings. You can audit the process or the results for the purpose of demonstrating your adherence to agreed-upon protocols and you can re-review your document sets while still expending less than one would through a traditional linear review.

I think that the fundamental point here is that if you engage in a machine-driven or what is called a predictive coding review, you are leveraging software and hardware that can either do a first pass or a discrete pass at some portion of the documents you want reviewed. By doing so, one of the benefits is that it will free up your resources to focus on other issues, and you are able to recalibrate the software and hardware to re-run your reviews in different ways to achieve more meaningful and relevant results.

This permits you to run tests or samplings of your document population to understand what the impact would be if, for instance, you take out a particular keyword or you change a variable in a Boolean search. This is far less expensive than re-reviewing the entire document population with human eyes.
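As a rough sketch of such a test, the Python below compares the hit rate of two variants of a simple Boolean search. The `matches` and `hit_rate` helpers, the keywords and the sample texts are all hypothetical; real review platforms use far richer query languages and indexing.

```python
import random


def matches(text, any_of, none_of=()):
    """Simple Boolean search: (k1 OR k2 ...) AND NOT (x1 OR x2 ...)."""
    words = set(text.lower().split())
    return any(k in words for k in any_of) and not any(k in words for k in none_of)


def hit_rate(sample, any_of, none_of=()):
    """Fraction of the sampled documents that the search returns."""
    hits = sum(matches(text, any_of, none_of) for text in sample)
    return hits / len(sample)


# Illustrative population only.
population = [
    "merger agreement draft attached",
    "lunch on friday",
    "revised merger timeline",
    "contract renewal terms",
    "holiday party photos",
    "contract dispute with vendor",
]

# Draw a random sample rather than searching everything.
rng = random.Random(0)
sample = rng.sample(population, 4)

with_both = hit_rate(sample, {"merger", "contract"})
without_contract = hit_rate(sample, {"merger"})
print(f"both keywords: {with_both:.0%}, 'contract' removed: {without_contract:.0%}")
```

Running the two variants against the same sample shows how much the result set shrinks when a keyword is removed, before anyone commits to a full re-review.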

One of the risks is whether you can completely rely upon the results of a machine review. Even though IBM's Watson recently won at Jeopardy, you certainly can't depose a computer and ask it, "Well, Mr. IBM machine, what criteria did you use to analyze this document population?" Through the testing process and the sampling process, however, you can gain a comfort level, assuming that the software and hardware are in fact up to the job.
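One common way to build that comfort level is to have humans review a random sample and compare their calls with the machine's. The Python sketch below is an assumption about how such a check might be scored, not a method described in the interview: it computes precision and recall on the sample, plus a normal-approximation margin of error for a sampled proportion. The labels are invented for illustration.

```python
import math


def precision_recall(machine, human):
    """Score machine relevance calls against human calls on the same sample."""
    pairs = list(zip(machine, human))
    tp = sum(m and h for m, h in pairs)        # both say relevant
    fp = sum(m and not h for m, h in pairs)    # machine only
    fn = sum(h and not m for m, h in pairs)    # human only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin for a proportion p from n samples (z=1.96 is about 95%)."""
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative labels only: True means "relevant".
machine = [True, True, False, True, False, False, True, False]
human = [True, False, False, True, False, True, True, False]

precision, recall = precision_recall(machine, human)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
print(f"margin at n={len(human)}: +/-{margin_of_error(recall, len(human)):.2f}")
```

With only eight sampled documents the margin of error is very wide, which is why sample size matters as much as the headline numbers when defending a result.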

Editor: With the growth in the amount of information do you see a time when total reliance on machines will be necessary?

Laykin: The need for computer-assisted review will continue to grow as the universe of electronic documents continues to mushroom. At a point in the not-too-distant future when neural networks and more robust processing power are applied to ever more sophisticated software, it may simply become impractical for a substantial initial document review to take place under the gaze of human eyes. The volume of data that needs to be reviewed, particularly in a first-pass review or a relevancy review, can be overwhelming to the point that it will cost far more to review the documents than the lawsuit itself is worth.

Unfortunately, what has happened all too often is that the costs and complexity of e-discovery are actually forcing settlement, and as a result justice is not being served.

I believe that as technology continues to evolve in this space, you will more than likely see a trend towards machine-driven review for much of the first-pass document reviews and eventually even for substantive reviews. It is a factor of cost, it is a factor of time, and it is a factor of what is reasonable. It very well may be unreasonable to ask a party to review massive numbers of documents at the rate charged for attorney review in a matter that does not meet a certain threshold in terms of monetary damages. I don't think we are there yet, but it is clear that this is the direction we are moving.

Editor: You have sketched out a process in which the machine plays an increasingly important role. What is Duff & Phelps' role in this process now and into the future?

Laykin: Duff & Phelps is an independent global consulting firm that assists organizations in managing a wide variety of data challenges. These include building out the processes used by legal departments, such as matter management, time and billing and legal hold systems, as well as assisting with active litigation and disputes issues. However, we are also very active in defining the scope and need within the enterprise as a whole for all of the various components of the electronic discovery process, including those that relate to the preservation, processing and analysis of electronic data and its review and production. As data continues to proliferate throughout the enterprise in an ever increasing number of formats, it is important for organizations to think through their e-discovery exposure in new ways. This includes breaking down traditional communications barriers such as between the CISO and Corporate Counsel's offices.

Duff & Phelps engages with its clients in a technology-independent manner to define which technologies and services should be leveraged for use in a specific discrete matter or for the entire data ecosystem of a global organization. Different technologies do different things, and certain technologies are more appropriate for certain types of matters.

They have different cost structures. They have different bells and whistles. For example, on the data processing side of the equation, we may recommend and install a leading tool set such as "ZyLAB" for one client and "EnCase" for another. Both offer great solutions and each has a unique approach. For data capture, it may be most appropriate to use "Access Data" for a large number of email servers, but on the other hand, we may recommend or use a highly specific tool such as "Perpetually" for capturing Facebook pages.

We assist our clients in making these decisions in an informed manner, whether it is for a standalone litigation where there is an e-discovery component or for a corporate legal department looking to standardize on a set of processes, procedures and systems to use when faced with repeated e-discovery challenges. We also assist the same parties with the management of those tools so that our clients can ensure that they're getting what they're paying for. For example, we may assist in project management or oversight when using key industry tools such as "Iron Mountain" (formerly "Stratify"), "Fios" or "Xerox" (formerly "Amici") online data review tools. Among our team's mandates is to establish a deep working understanding of each of the industry's leading tools, their management and their cost structures so that our clients can ultimately make more informed decisions about their deployment.

Once you are in litigation, if you compare a machine-driven review of documents to a human-driven review, you will find errors in both types of reviews. Neither machine nor human is a perfect review platform. So, one of the roles that Duff & Phelps plays here is either as an advocate for one of the parties or as an independent neutral/special master to assist the enterprise, litigants or the court in structuring the correct balance between machine and human during each phase of the e-discovery lifecycle.

In many ways we are the certifying team that provides assurance and guidance to companies that are engaged in the process of selecting tools and systems that will reduce their exposure to the risks and costs of e-discovery. Companies are increasingly reliant on the processing power and capabilities of their software and hardware and the technology platform that they've integrated into their systems. This involves numerous tools and vendors. Duff & Phelps serves as the single point of contact to assist clients with the management and validation of the processes they are using throughout the whole lifecycle of the EDRM. Ultimately this provides peace of mind to management, a reduction of risks to the corporation and, importantly, a reduction of costs through the elimination of waste, overlap and poorly performing technologies or services.
