How To Manage The Costs Of Big Data In E-Discovery

Saturday, June 23, 2012 - 12:22

Big data plagues litigants by escalating the already expensive process of e-discovery, requiring an even bigger solution. In sifting through voluminous data to locate information responsive to discovery requests, businesses spend hundreds of thousands – and sometimes millions – of dollars to isolate the relevant electronically stored information (ESI). As these costs continue to skyrocket, e-discovery companies are responding by developing new tools and best practices to help corporations manage the amount of data involved in discovery. For example, technology-assisted review methods such as predictive tagging offer a critical answer for decreasing expenses while optimizing the accuracy of review.

Recently, the RAND Corporation Institute for Civil Justice (ICJ) completed a study entitled “Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery” in which it addressed “one of the most persistent challenges of conducting litigation in the era of digital information: the costs of complying with discovery requests, particularly the costs of review.” The ICJ found that the cost of review is roughly 73 cents of every dollar spent on ESI production, while the collection and processing phases represent about 8 cents and 19 cents, respectively. Litigants committed to reducing these e-discovery costs and to executing searches that return sufficient (but not over-inclusive) data should consider the following best practices to tailor e-discovery projects appropriately, to avoid discovery disputes and to realize savings in time, resources and cost.

Build an internal team and communicate clearly. A team of individuals that can successfully plan and proceed with a search for ESI typically includes an e-discovery project manager, in-house and outside counsel, a representative from the IT and/or records management departments, and a technician or consultant skilled in building keyword search strings. The project manager should lead the communication among team members and document search and culling strategies throughout the e-discovery project.

Cooperate with the opposing party. Cooperation is crucial; work with the opposing party to avoid claims that the data production is insufficient. For example, in EEOC v. McCormick & Schmick’s Seafood Restaurants, Inc., No. WMN-08-0984, 2012 U.S. Dist. LEXIS 13134 (D. Md. Feb. 3, 2012), an employment discrimination matter, the Equal Employment Opportunity Commission (EEOC) filed a motion to compel documents and interrogatory answers from the defendant after the parties disagreed about the proper scope of a request asking for e-mails relating to “applicants, complaints of racial discrimination, and server section assignments.” The defendants objected to the EEOC’s discovery request for e-mails, claiming it could involve “potentially hundreds of thousands of pages of e-mail correspondence.” The court noted that the “common practice governing the discovery of electronically stored information requires the use of search terms to make an extraordinarily burdensome request comply with the tenets of Fed. R. Civ. P. 26(b)(2)(c). If the producing party generates the search terms on its own, the inevitable result will be complaints that the search terms were inadequate.” Therefore, the court ordered the EEOC to meet and confer with the defendants to develop search terms to narrow the request for e-mail.

Likewise, in Degeer v. Gillis, No. 09 C 6974, 2010 U.S. Dist. LEXIS 129745 (N.D. Ill. Dec. 8, 2010), the court described “the importance of candid, meaningful discussion of ESI at the outset of the case, including discovery of ESI from non-parties.” In this dispute concerning an asset purchase agreement, the defendants served a subpoena for documents on a nonparty that previously employed the parties. The nonparty did not dispute the relevance of the requested documents but claimed the request was burdensome if more precise search terms could not be specified. Finding that the nonparty “was not entitled to unilaterally” make assumptions about the defendants’ requests, the court ordered that the parties meet and confer on terms, custodians and dates.

Identify the appropriate parameters. Avoid an avalanche of electronic data – and expenses – by limiting your search to applicable data sources and time frames. Determine whether your search can exclude certain data custodians, folders, directories, file types or inactive files. If you are seeking specific files, consider searching by file extension. If possible, limit your search to date ranges that are relevant to the case.
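As a rough illustration of how such parameters narrow a collection, the following Python sketch filters a small document list by custodian, file extension and date range. The Document record, the in_scope helper and all sample values are hypothetical; actual collection and processing tools apply these filters through their own interfaces.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePath


@dataclass(frozen=True)
class Document:
    # Hypothetical record; real tools expose far richer metadata.
    custodian: str
    path: str
    modified: date


def in_scope(doc, custodians, extensions, start, end):
    """Return True only when the document satisfies every search parameter."""
    suffix = PurePath(doc.path).suffix.lower()
    return (
        doc.custodian in custodians
        and suffix in extensions
        and start <= doc.modified <= end
    )


# Invented sample data for illustration only.
documents = [
    Document("alice", "mail/contract_draft.docx", date(2011, 3, 14)),
    Document("alice", "system/cache.tmp", date(2011, 3, 15)),   # wrong file type
    Document("bob", "mail/pricing.xlsx", date(2008, 1, 2)),     # outside date range
    Document("carol", "mail/notes.docx", date(2011, 6, 1)),     # custodian not in scope
]

scoped = [
    d for d in documents
    if in_scope(
        d,
        custodians={"alice", "bob"},
        extensions={".docx", ".xlsx"},
        start=date(2010, 1, 1),
        end=date(2011, 12, 31),
    )
]
```

In this sample, only one of the four documents remains for review after the three filters are applied.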

Focus on the proper keywords. Determining the right search terms will lead you more quickly to relevant and ultimately responsive electronic data. Work with your team to devise names, terms and phrases – including acronyms, abbreviations and similar words – that may be connected to the case. In cases where the team is not familiar with the contents of the data to be searched, it may be useful to review a subset of the data to learn which keywords to include in the search. Similarly, concept search and clustering tools can be applied to data sets to help a case team become familiar with the contents of a data set more quickly. Finally, to avoid a search result set that contains a large number of false positives, consider filtering data by e-mail sender domain to exclude known nonresponsive documents that could end up as false search term hits.
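The combination of keyword matching and sender-domain filtering might look something like the Python sketch below. The message layout, the search terms, the domain names and the responsive_candidates helper are all assumptions made for illustration; review platforms provide their own search syntax.

```python
import re


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longer phrases come first so they are preferred over shorter overlaps.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def sender_domain(address):
    """Return the lowercased domain portion of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def responsive_candidates(messages, terms, excluded_domains):
    """Return ids of messages that hit a term and are not from an excluded domain."""
    pattern = build_pattern(terms)
    hits = []
    for msg in messages:
        if sender_domain(msg["sender"]) in excluded_domains:
            continue  # known nonresponsive source, skipped to avoid a false hit
        if pattern.search(msg["body"]):
            hits.append(msg["id"])
    return hits


# Invented sample messages; "APA" stands in for a case-specific acronym.
messages = [
    {"id": 1, "sender": "manager@example.com",
     "body": "Revised APA terms attached."},
    {"id": 2, "sender": "deals@newsletter.example.net",
     "body": "Asset purchase agreement templates on sale!"},
    {"id": 3, "sender": "hr@example.com",
     "body": "Lunch menu for Friday."},
]

hits = responsive_candidates(
    messages,
    terms=["asset purchase agreement", "APA"],
    excluded_domains={"newsletter.example.net"},
)
```

The second message contains a search phrase but comes from an excluded newsletter domain, so it never reaches the review queue.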

Careful choice of both parameters and keywords can help parties find the right information as well as collaborate with opposing parties. For instance, in Cannata v. Wyndham Worldwide Corp., No. 2:10-cv-00068-PMP-VCF, 2012 U.S. Dist. LEXIS 20625 (D. Nev. Feb. 17, 2012), the court encouraged the parties to curtail the cost of e-discovery by imposing cost-shifting measures tailored to requests for additional custodians and search terms. The plaintiffs had asked the court to increase the scope of e-discovery from an initial court order that limited the parties to 20 custodians and 50 search terms. The plaintiffs wanted to deploy a list of 100 search terms across the e-mail accounts of 50 custodians. Permitting the parties to use an iterative process to refine search terms and the list of custodians and specifying the types of search terms that were appropriate, the court ultimately ordered the parties to reach a final list of 20 search terms and 20 e-mail accounts or data sites. The court also gave the plaintiffs a disincentive against further expanding the scope of discovery: the plaintiffs would not be required to reimburse the defendants for the costs of e-discovery so long as the final set of search terms and sites searched combined did not exceed 40, but for every extra search term or site, the court required the plaintiffs to reimburse the defendants for five percent of their e-discovery expenses.
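The arithmetic of the cost-shifting rule described in Cannata can be sketched in a few lines of Python. The function name, the sample expense figure and the cap at 100 percent are assumptions added for illustration; the article does not say how the court would treat a total above that level.

```python
def reimbursement_percent(search_terms, sites, threshold=40, percent_per_extra=5):
    """Percent of the defendants' e-discovery expenses shifted to the plaintiffs.

    Nothing is owed while terms plus sites stay at or under the threshold;
    each additional term or site adds five percent.
    """
    extra = max(0, search_terms + sites - threshold)
    # Assumption: the shifted share cannot exceed the full expense.
    return min(100, extra * percent_per_extra)


# Hypothetical expense figure, not taken from the case.
defendant_expenses = 200_000

within_limit = reimbursement_percent(search_terms=20, sites=20)
over_limit = reimbursement_percent(search_terms=25, sites=20)
amount_owed = defendant_expenses * over_limit / 100
```

Under these assumptions, five extra search terms would shift a quarter of the defendants' expenses to the plaintiffs.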

Use the best-fitting technology. For projects that involve voluminous amounts of ESI, technology is evolving to meet clients’ search needs: newer review technologies may be more cost effective than traditional search methods. For example, predictive tagging uses software that “learns” to identify the relevant ESI from a reviewing attorney, and it does so quickly. As the ICJ characterized it, predictive tagging allows computers to do the “heavy lifting” in document review, reducing the set of documents that must be reviewed by attorneys. Driven by the attorneys “most closely involved in the case,” predictive tagging “automatically assign[s] a rating (or proximity score) to each document to reflect how close it is to the concepts and terms found in examples of documents attorneys have already determined to be relevant, responsive, or privileged.” Early studies show that predictive tagging is at least as consistent as – and possibly more consistent than – review by humans, and that its cost savings can be substantial.
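To give a feel for what a proximity score could be, the Python sketch below rates unreviewed documents by word-count cosine similarity to a handful of attorney-tagged examples. This is a deliberately simple stand-in chosen for illustration, not the method of any particular predictive tagging product; the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphanumeric tokens in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def proximity_scores(seed_documents, unreviewed):
    """Score each unreviewed document against the combined seed vocabulary."""
    profile = Counter()
    for text in seed_documents:
        profile.update(term_counts(text))
    return {doc_id: cosine(term_counts(text), profile)
            for doc_id, text in unreviewed.items()}


# Invented examples that an attorney has already marked as relevant.
seeds = [
    "server section assignments for the downtown restaurant",
    "complaint about server section assignments",
]
unreviewed = {
    "doc-1": "updated section assignments for servers this week",
    "doc-2": "parking garage closed on monday",
}

scores = proximity_scores(seeds, unreviewed)
ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
```

Documents at the top of the ranking would go to attorneys first, while those with scores near zero could be sampled rather than reviewed in full.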

Judges are also beginning to recognize the benefits of this advanced review technology. In April, a much-anticipated opinion favored technology-assisted review, as U.S. District Judge Andrew Carter approved Magistrate Judge Andrew Peck’s opinion in Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Apr. 26, 2012). In Da Silva Moore, the parties had agreed to use the technology but disagreed over its application. In approving Judge Peck’s order, Judge Carter wrote, “There simply is no review tool that guarantees perfection.” He remarked that “even if all parties here were willing to entertain the notion of manually reviewing the documents, such review is prone to human error and marred with inconsistencies from the various attorneys’ determination of whether a document is responsive.” Because the “ESI protocol contains standards for measuring the reliability of the process and the protocol builds in levels of participation by plaintiffs,” the plaintiffs will have an opportunity to shape the process and thus ensure it meets their needs. If not, the judge left the door open for further disputes from the plaintiffs, calling their current set of objections “premature” and “speculative.”

Similarly, in Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012), on a defense motion requesting either an order of technology-assisted review or plaintiff-paid traditional review, the court approved the use of technology-assisted review. In this case, the parties could not agree on a review process for the large volume of documents in the defendants’ collection. The defendants argued that a single pass of manual review over their documents would cost $2 million and locate only 60 percent of the responsive documents; predictive tagging could locate up to 75 percent of potentially relevant documents “at a fraction of the cost and in a fraction of the time of linear review and keyword searching.” The plaintiffs protested that the technology was not as effective as human review. They also described the technology as “a radical departure from the standard practice of human review.” The judge approved the technology over the plaintiffs’ objection, while allowing the plaintiffs to question later “the completeness of the contents of the production or the ongoing use of predictive coding.”

By following these best practices and choosing the right e-discovery tools and technology, businesses can pare down the amount of ESI involved in discovery and therefore their review and processing costs as well.


Kelli Clark is the Vice President of Solutions and Services at Applied Discovery. In this role, Ms. Clark oversees the design of solutions and services for clients in a value-added, consultative manner and supervises delivery by the company’s industry-leading project managers, business analysts, and solution managers.

Ms. Clark is an industry veteran who has overseen hundreds of large-scale e-discovery projects and managed client-services, consulting, and technical teams. She has spoken on e-discovery topics at CLE courses and seminars throughout the United States and Europe, and she has been called upon to provide expert testimony regarding e-discovery procedures and best practices. Ms. Clark is also a guest lecturer in the E-Discovery Management Program at the University of Washington.

