Total Recall: Leveraging Strategic Legal Technology For Early Case Assessment And Cost Control In eDiscovery

Saturday, November 1, 2008 - 01:00
Fulbright & Jaworski, L.L.P.
Laurie Weiss


Weiss: Our program title is related to the concept of "Eidetic Recall," which is a term for extraordinarily vivid and detailed recall. We'd all like to have eidetic recall during early case assessment and document review. With the help of emerging technologies, methodologies and processes, we're headed in that direction.

Let's start with a couple of industry trends. As corporate counsel, you are under increasing pressure to reduce discovery costs wherever you can. At least in part as a result of that, we see a mandate for information governance beginning to grow. We think that's being led, to some extent, by legal departments that recognize that without information governance and information management they have no reasonable hope of reducing discovery costs, because corporate data volumes continue to increase at dramatic rates. Today, we'll focus on harnessing and leveraging technology-assisted search and retrieval for early case assessment and review strategies. These approaches are gaining traction and becoming more defensible.

Carpenter: Lawyers have found themselves becoming familiar with terms such as "concept searching" and "predictive tagging," terms we'll touch on today. So we thought it would be helpful to marry the various industry trends on the technology side with these trends on the business side, and then we're going to marry them as well with the case law. Finally, we'll show you a case study on how these things are being used.

The industry of information retrieval has been largely academic until the last decade or so. These days it is anything but. Another way that people refer to it is "sense-making." How do we make sense of data? Information retrieval concepts have emerged, like keyword search, Boolean search, concept search, linguistic techniques, and statistical search technologies. Something else that we're recognizing in this industry is that the "manual approach" to retrieval and review of documents is not only impractical with the volumes of data - it's highly inaccurate, and incredibly expensive. There's just too much data, and too little time to get through it all; we need to bring technology to bear.

The industry - and by that we mean general counsel directing outside counsel and their technology vendors, as well as, increasingly, the courts - is really pushing for less expensive and more accurate search and retrieval methodologies, practices and technologies. Everyone is pushing to reduce cost while reducing risk, which is a challenging but not impossible task.

What we are finding is that automated or technology-assisted approaches are becoming mandatory. As regulatory oversight and litigation increase, there's no other way to keep up with the data volumes and risks. Using keyword and Boolean searches alone, without more sophisticated techniques to supplement them, is increasingly problematic. The recent case law, including Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D.Md. May 29, 2008) (Grimm, J.) as well as the McAfee case (which we'll touch on later), reveals that those types of searches are no longer enough. The standard of what the courts consider reasonable is changing at a very rapid rate. Adding various techniques, including concept-search and auto-categorization-based techniques, to supplement the practice of law - taking the high-value input from lawyers and using that as a guidepost for this technology - is a far more powerful approach to reducing cost while reducing risk.

Weiss: Craig, what are you seeing in terms of the fusion of these various approaches?

Carpenter: We're seeing the opposite of what many people view as using technology to the exclusion of lawyers. Lawyers can deal with a situation and any risk much more effectively when they bring technology to bear. When lawyers are able to plug their high-value input into a system that can then extrapolate across huge volumes of data, we're finding that the lawyers are far more effective at doing their jobs and, ironically enough, both in-house and outside counsel become far more important.

From a textbook perspective, a good definition of concept search is "finding relevant documents beyond verbatim search strings, irrespective of key words." Concept search is increasingly important for a couple of different reasons. First, words can have different meanings. If you go by keywords alone, you will not be able to understand the nuance of language. Also, regional differences are important.

Another interesting technology that's gaining steam is "predictive tagging." This is essentially an automated analysis of relationships (or "aboutness") based on attorney judgment. An attorney finds a few relevant documents, plugs them into a piece of software, and gives the command: "Find more, or all, like this." Now that you've found your documents that have "aboutness," you can hit another button in various technologies that will perform "retraining." What that means is that it will take your highly substantive and highly informed opinion and extrapolate across an entire universe of documents, and so what you end up with is documents that essentially have not been reviewed by anyone coming to you precoded and preorganized based on topic and priority.
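The "find more like this" step can be illustrated with a minimal sketch. It assumes a simple bag-of-words cosine similarity as a stand-in; it is not how Axcelerate or any particular product works, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_more_like_this(seed_texts, corpus):
    """Rank unreviewed documents by similarity to attorney-selected seeds."""
    seed = Counter()
    for text in seed_texts:
        seed.update(vectorize(text))
    scored = [(cosine(seed, vectorize(text)), doc_id)
              for doc_id, text in corpus.items()]
    return sorted(scored, reverse=True)


# Hypothetical seed document and unreviewed corpus.
seeds = ["Please backdate the option grant to the earlier date"]
corpus = {
    "A": "The option grant date was changed to an earlier date",
    "B": "Lunch menu for cafeteria on Friday",
    "C": "Stock option grant approvals for executives",
}
ranked = find_more_like_this(seeds, corpus)
for score, doc_id in ranked:
    print(doc_id, round(score, 2))
```

In this toy version, "retraining" would amount to adding each newly confirmed relevant document to the seed list and ranking the remaining documents again.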

The next two terms, "precision" and "recall," seem a bit arcane. When we talk about "precision" we mean the percentage of documents returned by a search that are actually relevant. "Recall" is a complementary metric to precision. Recall measures what percentage of all the relevant documents in the universe were pulled in response to a query. In practical terms, low precision means many false positives, and low recall means many false negatives. Obviously false positives and negatives carry risk in their own right, and as is evident in the McAfee case, false negatives can keep general counsel up at night.
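As a small worked example of these two metrics, the sketch below computes both from a set of retrieved documents and a set of documents known to be relevant. The function name and document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: the search returned four documents,
# while five documents in the collection are actually relevant.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5", "doc6", "doc7"}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 0.4
```

Here two of the four returned documents are false positives (precision 0.5), and three of the five relevant documents were missed as false negatives (recall 0.4).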

As to the judicial perspective on automated search and retrieval, you may be asking yourself, what about the level of scrutiny? What are the judges going to say about this; how are we going to defend this process? We start with Federal Rule of Civil Procedure 1, which calls for "the just, speedy, and inexpensive determination of every action." The reality is that where we are today, dealing with such large data volumes, it's impossible to have a "just, speedy, and inexpensive determination of every action" without some technology to assist in the search and retrieval and document review phases of electronic discovery.

The earliest cases that talk about search terms and electronic searches arise in 2007. Perhaps the most interesting is Disability Rights Council of Greater Washington, et al. v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 (D.D.C. 2007) (Facciola, J.). Judge Facciola is a proactive and vocal judge who speaks frequently on electronic discovery issues. In that 2007 case, he proposed the use of concept searching as a supplement to keyword searches.

Fast forward to 2008: We're beginning to see many more cases that deal with electronic discovery in general. Specifically, we've had a trilogy of cases in 2008. Judge Facciola heard U.S. v. O'Keefe, 2008 WL 449729 (D.D.C. Feb. 18, 2008) and Equity Analytics v. Lundin, 248 F.R.D. 331, 333 (D.D.C. 2008). His opinions are frequently being cited to stand, at least in his jurisdiction or in his court, for the proposition that we may need expert help in defining what our methodology is for searching large universes of documents. And he holds that methodology to the level of scrutiny of Federal Rule of Evidence 702.

Also in 2008, Judge Grimm heard the Victor Stanley case. Unfortunately, in that case, the holding was that the defendants did waive privilege over 165 electronic documents inadvertently produced during discovery. There were a number of reasons cited by Judge Grimm, including flaws in the search and information retrieval methodology. Judge Grimm, in his opinion, argued that attorneys working on a case can become experts in applying appropriate methodology in the context of a thoughtful process. So in terms of judicial scrutiny, the level of required expertise rests with the legal team and the outside expertise they use. The key to defense of the methodology is a well-thought-out process, documentation of that process, effective sampling against the universe of documents to be sure you have not left behind relevant documents, and preparing an effective explanation of the rationale, the decision process and the steps taken.

Some of you may be aware of the McAfee case. Essentially the general counsel was indicted for stock-option backdating - the type of situation we've seen in the past with Brocade and others. A couple of well-known firms got in hot water for following the same current practices and processes deployed by the vast majority of firms. Several documents were miscoded by contract attorneys. Most firms use contract attorneys in one form or another, and that was the process followed by both of these firms. The procedures followed by the firms in that case were very typical, and the vast majority of courts would not have faulted them. However, the judge was upset because the documents that were miscoded as not being relevant were not turned over to the parties until 10 hours before the trial began. When you combine the McAfee case with Victor Stanley, what you see is that this industry can't sit still - keyword search and other types of processes and workflow technologies simply may not be good enough. And the implications of new Rule 502 must be taken into consideration.

Weiss: On Friday, September 19, President Bush signed into law new Federal Rule of Evidence 502. Rule 502 governs privilege waiver following an inadvertent disclosure of matters protected by the attorney-client or work-product privilege. It provides that disclosure of attorney-client or work-product information in a federal proceeding will not operate as a waiver, provided that the disclosure was inadvertent, the holder of the privilege took reasonable steps to prevent the disclosure and then promptly took reasonable steps to rectify the error. It also makes clawback agreements binding on nonsignatories. So if you have an inadvertent disclosure in a federal case, for example, the clawback agreement incorporated into the court order will be binding on nonsignatories. Therefore, the disclosure will not operate as a waiver in other federal cases or in state cases.

Rule 502 does not apply in the international environment and is not applicable to selective waiver in the context of a government investigation. But it does help; it should give us some comfort about a concern that keeps us awake at night. But note its emphasis on "reasonable steps." Because we rely more and more on technology to assist us in helping to identify the documents that are subject to privilege, "reasonableness" is, as we have seen in the cases mentioned by Craig, defined by the use of what a judge or jury deems to be state of the art technology.

Florinda will now describe some practical applications of the Axcelerate eDiscovery solution that Craig described.

Baldridge: About nine months ago, Fulbright evaluated and acquired Recommind's Axcelerate eDiscovery product. This technology has enabled us to develop a strategic early case assessment methodology. The strategy we're deploying includes leveraging the Axcelerate eDiscovery product by applying it to a sampling consisting of the documents of key custodians early on in the case. Attorneys can use the results of this early analysis of key custodians to make certain decisions regarding case strategy.

Specifically, this sampling of key custodians identifies key issues, people, documents, company nomenclature, and other strategic information to assess and evaluate a case strategy for moving forward. This sampling can inform significant case issues, collection scope, estimated data volumes, and enable development of estimated eDiscovery costs early on in the case. Finally, this early case assessment process can assist attorneys in effective preparation for meet and confer conferences.

Carpenter: A term that we've started to use quite a bit is "Early Risk Assessment." Today, that often can completely dictate what happens with the case. We're seeing more and more of our clients and their peers looking for ways to identify eDiscovery costs very early on.

Weiss: One of the most frustrating things in litigating a case is that, before you know what the documents say, you have to figure out what the case is about, and determine its value compared to the potential cost of discovery. Is this a case you should settle? Or, is this case worth the cost of discovery? Or, is this a case you might have to take to trial? To make these determinations, you need to develop an estimate of discovery costs and know the relative strengths and weaknesses of the case as early as you possibly can. This is now possible because, starting with a big universe of documents, you can, through sampling and early case assessment using new sophisticated technology, find key documents and learn what the case is about and in the process get a handle on discovery costs. Early on, a window of time may open up where there is an opportunity to settle. Why miss that opportunity because you don't have the tools to see the true parameters of the amount and value of data in the case and therefore cannot do a reliable case assessment?

Baldridge: The beauty of the approach that we apply is that you don't need an army of reviewers. You can conduct a very strategic and timely review with your case team and get through large amounts of data.

Carpenter: The idea of concept search is a sexy idea we've all heard about, but many people don't know how they would apply it. Keyword and phrase approaches to searching don't account for the nuances of language. Concept search allows you to normalize language - for instance, using company-specific nomenclature that attorneys often don't know until they review a sample of the documents.

The Axcelerate eDiscovery product automatically populates and makes visible key terms and phrases as well as key people of interest based on what's in a particular collection. Applying "clustering and foldering," it automatically puts documents into various buckets and can even create the buckets themselves in the form of various concept groups not married to key words but to the "aboutness" of a particular document or set of documents.

Baldridge: This sophisticated technology can even perform probability ranking so that attorneys can prioritize their review of the folders and clusters.

Carpenter: That's a very good point. It's certainly very helpful to have documents grouped according to topic or subject, but within those groups there are huge gradations of priority. A system that can organize documents from most important or probative down to the least makes the review far more efficient and more likely to find documents that may be the "smoking guns." Conversely, the less important documents in a subject can then go to far less expensive contract reviewers.
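The routing idea described here can be sketched in a few lines. This is only an illustration under stated assumptions: the relevance scores, the 0.5 threshold and the function name are hypothetical, and a real system would produce its own scores.

```python
def route_by_score(scored_docs, threshold=0.5):
    """Split documents into a priority queue and a lower-priority queue.

    scored_docs maps a document ID to an assumed relevance score in [0, 1].
    Both queues come back ordered from highest to lowest score.
    """
    ranked = sorted(scored_docs.items(), key=lambda item: item[1], reverse=True)
    priority = [doc_id for doc_id, score in ranked if score >= threshold]
    lower = [doc_id for doc_id, score in ranked if score < threshold]
    return priority, lower


# Hypothetical scores for four documents within one topic folder.
scores = {"memo-17": 0.92, "email-03": 0.15, "email-44": 0.71, "invoice-08": 0.30}
priority_queue, lower_queue = route_by_score(scores)
print(priority_queue)  # ['memo-17', 'email-44']
print(lower_queue)     # ['invoice-08', 'email-03']
```

The priority queue would go to the core case team first, and the lower queue to less expensive reviewers.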

Axcelerate eDiscovery also allows attorneys to shape their legal strategies and judgments early in the case and hone them as they go, as they come across additional documentary evidence. Technology moves analysis and informed decision-making up to the very front.

Baldridge: Think about what happened recently in McAfee and the dire consequences of miscoding certain responsive documents as nonresponsive. This highlights an increased need for quality control. One really important aspect of this technology is that it improves quality control, not only of documents to be produced, but also of those designated as non-responsive.

Weiss: We are going to conclude with an actual case study in which we implemented the Axcelerate eDiscovery solution that Florinda has been describing. We developed this case study because we thought it demonstrated some of the power of the technology-assisted review. Remember that not every case is going to have these same metrics.

Our case study involved a medium-sized commercial case, so there were fairly large volumes of documents on both sides. Therefore, cost was an extremely important consideration. We had way too much data, and we knew it. The first thing we did was negotiate for prefiltering during a meet-and-confer, and we reached an agreement on some relevant search terms. Were we wrong about those search terms? Yes, probably. We conducted key word and Boolean searches, so we did leave some things behind in the early prefiltering process. We were able to substantially reduce the universe to about 100,000 documents to be dropped into a review platform. Even so, that was too many documents to review.

We then moved to develop a review strategy using Axcelerate. The first step was to develop relevant search terms. We then asked Axcelerate to cluster and folder the 100,000 documents by their statistical probability of relevance based on those terms. This was followed by a 15-day review, by three full-time reviewers, of the documents in the folders with the highest probability of relevance. As the reviewers worked their way through that universe, we retrained every day. This brought us back folders even more likely to contain relevant documents. Through concept clustering, retraining and predictive tagging, we identified 30 percent of the 100,000 documents as those most likely to be relevant. Of that 30 percent, 56 percent were determined to be relevant. Where you have used some kind of search term filters at the beginning, typical relevance rates are about 10-15 percent, sometimes less.

Because the attorneys actually had to put their eyes on fewer documents, we ended up using fewer attorney hours (300 as compared with 2000) than would otherwise have been required. The final 27 percent relevance yield attained was much greater than usual. We attribute this to the better quality of the review as a result of using leading-edge technology.
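For readers who want the document counts and hours behind those percentages, the short calculation below uses only the figures quoted in the case study; the variable names are ours.

```python
# Figures quoted in the case study.
total_documents = 100_000
prioritized_share = 0.30           # share identified as most likely relevant
relevant_share = 0.56              # share of the prioritized set found relevant
hours_with_technology = 300
hours_without_technology = 2000    # estimate of what would otherwise be needed

prioritized_documents = round(total_documents * prioritized_share)
relevant_documents = round(prioritized_documents * relevant_share)
hours_saved = hours_without_technology - hours_with_technology
hours_reduction = hours_saved / hours_without_technology

print(prioritized_documents)       # 30000
print(relevant_documents)          # 16800
print(hours_saved)                 # 1700
print(f"{hours_reduction:.0%}")    # 85%
```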

Not only did the technology employed make it possible to do a more thorough review job in a shorter time and at a lower cost, but we had the time and budget dollars to do random sampling of documents coded as nonresponsive and documents left behind. We were able to sample across the date range, across the custodians, across the emails - samples of every type of document - and we were able to reach a very high confidence level that we would not find any other relevant documents. Judge Grimm in the Victor Stanley case criticized the failure to use a sampling process of the documents left behind to test for privilege. One way we've been able to leverage Axcelerate eDiscovery is to use the concepts of foldering and clustering in the sampling process to reach a much higher level of confidence and a more defensible process. Had there been that kind of process in the Victor Stanley case, they might have avoided the inadvertent waiver.
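The sampling step can be illustrated with a rough sketch. It assumes a simple random sample that is small relative to the set of documents left behind, and it uses a standard binomial bound for the case where the sample turns up no relevant documents. The function names, sample size and document IDs are hypothetical and do not come from the case study.

```python
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Draw a simple random sample (without replacement) of document IDs."""
    rng = random.Random(seed)  # fixed seed so the draw can be documented and repeated
    doc_ids = list(doc_ids)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))


def upper_bound_if_none_found(sample_size, confidence=0.95):
    """Upper bound on the share of relevant documents left behind,
    given that a random sample of this size contained none.

    Solves (1 - rate) ** sample_size = 1 - confidence for rate.
    """
    return 1 - (1 - confidence) ** (1 / sample_size)


# Hypothetical: 70,000 documents coded nonresponsive, 300 of them sampled.
left_behind = [f"doc-{i}" for i in range(70_000)]
sample = draw_sample(left_behind, 300)
bound = upper_bound_if_none_found(len(sample))
print(len(sample))        # 300
print(f"{bound:.2%}")     # 0.99%
```

Read this way, a clean sample of 300 supports a statement such as "with 95 percent confidence, fewer than about 1 percent of the documents left behind are relevant." Sampling separately by custodian, date range and document type, as described above, would repeat the same draw within each group.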

Our case study is a good illustration of the power of leveraging technology. It's through this leveraging that we can be more strategic and faster, and we can improve the accuracy and consistency of the relevance and privilege reviews, knowing we are working toward meeting corporate counsel objectives of risk management and cost control.

Baldridge: In cases where the opponent's document production is quite voluminous, Axcelerate eDiscovery can be used to sift through a data dump and get to the most highly relevant documents early.
