Big Data Law And Hybrid Analytics In The Second Machine Age

Friday, January 24, 2014 - 16:39

In reading Erik Brynjolfsson and Andrew McAfee’s new book, The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, I couldn’t help but think about how the legal industry and e-discovery are already being positively affected by the Second Machine Age. There is no question that we are in the midst of a technological hurricane that will permanently reshape the litigation landscape and redefine e-discovery support and legal services in general.

The three forces the authors identify as driving the Second Machine Age are already here: the digitization of nearly everything, exponential growth in hardware and software power, and the ability to use these new powers in a combinatorial way to create “a vast and unprecedented boost to mental power.”

These “New Age” tools, such as our hybrid analytics solutions, offer “exponential” productivity gains. In other words, each new process evolution returns measurable cost savings and significant project timeline reductions. The New Age workflows are designed to increasingly tame “big data” and are combinatorial, so they can be stitched together in a customized fabric or mosaic to solve heretofore-unsolvable problems that occur throughout the life cycle of a matter.

Hybrid analytics consists of various tools for concept searching, clustering, categorization, association, email threading and summarization. Think of it as an ever-expanding toolkit of algorithms and workflows designed to solve everyday litigation problems. We now deploy these multiple algorithms in a combinatorial fashion to achieve productivity boosts from the absolute minimum of human effort and cost. We are embracing the Second Machine Age.

Since the dawn of e-discovery, companies have been playing catch-up, reacting to a new world of ever-increasing ESI volumes, runaway e-discovery costs and general uncertainty in terms of case law and best practices. Yet, the pendulum is swinging the other way, and proactive measures taken in terms of efficiency, cost control and approach are yielding benefits to companies. Moreover, greater clarity from the judiciary on the rules of engagement has resulted in new strategic possibilities for in-house and outside counsel.

Today’s hybrid analytics tools are harbingers of things to come. In the legal industry, they offer the potential for unprecedented productivity gains and are redefining what’s possible for practitioners. Innovation enables productivity growth. Most innovation occurs when we recombine well-understood techniques. Hybrid analytics gives us the tools to combine and recombine ideas in new and different ways. Each development becomes a building block for future innovations.

Surprisingly, innovations are not spawned in isolation. They usually benefit from the involvement of numerous people with diverse backgrounds and perspectives. This is the single most important thing to do when innovating: involve people whose expertise lies far from the problem at hand.

For example, it may be a good investment of time to catalog on a practice or department level the particular pain points of your specific practice and concerns of your clients. If there are bottlenecks that could be cleared, what are they? Doing this with the assistance of inside or outside experts may uncover a number of opportunities to innovate.

We are now in a period of rapid innovation, being driven by industry practitioners. You can see it in the reduction of paralegals and legal assistants across the industry. The sizes of document collections are getting larger, while the sizes of the review teams are getting smaller. The tools are better. The techniques are more effective. The senior members of the litigation team are now getting more involved in the earliest phases of the case in areas such as e-discovery and the use of hybrid analytics. This is driving innovation and enhancing productivity. Those organizations that formalize and catalog those processes will be the leaders of tomorrow.

Along with benefits, technological progress will bring economic disruption, leaving some people behind and workers without jobs. However, professionals in the legal industry can quickly and easily improve their skills to maintain healthy wage and job prospects.

Through hybrid analytics, more cognitive tasks are being automated in areas where machines can make better decisions than humans, without emotion or fatigue. You can either learn how to use these new capabilities to achieve great things for your practice and your clients or eventually be replaced by them. In this sense, hybrid analytics is the best way to enhance your practice and redefine relationships with your clients.

Recharging The Value Proposition

For a consultant, there has never been a more exciting time to practice e-discovery. On most engagements using hybrid analytics, we are in fact achieving the expected exponential productivity gains. We can see where our advice is delivering better results by enhancing our clients’ capabilities and generating tangible and significant savings over past practices.

Just as importantly, we are increasingly being exposed to our clients’ senior counsel from the outset of the engagement, as they in turn are exposed to the latest technology, which has been designed to benefit from their input and guidance. In the past, with the adoption of repeatable best practices, many legal organizations developed a cookie-cutter approach to e-discovery that insulated senior counsel from many of the messy details.

Though senior counsel may have participated in the design of firm-wide best practices, the productivity tools were designed for lower-level professionals: document reviewers, their supervisors, paralegals and litigation support professionals. The entire team now benefits from the process enhancements enabled by hybrid analytics, and the quality of the productions has never been better.

The new hybrid analytics technology requires the input of senior counsel at the early stages of a matter, but the return on investment has never been so great! Their involvement early on can literally save millions of dollars in document review costs. The tools will also provide senior counsel with new insights about the case and evidence at various stages of the matter, whether it is gauging the completeness of a collection or a production, or guiding witness preparation or privilege review.

Likewise, For The Legal Practitioner

With hybrid analytics, you can redefine your practice and make it more valuable to your clients. You are now able to leverage your know-how and your subject matter expertise in new and exciting ways. Recharge your long-standing relationships by offering better solutions. Excellence and value need not be mutually exclusive, and now is the time to redefine your value proposition! Master these new technologies.

FACE The FACTS: New Tools For A New Time (Only Great Is Good Enough!)

What can hybrid analytics do? At Evidence Exchange, we use nine distinct hybrid analytics techniques in conjunction with 15 workflows to help our clients FACE the FACTS.

FACE the FACTS is a checklist of features for Early Case Assessment (ECA) and project implementation using hybrid analytics. 

Phase 1 – FACE (Find out what’s going on.)

Find holes in productions received (e.g., Brainspace, Content Analyst)

Associate people, places and things without human intervention (e.g., Palantir, NextLP)

Cluster and visually display concepts without human intervention (e.g., Brainspace)

Early case assessment of the factual landscape and potential risks (e.g., Equivio Themes, Brainspace)

Phase 2 – FACTS (Go forward with full-scale project implementation.)

Foreign language identification (e.g., Content Analyst, Equivio Analyze)

Assign priorities and define document review workflow to use (e.g., Equivio Zoom, Brainspace, Content Analyst)

Categorization tools for predictive coding (e.g., Equivio Relevance, Relativity Analytics)

Tracking e-mail threads for review productivity, consistency and precision (e.g., Equivio Zoom)

Summarize and graphically depict complex legal theories using ESI produced in matter (e.g., Palantir, NextLP)

(Note: Next to each problem solved by hybrid analytics are the names of sample products used by Evidence Exchange in its hybrid analytics workflow.)

These hybrid analytics tools help clients and trial counsel better understand the factual landscape the litigation will expose. The former needs such information to make an informed assessment of the potential risks (and rewards) of continued litigation. The latter requires it to develop strategies to optimally deal with the facts (good and bad) that will ultimately be presented to the judge or jury for decision in the case.

Going For Exponential Returns – Are Hybrid Analytics Mere Incremental Advancements Or Truly Worthy Of Second Machine Age Status?

Taken alone, the technology behind hybrid analytics is impressive but arguably not worthy of Second Machine Age status. After all, many of the underlying algorithms upon which these technologies are built have been around since the ’70s. They speed things up considerably and offer new insights, but by themselves, do they offer exponential returns? I don’t think so.

However, when coupled with the judiciary’s emerging views on proportionality and its acceptance of hybrid analytics, the combinatorial effect is worthy of Second Machine Age status. This trend will offer exponential cost-savings and timeline reductions for those in-house and outside counsel who deploy hybrid analytics on their document-intensive litigations. There is no reason not to explore how to take advantage of this recent development to reverse the trend of rising costs of e-discovery without introducing unnecessary risk.

Just two short years ago, the defensibility question surrounding hybrid analytics was unresolved. Today, the question seems to be settled, with various courts issuing opinions addressing or indicating approval of hybrid analytics, and none rejecting it. Recent decisions are providing comforting evidence that hybrid analytics technology is gaining widespread judicial acceptance as a credible tool available to counsel on appropriate cases.

The notion of proportionality in the context of predictive coding drives this point home. Hybrid analytics technology can tell you that to find 100 percent of the responsive documents, you must look at 80 percent of the document collection (or another figure produced by its algorithms, which is case specific and customizable). Keywords and date filters, for example, can drive down the number of documents to be reviewed, but even then the document review effort can still be enormous and expensive. Applying proportionality in conjunction with hybrid analytics technology, we may instead determine that 80 percent of the responsive documents sit within 20 percent of the document collection. Some courts are ruling that 80 percent satisfies the requirement of substantial compliance, particularly if finding the remaining 20 percent would increase the producing party’s review costs four- or five-fold.

Courts are even starting to form the view that, in the above example, if more than 80 percent is required, perhaps the receiving party should share some of the costs.
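
The arithmetic behind this proportionality argument can be sketched in a few lines of Python. The collection size and per-document review cost below are hypothetical placeholders chosen only for illustration, not figures from any matter; only the 20 percent and 80 percent splits come from the example above.

```python
# Hypothetical inputs, assumed purely for illustration.
COLLECTION_SIZE = 1_000_000   # documents in the collection (assumed)
COST_PER_DOCUMENT = 1.00      # review cost per document, in dollars (assumed)


def review_cost(fraction_reviewed: float) -> float:
    """Cost of reviewing the given fraction of the collection."""
    return COLLECTION_SIZE * fraction_reviewed * COST_PER_DOCUMENT


# From the example in the text: reviewing 20 percent of the collection
# finds 80 percent of the responsive documents, while finding all of them
# means reviewing 80 percent of the collection.
cost_at_80_recall = review_cost(0.20)
cost_at_100_recall = review_cost(0.80)

# How much more the producing party pays to chase the remaining documents.
cost_multiplier = cost_at_100_recall / cost_at_80_recall

print(f"Cost to find 80% of responsive documents:  ${cost_at_80_recall:,.0f}")
print(f"Cost to find 100% of responsive documents: ${cost_at_100_recall:,.0f}")
print(f"Cost multiplier: {cost_multiplier:.1f}x")
```

Under these assumptions, pursuing the last 20 percent of responsive documents quadruples the review cost, which is the low end of the four- to five-fold increase described above. The multiplier depends only on the two fractions, so it holds whatever the actual collection size or per-document cost.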

Obviously, not every situation will benefit from this equation and not every court will see the proportionality issues the same way, but since 50 percent of litigation costs have historically been in document review, driving down what is “good enough” may have a profound impact on the overall legal costs, particularly when hybrid analytics enables your team to use fewer people to get to “good enough.”

Other judicial protections concerning the inadvertent production of privileged documents and their subsequent clawback offer in-house and outside counsel additional considerations when weighing the pros and cons of hybrid analytics. Taken together with proportionality, for the right situation, hybrid analytics may offer in-house and outside counsel exponential savings over conventional methods.

Recently, we completed several hybrid analytics projects for in-house and outside counsel, where all agreed we saved millions of dollars in document review costs through a new workflow. We used a combination of competing hybrid analytics tools to forge a best-of-breed workflow. Granted, you must have the right team, the right talent, the right tools and most importantly, the right case to be successful.

Hybrid Analytics Case Study – Exponential Returns With Predictive Coding

In one case, there were 10 million documents that needed to be processed. After we pulled out all of the duplicates and the data that could be easily filtered out through date ranges and select custodians, we still had about four million documents that needed to be reviewed. We were able to use a predictive coding technology called Relevance, created by Equivio, and that tool, with its very tight workflow, allowed us to leverage the know-how of case experts (very senior members of the litigation team) who had a handle on what the key critical issues were.

The team reviewed only 8,000 documents (i.e., after assessment and no fewer than 40 training iterations, essentially the phase in which the software “learns” what the objectives are), and the system yielded decisions on 3.8 million documents. Of those, 150,000 were likely responsive, and 250,000 were Excel spreadsheets. Excel spreadsheets and flat image files (unless OCR’d) still need to be reviewed linearly, as they do not work well with hybrid analytics text extraction tools. On balance, the client reviewed 400,000 documents to yield 250,000 produced documents. That represents only one-tenth of the effort previously required without hybrid analytics.
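
For readers who want to check the numbers, the figures in this case study can be worked through as follows. This is a minimal sketch that uses only the counts reported above; the variable names are ours, and the “about four million” figure is treated as exact for the purpose of the calculation.

```python
# Figures reported in the case study above.
after_culling = 4_000_000     # documents left after de-duplication and filtering (approximate)
likely_responsive = 150_000   # documents flagged as likely responsive
spreadsheets = 250_000        # Excel spreadsheets requiring linear review
produced = 250_000            # documents ultimately produced

# Documents the client actually reviewed: 150,000 + 250,000, which matches
# the 400,000 figure reported in the text.
reviewed = likely_responsive + spreadsheets

# Share of the culled collection that needed human review.
review_fraction = reviewed / after_culling

# Share of reviewed documents that ended up in the production.
production_yield = produced / reviewed

print(f"Documents reviewed:           {reviewed:,}")
print(f"Share of collection reviewed: {review_fraction:.0%}")
print(f"Share of reviewed produced:   {production_yield:.1%}")
```

Reviewing 400,000 of roughly four million documents is the one-tenth figure cited above.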

Another Case Study – Using Predictive Coding On Less Responsive Collections

On another matter with a substantially “less rich” collection, using the same technology, our clients reviewed 4,600 documents and the system yielded decisions on 1.6 million documents. In addition, 350,000 documents were Excel spreadsheets and needed to be reviewed manually. On balance, the client reviewed under 500,000 documents to produce 177,000 documents. That represents one-quarter of the effort previously required without hybrid analytics.

Another Case Study – Using Hybrid Analytics To Associate People, Places And Things

Recently, we participated in an antitrust matter where hybrid analytics was used to point out certain types of relationships that existed in the data. The key takeaway here is that no human intervention was required to achieve these results. The ramifications of this are huge, because in the past, budgets just did not permit the kind of analyses that these tools now allow you to do for very little cost. It was impossible to get thousands and thousands of people to perform these kinds of analyses that can now be performed in an afternoon on inexpensive hardware and inexpensive software.

Final Case Study – Using Hybrid Analytics To Find Holes In A Production

Not every case is perfect for predictive coding or associative analytics, but many other tools are available today in our hybrid analytics world. These tools rely on little or no human intervention. For a tiny investment of time and money, you can start to benefit greatly from hybrid analytics if you can provide the tools with the extracted text of the document collections or productions.

Take visual categorization, also known as concept visualization – where you can actually see the corpus of documents spread across an interactive dynamic word wheel. You are able, in real time, to delve into individual concepts that are linked to documents that can be filtered using date ranges, names of people and the like. Using these hybrid analytics tools, you have the ability to find missing content by using the tools to define what a perfect production should look like, then train the tools to “grade” the productions by identifying “holes” in the productions so you can quickly go back to the court and get the other side to meet their obligation.
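
The idea of “grading” a production against what a complete one should look like can be illustrated with a deliberately simple sketch. It is not how the commercial concept-analysis tools work; it swaps in a much cruder check, assuming a complete production would contain at least one document for every expected custodian in every expected month. The function name, custodian names and dates are all hypothetical.

```python
from collections import Counter
from datetime import date


def find_production_gaps(documents, custodians, months):
    """Return the (custodian, month) pairs with no documents in the production.

    documents  -- iterable of (custodian, date) tuples from the received production
    custodians -- custodians expected to appear in the production
    months     -- (year, month) tuples the production is expected to cover
    """
    counts = Counter(
        (custodian, (sent.year, sent.month)) for custodian, sent in documents
    )
    return [
        (custodian, month)
        for custodian in custodians
        for month in months
        if counts[(custodian, month)] == 0
    ]


# Hypothetical production metadata, invented for illustration.
production = [
    ("Smith", date(2013, 1, 14)),
    ("Smith", date(2013, 2, 3)),
    ("Smith", date(2013, 3, 22)),
    ("Jones", date(2013, 1, 9)),
    ("Jones", date(2013, 3, 30)),
]
expected_months = [(2013, 1), (2013, 2), (2013, 3)]

gaps = find_production_gaps(production, ["Smith", "Jones"], expected_months)
print(gaps)
```

In this invented sample, the only gap reported is custodian Jones in February 2013. A real workflow would grade productions on concepts as well as custodians and dates, but the principle is the same: define the expected coverage first, then flag whatever is absent.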

Incremental Progress Or Exponential Results?

Some of our clients involved in intellectual property disputes certainly think we have achieved exponential results here! The big challenge is determining whether a production received is complete or not. In the past, this would take weeks of manual analysis, and the longer it takes to identify the holes, the less of an impact the discovery will have on the court. So here, being able to accomplish this analysis within one day of receiving a production may change the dynamics of e-discovery and the outcome of the matter. These factors are what make our hybrid analytics productivity gain exponential in nature and worthy of Second Machine Age status.


Many have predicted New Ages prematurely, only to misjudge the pace of change or the impact of a new technology. I submit that hybrid analytics is different. It is letting us routinely do things that were once beyond our comprehension or well beyond our time and budget constraints.

Increasingly, hybrid analytics does this with a minimum of human intervention. Hybrid analytics will make us smarter and our work will be more complete and less expensive. This trend, when combined with the recent case law involving proportionality and predictive coding, is offering newfound capabilities, cost savings and an unexpected productivity boost to fee-weary corporate executives.

Hybrid analytics technology is becoming a game changer, not only helping in the early stages of the matter, but throughout the life cycle of a matter. With the proper project management and control, these tools can and will positively impact budgets and cost control. In-house and outside counsel should spend significant time considering both whether the use of predictive coding technology is appropriate and necessary and, if so, the mechanics of its implementation.

Hybrid analytics is the new toolkit for the modern-day practitioner, and it is ever-growing. Although it is based on fairly old technology, hardware has finally caught up with the software, and the core algorithms will continue to see significant improvements. Every year new tools are emerging and new workflows are being defined, some from our partners, though many are our own internal intellectual property and customizations. We expect that over the next decade, the 10 or 15 algorithms that are on the market today will turn into hundreds of algorithms, each being stitched together in a combinatorial way to create new productivity tools with tangible boosts to our “collective” mental power.

The tools are here. Productivity gains are not only possible, they are now expected. Innovation will continue to drive new productivity growth, and, as stated, most innovation occurs when we recombine well-understood techniques. Our hybrid analytics solution suite now gives us the tools to combine and recombine ideas in new and different ways. Each development we see today becomes a building block for tomorrow’s innovations.

Like all productivity tools, hybrid analytics will require an investment of human capital and an incubation period to refine practices and workflows, but, much like a North Dakota oil strike, it represents a new source of energy for legal practitioners. Learn how to harness it, because rest assured, your opponents will.

Michael Prounis is Chief Executive Officer and Co-Founder of Evidence Exchange, a New York City-based electronic data discovery solutions provider.
