In the era of Big Data, litigation and investigations involving the collection and review of a terabyte of data or more are no longer uncommon. Technology has made it too simple to create and retain virtually limitless amounts of electronically stored information (ESI). When faced with hundreds of thousands, or even millions, of documents in a collection, eyes-on review of every document is challenging in light of pressing deadlines; it is time-consuming and – most importantly for today’s cost-conscious companies – expensive.
The whitepaper continues with a discussion of manual document review and provides compelling statistics in support of the argument that, even where traditional manual document review makes financial sense, it remains a suboptimal choice for certain types of matters, because human review is prone to errors and inconsistencies.
Fortunately, technology is available to help resolve the problem that technology itself created. As a remedy to concerns about review quality and costs, technology-assisted review – sometimes called “predictive coding” – has emerged as a quicker, more accurate, more consistent and more cost-effective alternative to linear review methods.
Technology-assisted review makes use of statistical modeling based on machine learning techniques to prioritize and rank documents by how likely they are to be responsive, without manually reviewing each and every document in the collection. Technology-assisted review considers decisions that senior reviewers make on a subset, or sample, of data and extrapolates those decisions from the sample to the larger data set. The process is typically iterative in that it applies statistical sampling and quality control techniques to refine and improve its decision-making as it progresses. The relevance assessments that the system makes are probability scores, from 0 to 100 percent, that indicate how likely a document is to be responsive. This methodology has been applied in other industries as well, from business intelligence to patent searching and online ad optimization.
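The scoring step described above can be illustrated with a minimal sketch: a naive Bayes text classifier – one of many possible models; the whitepaper does not disclose any vendor’s actual internals – trained on a hand-coded seed set and emitting a 0–100 probability score for each unreviewed document. All names and the tokenization scheme here are illustrative assumptions.

```python
import math
from collections import Counter

def train(seed_set):
    """Tally word counts per class from (text, is_responsive) pairs.

    The seed set must contain at least one responsive and one
    non-responsive example, as described in the seed-set step.
    """
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in seed_set:
        docs[label] += 1
        counts[label].update(text.lower().split())
    return counts, docs

def responsiveness_score(text, counts, docs):
    """Return P(responsive | text) as a 0-100 score, via naive Bayes
    with add-one smoothing over the seed-set vocabulary."""
    vocab = set(counts[True]) | set(counts[False])
    total = sum(docs.values())
    # Prior log-odds from class frequencies in the seed set.
    log_odds = math.log(docs[True] / total) - math.log(docs[False] / total)
    for word in text.lower().split():
        if word not in vocab:
            continue  # words unseen in training carry no evidence
        p_t = (counts[True][word] + 1) / (sum(counts[True].values()) + len(vocab))
        p_f = (counts[False][word] + 1) / (sum(counts[False].values()) + len(vocab))
        log_odds += math.log(p_t) - math.log(p_f)
    return 100 / (1 + math.exp(-log_odds))
```

In practice it is the ranked scores across the whole collection, not any single document’s score, that drive prioritization: sorting the population by score surfaces the likely-responsive documents first.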
One example of this approach is found in Xerox’s CategoriX technology, an automated document classification tool that ranks documents for prioritized document review, enhanced quality control processes and expedited first-pass responsiveness review. It uses a text-based modeling technique to predict relevance based on samples of human-assessed documents.
The CategoriX process involves two iterative phases: training and categorization. First, Xerox works with the key stakeholders in the review to determine a seed set, which always includes a sample randomly drawn from the full review population, and may be further supplemented by keyword, Boolean and concept searching. Senior attorneys then review this set for responsiveness, and their assessments on each document are used to train CategoriX. From there, the software makes relevance predictions for the documents in the rest of the collection based on the manual assessments made by attorneys in the seed set.
Next, the review team manually reviews a set of documents for quality control (QC) to ensure the software has properly categorized them, which provides an opportunity to continue calibrating the system by recoding and retraining the software as necessary. As part of the QC process, CategoriX detects similar documents that have conflicting manual coding and routes them back to the attorneys to confirm or alter their assessments on those documents.
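The conflict-detection step above can be sketched as follows. Jaccard similarity over token sets stands in for whatever similarity measure the actual software uses (an assumption on my part), and the 0.6 threshold is purely illustrative; the idea is simply that highly similar documents coded differently by humans get routed back for reconciliation.

```python
def token_set(text):
    """Reduce a document to its set of lowercase tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def conflicting_pairs(coded_docs, threshold=0.6):
    """Flag index pairs of highly similar documents whose human
    coding disagrees, so reviewers can confirm or alter the calls.

    coded_docs: list of (text, is_responsive) pairs.
    """
    flagged = []
    for i in range(len(coded_docs)):
        for j in range(i + 1, len(coded_docs)):
            (text_i, label_i), (text_j, label_j) = coded_docs[i], coded_docs[j]
            if label_i != label_j and \
                    jaccard(token_set(text_i), token_set(text_j)) >= threshold:
                flagged.append((i, j))
    return flagged
```

The pairwise loop is quadratic; a production system would use near-duplicate detection (for example, shingling or locality-sensitive hashing) to scale, but the reconciliation logic is the same.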
Ultimately, the system provides a set of documents organized by priority. Documents with a higher priority can be assigned for review first, allowing initial productions to be made more quickly and potentially key documents to surface earlier in the process. Documents in the lower tiers – those classified as not likely to be responsive – can be sent to less expensive review teams or contract attorneys, or in many cases can defensibly be set aside altogether.
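The tiering described above reduces, in its simplest form, to sorting by probability score and bucketing on thresholds. The cutoffs and tier names below are hypothetical – in a real matter they would be chosen with counsel and validated by sampling.

```python
def prioritize(scored_docs):
    """Order (doc_id, score) pairs so the likeliest-responsive
    documents are reviewed first."""
    return sorted(scored_docs, key=lambda d: d[1], reverse=True)

def tier(score):
    """Map a 0-100 probability score to a review tier.

    Thresholds are illustrative assumptions, not prescribed values.
    """
    if score >= 75:
        return "senior attorney review"
    if score >= 40:
        return "contract attorney review"
    return "set aside pending QC sampling"
```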
Though remarkably powerful, technology-assisted review does have some drawbacks in litigation, generally pertaining to its complexity; the perception that it is a “black box” solution with push-button technology; its newness; and the consequent concerns over defensibility. The whitepaper discusses two key issues to be addressed – transparency and precedent – and includes examples from recent case law that reflect the increasing judicial acceptance of technology-assisted review.
Parties can take a number of steps to improve the defensibility of their technology-assisted review processes; the whitepaper discusses six such steps.
Technology-assisted review can be used in a variety of ways to meet the unique requirements of specific matters:
Early Data Assessment. Technology-assisted review can be used to analyze data collections early on in a matter to shape budget projections and case timelines, contribute to defensible data reduction strategies and predict downstream review efforts.
Culling. Prior to review, technology-assisted review can be used for effective culling of likely non-responsive materials to ensure the smallest but richest possible quantity of documents is reviewed.
Review Prioritization. Technology-assisted review can segment documents into batches by how likely they are to be responsive for more efficient workflow. For example, those with the highest probability scores can be assigned to senior attorneys, while less experienced reviewers can review documents with lower probability scores.
Enhanced Quality Control. Once a first-level review has been completed, reviewers can compare assessments made by humans and those made by the machine to easily identify discrepancies.
Responsiveness Review. Technology-assisted review also can be used for comprehensive first-pass responsiveness review, allowing legal teams to review a smaller subset of the total document collection manually.
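The enhanced quality control use above – comparing human calls against the machine’s predictions – can be sketched in a few lines. The 50-point cutoff for treating a score as a “responsive” prediction is an illustrative assumption; any disagreement is flagged for a second look rather than treated as an error by either side.

```python
def coding_discrepancies(human_calls, machine_scores, cutoff=50.0):
    """Return document IDs where the human responsiveness call and
    the model's prediction (probability score, 0-100) disagree.

    human_calls: dict of doc_id -> bool (reviewer's coding)
    machine_scores: dict of doc_id -> float (model's score)
    """
    flagged = []
    for doc_id, human_responsive in human_calls.items():
        predicted_responsive = machine_scores[doc_id] >= cutoff
        if human_responsive != predicted_responsive:
            flagged.append(doc_id)
    return flagged
```

Flagged documents are exactly the ones worth a senior reviewer’s time: either the human coding or the model’s training needs correcting, and resolving the disagreement improves both.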
When properly leveraged, technology-assisted review offers parties the ability to reduce the total costs of e-discovery and enhance flexibility for a streamlined workflow. Beyond those benefits, it helps to maximize the contributions of the attorneys most knowledgeable in the case and provide the review team with early insights into the data set. As the technology becomes better understood among review managers, it can even provide opportunities to improve defensibility relative to manual reviews. So long as parties continue to use this technology in conjunction with effective and defensible practices, technology-assisted review may soon enough become the new “gold standard” of modern document review.
Stuart LaRosa is Senior Search Consultant at Xerox Litigation Services, where he advises corporate and law firm clients on implementing strategic search, advanced analytics and technology-assisted review strategies to match their specific document review needs. Mr. LaRosa is a linguist by training and has over eight years’ experience applying advanced search technologies and strategies to help clients address e-discovery challenges.