XLS Leverages Man And Machine For Defensible And Accurate E-Discovery

Monday, August 1, 2011 - 01:00

The Editor interviews Amanda Jones, Senior Classification Analyst at Xerox Litigation Services' New York office.

Editor: Tell us what's new with Xerox Litigation Services.

Jones: We've experienced tremendous growth over the past few years; for example, our data volumes doubled last year over the previous year, and we're on track to double volumes again this year. To meet our corporate and law firm clients' needs, we've continued to invest heavily in R&D to introduce new technologies and services, and we have significantly expanded our client services team to more effectively address clients' requirements. We also recently opened up full-service e-discovery facilities in Europe to address clients' cross-border challenges, as well as the needs of European-based law firms and corporations.

Editor: Amanda, I understand that you are a linguist by training. How did you get involved with Xerox Litigation Services?

Jones: About six years ago I got involved in the e-discovery industry using linguistic techniques to help solve complex litigation-related information retrieval problems. A colleague introduced me to XLS, and I saw an opportunity to do something very exciting. XLS is addressing clients' e-discovery challenges - especially document review - in novel ways that create efficiencies and cost savings not seen elsewhere in the market today. As senior classification analyst at XLS, my role is to oversee XLS's automated document classification technology, CategoriX, and our search consulting services. I assist our clients in designing and validating defensible e-discovery strategies to maximize the efficacy of their document review projects.

Editor: How does XLS offer unique value to its clients?

Jones: XLS operates in a very competitive market, but we've differentiated ourselves with our technological leadership and innovation, which have been key drivers in helping clients achieve significant cost savings, efficiencies and defensibility throughout the e-discovery process.

XLS has a strong track record in e-discovery. Since 2001, our review platform, OmniX, along with our advanced processing, has helped clients move through the review process as efficiently and cost-effectively as possible. We employ more than 40 developers who spend their waking hours developing new client-driven tools at no cost, for the benefit of all of our clients. For example, last year alone we introduced over 100 new client-driven features, such as e-mail analytics, litigation hold, automatic data detection, and search analytics, to name a few, that have helped clients increase productivity and drive down cost. Our developers also work on the system back-end to ensure 99.5 percent-plus uptime, speed and limitless scalability, and state-of-the-art security.

We've been able to marry our culture of innovation with that of Xerox, which has a long legacy of investment - on the order of hundreds of millions of dollars each year - in developing new technologies, many of which have direct applicability to the field of e-discovery. For example, research scientists at Xerox PARC and Grenoble developed CategoriX and our foreign language identification capabilities.

So our clients are ahead of the curve when it comes to e-discovery, and it's because of the innovation we're able to leverage.

Editor: So what is CategoriX?

Jones: CategoriX is XLS's proprietary automated document classification technology. Researched and developed at Xerox Research Centre Europe, CategoriX is our approach to computer-assisted document review. It is designed to significantly improve attorneys' abilities to search and filter information in large document collections. CategoriX leverages machine learning techniques and statistical modeling to rank documents automatically according to how likely they are to be relevant.

The CategoriX review process consists of two phases: a training phase and a categorization phase. During the training phase, models are built based on samples of documents that have been coded by the attorneys or subject matter experts most knowledgeable about the case. Then, during the categorization phase, these newly developed models scan the unreviewed population to assign each document a probability score indicating how likely the document is to be relevant to the topic at hand.
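The two phases described above can be illustrated with a toy example. CategoriX is proprietary and its actual models are not described in this interview, so the sketch below swaps in a plain naive Bayes text classifier as a stand-in; the function names (`train`, `relevance_probability`, `rank`) and the sample documents are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(coded_docs):
    """Training phase: build word counts from attorney-coded samples.

    coded_docs is a list of (text, is_relevant) pairs.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in coded_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def relevance_probability(model, text):
    """Categorization phase: estimate P(relevant | text) with naive Bayes."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    log_scores = {}
    for label in (True, False):
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Smoothed log prior, so a class with no examples never yields log(0).
        log_p = math.log((model["doc_counts"][label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in model["vocab"]:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing for the word likelihood.
                log_p += math.log((counts[word] + 1) / (total_words + vocab_size))
        log_scores[label] = log_p
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp_scores[True] / (exp_scores[True] + exp_scores[False])


def rank(model, unreviewed):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda t: relevance_probability(model, t), reverse=True)


if __name__ == "__main__":
    # Hypothetical coded sample, invented for this illustration.
    coded = [
        ("merger pricing agreement draft", True),
        ("pricing agreement with supplier", True),
        ("lunch menu friday", False),
        ("office party friday lunch", False),
    ]
    model = train(coded)
    for doc in rank(model, ["friday lunch", "pricing agreement update"]):
        print(f"{relevance_probability(model, doc):.2f}  {doc}")
```

A production system would use far richer features and models; the point here is only the shape of the workflow, in which coded samples go in and a probability score comes out for every unreviewed document.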

By engaging the case team early in the process, reviewers can access important information sooner and spend significantly less time and money reviewing non-relevant information down the road.

Editor: How does your approach to automated document classification differ from other approaches, such as predictive coding?

Jones: CategoriX actually belongs in the same class of automated document classification technologies as the predictive coding software offered by Recommind and other e-discovery vendors, but we have a different process and approach than companies that just sell software.

Our process engages attorneys knowledgeable about the case to "train" the software by reviewing samples from the document collections - as do others - and we also utilize an iterative process to "learn" from knowledge gained over the course of this training phase.

The advantage XLS offers, though, is the technical expertise and hands-on guidance that will ensure that the process and results stand up in court. This entails utilizing technical search and statistics experts to drive the technology and design sound validation protocols.

Even our largest, most technically savvy clients have few technical experts resident in house to drive an automated document classification system, and they shouldn't have to - it's not their core business. By bringing technical expertise to the table, including linguistic and statistical expertise, we free up attorneys and expert reviewers to focus on case strategy.

Here's where specific statistics and linguistics expertise comes into play and why it matters to quality and defensibility. I'll use an example to illustrate. Let's say there are one million documents in the collection. XLS will analyze the document collection characteristics and then give the expert reviewers a statistically valid sample to manually review for responsiveness, say around 10,000 documents.
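To give a sense of where a figure like "around 10,000" can come from, here is a minimal sketch using the standard sample-size formula for estimating a proportion, with a finite population correction. The interview does not state which confidence level, margin of error or sampling method XLS uses, so the 95 percent confidence level, the 1 percent margin and the function names `sample_size` and `draw_sample` are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.01, proportion=0.5):
    """Sample size for estimating a proportion (Cochran's formula).

    proportion=0.5 is the most conservative choice when the true rate of
    responsive documents is unknown. All defaults are assumed values.
    """
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Finite population correction: a smaller collection needs a smaller sample.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_sample(doc_ids, n, seed=0):
    """Draw a simple random sample of n document IDs without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    return rng.sample(doc_ids, n)


if __name__ == "__main__":
    collection_size = 1_000_000
    n = sample_size(collection_size)
    sample = draw_sample(range(collection_size), n)
    print(f"Review {n} of {collection_size} documents; first IDs: {sample[:5]}")
```

Under these assumed settings the calculation lands somewhat under 10,000 documents for a one-million-document collection, which is in line with the figure quoted above. Loosening the margin of error shrinks the sample quickly, which is why the choice of parameters belongs with someone who understands the statistics.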

Based on the results of the sample review, CategoriX "learns" how to score each document based on how likely it is to be relevant. In an adaptive and iterative process, our experts feed additional statistically valid samples to the reviewers, and CategoriX progressively improves the accuracy and consistency of its relevance scoring based on the new information. We monitor CategoriX progress, look for any trends in errors, and identify ways of bolstering CategoriX performance using more robust statistical or linguistic models. We'll continue this process until optimal results are achieved. Then, the algorithm is applied to the entire document collection.

Throughout the process, we create an audit trail with standardized reporting and documentation of all key inputs, decisions and results, and we measure the quality of results using statistically valid measurement protocols at every stage of the process, for ongoing quality control and final quality assurance.

The CategoriX process is designed not only to maximize the quality, efficiency and cost-effectiveness of the client's review, but also to incorporate the best practices called for by the courts.

Editor: You mentioned CategoriX results in higher quality than manual review. What does this mean in terms of cost to the client?

Jones: Our experience is that when clients think about quality, they think in terms of "how much more will it cost me?" In reality, automated document classification technologies like CategoriX, when applied with the right set of expertise, processes and measurement, can significantly cut the costs of document review.

Here's one example of the cost savings one of our clients, a Global 50 company, achieved. The company was subpoenaed by a regulatory agency, which required it to review two million documents in under three months. This would have involved hiring a sizeable team of attorneys to review the documents, at significant cost. They decided to utilize CategoriX automated review as a cost-effective alternative. As a result, the legal team completed the manual review and quality control in just 54 days with a small team of just 10 reviewers and saved 67 percent in costs they would otherwise have incurred. In addition to significant cost savings, CategoriX achieved very high quality - 92.3 percent precision and 99.8 percent recall, meaning that virtually all and only relevant documents were produced for the matter.
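For readers unfamiliar with the two quality measures quoted above: precision is the share of produced documents that are actually relevant, and recall is the share of all relevant documents that were produced. The sketch below shows the standard definitions; the function name `precision_recall` and the counts in the usage example are hypothetical and are not taken from the matter described.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from document counts.

    true_positives:  relevant documents that were produced
    false_positives: non-relevant documents that were produced
    false_negatives: relevant documents that were missed
    """
    produced = true_positives + false_positives
    relevant = true_positives + false_negatives
    # Guard against division by zero when nothing was produced or nothing is relevant.
    precision = true_positives / produced if produced else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, invented for this illustration.
    precision, recall = precision_recall(
        true_positives=90, false_positives=10, false_negatives=30
    )
    print(f"precision={precision:.1%} recall={recall:.1%}")
```

In practice these counts are usually estimated from a reviewed validation sample rather than from the full collection, so reported figures carry a margin of error.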

Editor: For which types of clients and cases does automated document classification provide the greatest benefits?

Jones: There isn't a single type of client or matter for CategoriX. Our clients use CategoriX to meet a wide range of goals, from sophisticated review prioritization and quality control enhancement to first-pass responsiveness review and defensible document reduction.

In all cases where it is utilized, CategoriX allows legal teams to zoom in on their documents of interest and deprioritize or set aside altogether irrelevant data.

Not every situation is ideal for CategoriX, though. For very small matters or in circumstances where attorneys need to find only a handful of documents to answer specific research questions, CategoriX may not be the most efficient tool for them. That is why we offer other advanced search and analysis tools and services that enable clients to dig into their review population quickly to identify the important documents. Based on the scope of our clients' matters, their timelines and their budgets, we can advise them on the tools that will enable the most expedient and cost-effective approach to meet their goals.

Editor: Do clients use CategoriX on their premises, in the cloud or both? Why?

Jones: Our clients use CategoriX as a service. We've found they don't want to be in the software maintenance or hosting business. As such, we host the software in our facilities, which means our clients don't have to invest in any IT infrastructure, training or the people to support it. Cost of entry is low risk, and our clients are up and running quickly.

To ensure there are no risks associated with data transfer, we make the samples and the fully CategoriX-assessed document collection available on our hosted review platform, OmniX, for attorney review. We also ensure that documents are presented and routed in a way that fits within our clients' workflows.

Editor: Will e-discovery technology ever be able to replace human involvement in the review of large numbers of documents for relevance and privilege?

Jones: It's not man or machine. It's both. No matter how robust the technology, we believe that humans will always have to be involved to some degree in the review of large volumes of documents for relevance and privilege to ensure a quality and defensible review.

CategoriX enables senior attorneys to maximize their contributions and knowledge of the case early on - but avoid the time-consuming and costly review of non-relevant documents downstream. For example, once CategoriX has ranked the full document collection, senior attorneys will still need to review the documents most likely to be important to the case. That will never go away. However, they can deprioritize documents least likely to be relevant by sending them offshore, to contract attorneys, or setting them aside altogether. Somewhere in the middle tier is where associates may come in.
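The tiered routing described above could be sketched as follows, assuming each document already carries a relevance score between 0 and 1. The thresholds (0.8 and 0.2), the tier names and the function `assign_tiers` are hypothetical; the interview does not say how cut-offs are chosen, and in practice they would be set and validated case by case.

```python
def assign_tiers(scored_docs, high=0.8, low=0.2):
    """Route documents into review tiers by relevance score.

    scored_docs is a list of (doc_id, score) pairs with scores in [0, 1].
    The high and low thresholds are illustrative assumptions only.
    """
    tiers = {"senior_review": [], "associate_review": [], "deprioritized": []}
    # Process the highest-scoring documents first so each tier stays ranked.
    for doc_id, score in sorted(scored_docs, key=lambda pair: pair[1], reverse=True):
        if score >= high:
            tiers["senior_review"].append(doc_id)
        elif score >= low:
            tiers["associate_review"].append(doc_id)
        else:
            tiers["deprioritized"].append(doc_id)
    return tiers


if __name__ == "__main__":
    # Hypothetical scores, invented for this illustration.
    scores = [("doc-a", 0.95), ("doc-b", 0.50), ("doc-c", 0.05), ("doc-d", 0.85)]
    for tier, docs in assign_tiers(scores).items():
        print(tier, docs)
```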

Similarly, technical expertise will be required to design and execute sampling and statistical and linguistic models, which will improve algorithms, measurement and workflow, ultimately ensuring defensibility and optimal results.

Editor: Do attorney review teams need to learn and adopt new approaches to accommodate automated document classification technologies?

Jones: No. We have worked hard to integrate CategoriX processes as seamlessly as possible into routine review workflows. So, the manual review effort required to train CategoriX involves the same protocols and platform used in a traditional document review. The main difference is that there is ultimately less manual review required throughout the process, which results in the significant time and cost savings for our clients.

Editor: So what's next for XLS?

Jones: We know there is a lot more we can do to create greater efficiencies and cost savings for our clients, and we're continuing to invest in areas where we can bring value to the e-discovery process. In the near future, clients can expect a lot more automation, including tools that can increase productivity by managing day-to-day e-discovery tasks, such as loading and producing data. We'll also continue to deliver new, practical functionality that our clients can really put to use, such as our legal hold module. There is a lot going on right now, and we'll continue to emphasize efficiencies, scalability and cost savings in all phases of the e-discovery process.

Please email the interviewee at amanda.jones@xls.xerox.com with questions about this interview.