At ACM/MEDES 2013, UBIC Data Scientist Presents Article on When to Safely End the Document Review Process

Tuesday, December 17, 2013 - 11:02

UBIC, Inc., a leading provider of Asian-language eDiscovery solutions and services, has announced that UBIC data scientist Jakob Halskov demonstrated the company's Quality Monitor and Endpoint Detector software to a forum of computer engineers and data scientists at the Association for Computing Machinery's (ACM) prestigious Management of Emergent Digital EcoSystems (MEDES) Conference in Neumunster, Luxembourg. The MEDES Conference was held in late October of this year. UBIC's Quality Monitor and Endpoint Detector software indicates when a legal document review process can be safely concluded at an acceptable level of quality.

The annual MEDES Conference, established in 2009 by the ACM in association with the IEEE, formerly the Institute of Electrical and Electronics Engineers, is a recognized, industry-leading forum for the world's top scientists and engineers involved in computer technology development and application from academia, research laboratories and private enterprise.

Determining Useful Endpoints In Machine-Based Discovery

At the MEDES Conference, UBIC's Jakob Halskov presented his research article, co-authored by UBIC's Hideki Takeda, titled "When to Stop Reviewing Documents in eDiscovery Cases: The Lit i View(TM) Quality Monitor and Endpoint Detector." Mr. Halskov's article and demonstration of UBIC's software relate to the determination of a reliable endpoint in the review of documents by machine-based data analysis software, commonly referred to as "predictive coding" software. Predictive coding is an industry term for a machine-based, statistically driven search for specific data within what is typically a very large volume of otherwise unqualified data. The article by Mr. Halskov and Mr. Takeda is available in the ACM Digital Library found at http://dl.acm.org/.

In UBIC's industry, this machine-based search is most often associated with legal discovery, which is the search for, and compilation of, evidence prior to litigation as directed by a court of law. UBIC's predictive coding software, called CJK-TAR(TM) for Chinese-Japanese-Korean Technology-Assisted Review, is used to search for and identify documents containing specific information - texts, tweets, emails, presentations, etc. - either as a stand-alone review process or as a supplement to human review. UBIC specializes in the handling of data and documents in certain Asian languages as well as English.

In the legal industry, the relatively recent introduction of predictive coding technology to the discovery process in litigation has raised questions about how to reliably determine when the automated review process (as well as any human review of documents) has reached a point where it can be said to have successfully identified all the items sought from within a particular volume of data. Understanding this point and knowing when it has been achieved (and when it has not) is the subject of the article and UBIC's demonstration at MEDES.

Maintain Quality, Reduce Cost

In legal discovery, the review process is typically a high-cost undertaking, particularly when human reviewers are involved. The purpose of the research and the demonstration by UBIC's data scientists is to show that the cost and quality of a review process for evidence can be definitively monitored in real time. Mr. Halskov's article provides guidance suggesting when additional effort will produce fewer and fewer results, thus indicating a point where a review process can be ended.
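
The idea of diminishing returns can be illustrated with a minimal sketch. It is not UBIC's method and is not taken from the article: the function name, the batch counts, the window size and the 2 percent threshold are all hypothetical, chosen only to show how a falling yield of relevant documents could signal a candidate stopping point.

```python
from typing import Optional, Sequence


def find_stopping_batch(
    relevant_per_batch: Sequence[int],
    batch_size: int,
    window: int = 3,        # hypothetical: number of recent batches to average
    min_yield: float = 0.02,  # hypothetical: minimum acceptable share of relevant documents
) -> Optional[int]:
    """Return the 1-based number of the first batch at which the average
    yield over the last `window` batches falls below `min_yield`.

    Returns None if the yield never drops below the threshold.
    This is an illustrative heuristic, not the Quality Monitor and
    Endpoint Detector algorithm.
    """
    if batch_size <= 0 or window <= 0:
        raise ValueError("batch_size and window must be positive")

    for end in range(window, len(relevant_per_batch) + 1):
        recent = relevant_per_batch[end - window:end]
        # Share of reviewed documents in the recent batches that were relevant.
        yield_rate = sum(recent) / (window * batch_size)
        if yield_rate < min_yield:
            return end
    return None


# Invented example data: relevant documents found in each batch of 100 reviewed.
counts = [40, 31, 22, 12, 6, 3, 1, 1, 0, 0]
stop = find_stopping_batch(counts, batch_size=100)
print(stop)  # 8
```

With these invented numbers, the last three batches reviewed by batch 8 contain only 5 relevant documents out of 300, which is below the assumed 2 percent threshold. A production system would also need a statistical estimate of how many relevant documents remain unreviewed, which this sketch does not attempt.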

UBIC's Quality Monitor and Endpoint Detector software reliably monitors quality and determines the stage at which review, either machine-driven or human, can be stopped with a high degree of confidence that additional relevant data will not be discovered with further review. This capability is key to maintaining a high level of quality while also containing the high costs of review.

Machine-driven data analysis technology such as UBIC's is also applied within a wide array of search and identification routines undertaken outside the legal industry. In fact, a variety of enterprise- and research-based applications make use of this class of analytical software, including scientific data analysis, historical and real-time compliance monitoring, financial analysis and automated trading, as well as consumer market research and sales.