E-Discovery: Avoid Over-Preservation And Reduce Risk Of An "Unforced Error"

Tuesday, October 4, 2011 - 01:00

The Editor interviews Howard Sklar, Senior Counsel, Recommind Inc.

Editor: How does the enforcement environment play into preservation obligations?

Sklar: The preservation obligation is a fact-based determination, but it's very difficult to predict how a court will come out in any situation. The simple definition is that the obligation to preserve documents arises when litigation is reasonably anticipated by the corporation. The rub is that in any large corporation, identifying the time at which the corporation came to that conclusion is challenging, because the conclusion would have to be reached by a person whose knowledge could be imputed to the corporation.

A common reaction is that since you may never know when that obligation is going to attach, you should save everything. The simple fact is that companies don't have to preserve everything, but there is something risk-averse in our nature that causes us to over-preserve.

That attitude was not a problem back in the day when a legal matter required only a couple of cartons. In our petabyte world unnecessarily preserving that much data creates two major problems. One is cost. IT will tell you that running the servers required to cope with the accumulation of data is expensive. Problem number two is that there is no automatic connection between preservation and use. If the tapes are in a warehouse somewhere, they are not readily accessible. Moreover, they represent pure, unadulterated, unfiltered, unalloyed risk that you will have to access them at great cost.

Corporations don't need to do this. It's an "unforced error" (I like tennis analogies) because they don't need to keep this stuff. Outside of regulatory environments, there is no obligation to keep this stuff unless litigation is reasonably anticipated.

Companies need to have a good data retention policy and a method of suspending that policy as to relevant documents. "Data retention policy" is one of the most common euphemisms that we will ever hear, because when you say data retention you're talking about data destruction. You don't need to have all of this pure risk sitting out there when courts are perfectly happy to say, "Go ahead and destroy it, it's not a problem." But it is a problem when you do it ad hoc, and it's a problem when you do it after litigation is reasonably anticipated.
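The combination described here, a retention schedule plus a way to suspend it for relevant documents, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` fields, the `RETENTION_DAYS` table and the function name are hypothetical and do not come from the interview or from any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    doc_id: str
    category: str
    created: date
    # Hypothetical field: matters (lawsuits, investigations) this document relates to.
    matters: frozenset = frozenset()


# Hypothetical retention periods by category, in days.
RETENTION_DAYS = {"email": 365, "contract": 7 * 365}


def eligible_for_destruction(doc, today, held_matters):
    """Return True only if the retention period has run and no hold applies."""
    if doc.matters & held_matters:
        return False  # the schedule is suspended for documents under a hold
    period = RETENTION_DAYS.get(doc.category)
    if period is None:
        return False  # unknown category: keep until someone classifies it
    return today - doc.created > timedelta(days=period)
```

The hold check comes first on purpose: an expired retention period never overrides a hold, which mirrors the point that routine destruction becomes a problem once litigation is reasonably anticipated.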

Editor: Don't some of the drastic spoliation remedies frighten companies into an overly cautious approach?

Sklar: The biggest risk is a "case-killer" spoliation sanction that shifts the burden of proof or otherwise turns a winning case into a loser. The courts are all over the place as to what spoliation sanctions are appropriate. I understand the company's fear, and it's a rational one. However, I am not aware of any decision where a company had significant sanctions imposed for the destruction of a document pursuant to a rational and effective document retention policy, which provided for timely destruction of documents, except for those relevant to reasonably anticipated litigation.

The rule against spoliation is an obligation not to destroy evidence. It's a common law obligation that you owe to the court. So in situations where that obligation has not attached, you haven't done anything to warrant sanctions. The first question you have to ask is, "Is litigation reasonably anticipated on the subject matter of the document that is being destroyed?" Problems arise from two sources. One is where companies don't know what it is that they're destroying. The other arises where a company lacks a document retention policy or fails to apply it.

A document retention policy has to be implemented just like every other policy. Companies need to show that employees were trained to comply with it, that they kept records of the training and informed employees about their responsibilities through periodic communications. They must be prepared to show they implemented their policy at the people, process and technology level.

If they can show all of this, you can't say that a company is immunized from spoliation claims, but it's going to be an uphill battle for an opponent to claim bad-faith spoliation, which is the kind of spoliation that leads to a case-determinative sanction. Bad-faith, case-determinative spoliation findings simply don't occur in the face of good-faith reliance on a properly implemented and operationalized document retention policy.

Editor: The importance of monitoring employee communications has come to the fore given the FCPA and the enactment of the UK Bribery Act. Does that kind of activity militate against the position that you can, after a certain number of years, dispose of documents since earlier communications that may not have seemed relevant at the time can become relevant in the context of later communications?

Sklar: On the contrary, monitoring, if properly conducted, can assure that those earlier communications are preserved. Monitoring, and using advanced search technology as a tool for monitoring, has two benefits. The key benefit is that it allows a company to better understand its own risk. The other is that it enables a company to establish its good faith and get the benefits of timely self-disclosure by notifying the Department of Justice or the UK enforcers promptly of a possible violation. Using advanced search techniques in monitoring is a very good story to tell.

To your question, such search technology can help a company avoid destroying possibly relevant documents in the course of implementing its document retention policies. The company can state that it has a process in place where before data gets destroyed it is sampled to make sure it is not relevant to an open matter or subject to a litigation hold. It has a great story to tell the enforcers or a court about having an overarching data management strategy and engaging in continuing efforts to enforce it.
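A minimal sketch of that "sample before you destroy" check follows. It assumes a caller-supplied `is_relevant` test (for example, a search against open matters and holds); the function names and the returned dictionary are hypothetical, and a real process would also log the result for later proof of good faith.

```python
import random


def sample_before_destruction(batch, is_relevant, sample_size, seed=0):
    """Return the sampled documents that look relevant; an empty list means a clean sample."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced later
    k = min(sample_size, len(batch))
    sample = rng.sample(batch, k)
    return [doc for doc in sample if is_relevant(doc)]


def destroy_if_clean(batch, is_relevant, sample_size, seed=0):
    """Approve destruction only when the sample turns up nothing relevant."""
    flagged = sample_before_destruction(batch, is_relevant, sample_size, seed)
    if flagged:
        # Stop and escalate: the batch may contain material subject to a hold.
        return {"destroyed": False, "flagged": flagged}
    return {"destroyed": True, "flagged": []}
```

A clean sample is evidence, not proof, that the batch is safe; the sample size is a risk decision for the company to make.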

Editor: Recommind has content categorization software. What advantages does it provide?

Sklar: The key to proper preservation is understanding what your data is. Otherwise, you have a very difficult time determining when to destroy it under your document retention policy. You need to know if it's relevant to active litigation or litigation that can be reasonably anticipated. That is where content categorization really comes in. Content categorization allows you to categorize data at the time of creation. There should be a lifecycle to data the same way there is a lifecycle to human cells. There is a concept of planned cell death whereby a cell dies within a certain period.

Document retention policies rely generally on human categorization, which is unreliable. People are inconsistent in their categorization choices. With Recommind's content categorization system a document is categorized at creation so that you can apply your document retention policy to that document, subject of course to changing that categorization if circumstances change. Policy can then be implemented against that backdrop, and that becomes a great story to tell a court.

Editor: Does Recommind offer the ability to identify data that for policy reasons may need to be kept for longer periods (such as that relating to the safety of pharmaceuticals or aircraft components)?

Sklar: That is the kind of risk analysis that every company has to do for itself. The key thing is that you have to understand what you're keeping and why you're keeping it. The beautiful thing about the courts is they rarely will second-guess a business call as long as it's made in good faith.

Editor: How does your patented process for identifying relevant documents play into this?

Sklar: There are two patents. The first is a patent called probabilistic latent semantic analysis (PLSA), which is the name for the algorithm that allows the software to get to know a document on a very deep level. The second patent is on the predictive coding process, which is technology-assisted review whereby a seed set of documents previously determined to be relevant tells the software to go review documents and bring back more that embody the concepts in the seed documents no matter what the specific language is. The seed documents tell it to look for documents that contain the concept of bribery even though they do not include specific words like "bribe" or "grease."
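The seed-set workflow can be illustrated with a deliberately simple stand-in. The sketch below ranks documents by bag-of-words cosine similarity to the seed set; it is not PLSA and not Recommind's patented process, and unlike a concept-based method it can only match documents that share actual words with the seeds. All function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seed(seed_docs, corpus):
    """Rank corpus documents by similarity to the combined seed-set profile."""
    profile = Counter()
    for seed in seed_docs:
        profile.update(vectorize(seed))
    scored = [(cosine(profile, vectorize(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The highest-scoring documents would go to human reviewers first. A concept-based model replaces the word-overlap score with one that can connect documents sharing no vocabulary with the seeds.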

You can feed the software a set of documents. It chews on those documents and brings you back relevant documents from the petabyte or more of documents in the corpus. Keyword searching, by contrast, provides 20-50 percent recall at best. I'm not saying to abandon keyword searching, but it is sugar cereal, which isn't a healthy breakfast long term. Document identification based on concept searching produced by PLSA helps ensure that you are preserving as great a percentage of relevant documents as you possibly can. That is how you avoid spoliation.
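For readers less familiar with the terms: recall is the share of all relevant documents that a search actually finds, and precision is the share of the retrieved documents that are relevant. A short calculation makes the 20 percent figure concrete; the document IDs below are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Ten documents are truly relevant; a keyword search returns four documents,
# of which only two are relevant.
relevant_ids = [f"d{i}" for i in range(10)]
keyword_hits = ["d0", "d1", "x0", "x1"]
print(precision_recall(keyword_hits, relevant_ids))  # (0.5, 0.2)
```

A recall of 0.2 means eight of the ten relevant documents were never identified, and so could be destroyed under a routine retention schedule.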

It's an entirely different thing from story A, which is going into court and saying, "Well, judge, we used a tried and true method and preserved 20 percent of what we were supposed to." It's much better to be able to use story B and say to the judge, "We used keyword searching because that's the de facto standard, but that wasn't enough for us. We also used concept searching to ensure that we got better precision and recall, using patented probabilistic latent semantic analysis (PLSA), the most advanced concept searching on the market today, to really dive into those documents to make sure we were preserving everything." If, notwithstanding this level of care, a document gets destroyed, I would rather have story B than story A in the face of an allegation that a document got destroyed.

What is happening now is that under the whistleblowing provisions of Dodd-Frank, whistleblowers are coming to the DOJ with stacks of documents. Included in the form that whistleblowers fill out is a request for the documents that they have and for information about where further documents can be found. The danger here is in doing a production to the regulator where you fail to produce a document that they already have. That is a disaster scenario.

To avoid that, a corporation would want to use the best available software with the most sophisticated technology. Recommind's PLSA-based concept search was designed to rectify some of the limitations of other algorithms. There are a number of other algorithms. A common one is called latent semantic indexing. It is like concept clustering, but it has some drawbacks. For example, it has a problem with the addition of new documents in the middle of the process. It also does not handle a document that covers two key subject matters as well. PLSA was developed specifically to address the problems of latent semantic indexing and some of the other algorithms that preceded it. I would want to use the most advanced search technology out there because it enables you to tell the best story to the court or to a regulator in the face of an error.

Editor: Do humans still have a role to play?

Sklar: Absolutely. Maybe at some point in the future there will be technology that is truly a replacement for humans, but our technology makes human beings more effective. There still is a place for humans in the creation of the seed set to train our software. Our clients use our predictive coding to select the documents that have the highest probability of being relevant and send those documents for review by people - sometimes by several levels of reviewers. Predictive coding allows our clients to focus their review on a small percentage of the millions of documents, because in every case a large percentage does not have to be reviewed.

Please email the interviewee at howard.sklar@recommind.com with questions about this interview.