In 1967, Wilson transformed the tennis world by introducing the Wilson T2000, the first metal-frame racket. It didn't require a vise-frame to prevent warp like wooden rackets. Until then, wooden rackets had dominated the field, not because they were such technological wonders, but because they were the only technology available. In 1976, Prince introduced the Prince Pro and the Prince Classic. These were the first rackets with an oversized head; they had almost 50 percent more sweet spot than other rackets. Jimmy Connors, however, played with the T2000 until the mid-1980s.
This history of tennis teaches us two things: technology moves on, and even professionals are slow to embrace change. This article will examine the history of search technology in the face of disruptive change, and will argue that using old search technology should no longer be considered a "reasonable" search.
In eDiscovery, the technology starts with keyword search. When I was in law school, Westlaw's site came online, and I remember forming large search strings to try to find a case. LIRR and (reasonably w/2 foreseeable) and Cardozo would (hopefully) yield the Palsgraf case, among others. Keyword search advanced not in technological complexity, but in user sophistication. It's not unusual today to see massively large keyword strings - I've heard of strings with over 250 terms. These terms are used to find documents in a corpus that are relevant to a particular issue. The development of these keyword strings isn't free: it takes significant time and effort by high-priced partners and their associates (can we call them "medium priced?") to come up with keywords with the best chance of finding relevant material.
But how good a chance is that? Both research and common sense show us that the chance isn't that good. Research has shown that keyword search leads to recall of about 50 percent at best; one study puts it closer to 20 percent. That's a lot of money and a lot of effort to only find half to a fifth of what you need to find.
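To make the recall figure concrete, here is a minimal sketch of how recall and precision are computed. The document IDs and counts are hypothetical, chosen only to reproduce the 20 percent recall figure cited above; they are not data from any of the studies.

```python
def recall_and_precision(relevant_ids, retrieved_ids):
    """Return (recall, precision) for one search result.

    recall    = share of the truly relevant documents that the search found
    precision = share of the returned documents that are truly relevant
    """
    relevant = set(relevant_ids)
    retrieved = set(retrieved_ids)
    hits = relevant & retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical corpus: documents 1-10 are the relevant ones.
relevant_docs = set(range(1, 11))
# Hypothetical keyword search: 8 documents returned, only 2 of them relevant.
keyword_hits = {1, 2, 101, 102, 103, 104, 105, 106}

recall, precision = recall_and_precision(relevant_docs, keyword_hits)
print(f"recall={recall:.0%}, precision={precision:.0%}")
```

In this made-up example the search misses eight of the ten documents that matter, and most of what it does return is noise.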
But there's even more risk. From an in-house compliance perspective, the highest risk activity is making a representation to a regulator. Anything you say has to be completely locked down. Besides the reputational risk of a misstatement, there's actual jail time in the mix because making a false statement to a regulator is a crime. In one study, however, lawyers were asked what percentage of relevant documents their keyword search returned. The lawyers in the study answered that they believed their search returned a recall of 75 percent. In actuality, the recall was closer to 20 percent. In baseball, a success percentage of .200 would give you a nice career. In eDiscovery, the disconnect between perception and reality could lead to embarrassment in court, sanctions, or even jail.
The other issue recognized as important by in-house counsel and the outside lawyers they work with is credibility with the regulators. Its importance cannot be overstated. Believing yourself to be 75 percent accurate when you're only 20 percent accurate - and telling a regulator that - is a blow to your credibility that you might not recover from. Lack of credibility also leads to more subpoenas and less forgiving deadlines. It recalls the adage, "fast, cheap, or good: pick any two." With keyword search, because fast and good are now mandatory, "cheap" is off the table.
The other worry with keyword search is summed up in an attorney's question I once heard, "when's the last time you had a 'hot doc' that actually had a keyword in it?" My personal, albeit anecdotal, experience bears this out. The two worst (or best - I was a regulator at the time) emails I saw weren't keyword-laden. One was an email in a fraud case, where the issue was knowledge, and the email just said, "go ahead." The other was also in a fraud case; the email said, "thanks for the money, it spent well in the bars." Both emails were crucial to my case, but neither would have been found in a traditional keyword search. This is keyword searching today.
Reasonableness, however, isn't an absolute standard. We determine what is reasonable based on the alternatives. If I needed to send an important one-line message to California from New York, it would make sense if I sent it via email: speed and accuracy virtually guaranteed, plus high assurance of delivery to the right person. Sending it via U.S. mail sacrifices speed, and a little accuracy. And we recognize that certain important documents - even of only one line - can be sent via U.S. mail. For example, U.S. mail is still the standard way to send court documents (although e-filing is becoming more popular, and in some districts mandatory). But we would look askance at someone who sent it via teletype, and would call a fool someone who sent it via Pony Express. What was reasonable changed over time because the metrics of the alternatives, in terms of what Six Sigma calls the "CTQs" or those things "critical to quality," produced different results. The CTQs for the important message are speed and accuracy of transmission, plus assurance that the right person got the information. While email and U.S. mail are good alternatives, they are so far above other methods that using the Pony Express just isn't reasonable. We would have the same analysis if someone tried to use a wooden racket - or even a T2000 - at Wimbledon today.
So we have to look at the alternatives to keyword search to see if the CTQs skew toward the newer technology to the extent that using only keyword search becomes indefensible. Search technology has developed two generations beyond keyword search. The first advancement was the introduction of algorithmic search mathematics like vector space search analysis, which dealt with the frequency of the use of terms to determine "likeness." Then came Latent Semantic Indexing, which allowed "concept clustering." This is similar to a "more like this" button on web sites today. It allowed a user to identify a document and the computer, based on the frequency of words being used together, would identify documents with words used frequently with other relevant words, and cluster them together. The math behind it is, thankfully, beyond the scope of this article (because it is most definitely beyond the skill level of the author). This technology had certain drawbacks, such as the inability to place one document in two different concept clusters. Some of these second-generation search algorithms also need large, or even huge, training sets to determine relevancy. Some also don't easily handle new documents being added to the corpus mid-review, and some work better on large, homogeneous sets of documents unlikely to be found in the eDiscovery context.
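For readers who want a feel for the term-frequency idea without the heavier math, here is a minimal sketch of vector space "likeness" using raw word counts and cosine similarity. It is a deliberate simplification: it is not Latent Semantic Indexing, it applies no term weighting, and the function names and sample documents are my own assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(seed, corpus):
    """Rank corpus documents by similarity to the seed, most similar first."""
    seed_vec = term_vector(seed)
    scored = [(cosine_similarity(seed_vec, term_vector(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical documents, invented for this example.
seed = "wire the payment to the consultant account"
corpus = [
    "please wire the consultant payment today",
    "lunch menu for the team offsite",
    "go ahead",
]

for score, doc in more_like_this(seed, corpus):
    print(f"{score:.2f}  {doc}")
```

Note that "go ahead" shares no words with the seed document and so scores zero. A method that relies only on shared terms inherits the same blind spot as keyword search.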
Probabilistic Latent Semantic Analysis ("PLSA"), patented by Recommind, was developed to address some of the shortcomings of LSI and similar algorithms. It allows for documents to be placed in multiple concept clusters, and it recalls documents that have similar concepts to those in a set of identified documents, even if the same words aren't used in the two documents. For example, it would be extremely rare for a document to explicitly use the term "bribe." A search for "bribe" using PLSA would effectively find documents that have the same meaning, even without the term "bribe." Using PLSA allows for recall percentages above 60 percent, at significantly higher precision percentages. In fact, clients using PLSA for their search, along with Recommind's patented iterative workflow, have been able to find responsive documents with a confidence level in the 90 percent range while reviewing only 8 percent of the total corpus. These responsiveness metrics are then backed up by in-the-workflow statistical sampling that courts today are starting to require for large document sets.
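The statistical sampling mentioned above can be illustrated with a short sketch. This is not Recommind's workflow, which the article does not describe in detail; it is a generic estimate of a proportion from a random sample, using the normal approximation and a z-value of 1.96 for roughly 95 percent confidence. The sample counts are hypothetical.

```python
import math


def sample_proportion_interval(hits, sample_size, z=1.96):
    """Estimate a proportion from a random sample.

    Returns (estimate, low, high), where low and high bound the estimate
    using the normal approximation; z=1.96 is about 95 percent confidence.
    """
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical check: draw 400 documents at random from the set the review
# did NOT flag, and find that 8 of them are actually responsive.
estimate, low, high = sample_proportion_interval(hits=8, sample_size=400)
print(f"missed-document rate: {estimate:.1%} (range {low:.2%} to {high:.2%})")
```

The point of a check like this is that the lawyer's representation to a court or regulator rests on a measured number with a stated margin, not on a belief about how well the search performed.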
This so changes the metrics and CTQs of search that it's hard to see how a technology that only gives you 20-50 percent recall with unknown precision can be called "reasonable." On the other side of the equation, I don't think we will ever make the human element of review completely obsolete. In the law, when the legal standard is inherently, if not explicitly, subjective, human beings need to have the final say. But the final say isn't the question; it's the first or second say that we are discussing here. Human linear review is inefficient and inaccurate. Keyword searching is better, but its faults outweigh its benefits. Companies should closely investigate their choice. A lot of the inferior technologies are incremental and mimic PLSA in major respects: the devil is often in the details.
What companies need is a hybrid, utilizing the best of human review with the best of concept search. It's therefore incomplete to discuss man-versus-machine. It's man with machine.
As one judge recently opined: "[T]his methodology is so much more preferable than keyword searching. I don't know what kind of an argument could be made by the person who would say keyword searching would suffice as opposed to this sophisticated analysis. That's just comparing two things that can't legitimately be compared. Because one is a bold guess as to what the significance of a particular word, while the other is a scientific analysis that is accompanied by a methodology that will meet the test of Daubert or of any other standard by which one tests a scientific methodology."
If this is a judge's opinion off the bench, it won't be long before similar language appears in a decision. At that point, keyword searching will be equated with the Pony Express and wooden tennis rackets: technology that was great in its day, but just doesn't cut it today.
Howard Sklar is Senior Corporate Counsel at Recommind, Inc. Mr. Sklar represents Recommind to corporations and law firms. Prior to joining Recommind, he was Global Trade and Anti-Corruption Strategist at Hewlett-Packard Co., running HP's global anti-corruption compliance program and providing counsel on compliance with U.S. sanctions laws. Before HP, Mr. Sklar was Vice President, Compliance and Global Anti-Corruption Leader at American Express Co.