Editor: In your experience, what is the first problem that legal professionals have in dealing with electronically stored information (ESI)?
Roitblat: When lawyers come onto a new case, they’ve often already identified the legal issues and the main players. However, they don’t know other facts – such as the language their client’s employees use, other custodians they should be collecting from, the location and kind of information they’ll be working with, etc. I like to think of the e-discovery process as a series of loosely linked and ordered stages that start more or less with collection and continue through a phase of exploratory data analysis and maybe some predictive coding, followed by some confirmatory analysis including tagging and quality control, eventually ending in production.
Editor: What are some approaches lawyers should take to understand the data?
Roitblat: They should let the data speak to them. This means they should use tools that will help them see what’s in their data. For example, as I’m sure you know, people don’t always spell things correctly, so a tool that provides you with commonly used incorrect spellings for a keyword will help you find additional relevant data. People don’t always use the same words all the time, so a tool that can find a way to identify the concepts in a document without having to know exactly what words were used will also help you explore your data. Another useful tool would be one that gets the data to tell you what’s interesting in a document and what makes that document stand out. Such a tool might also help you identify new paths to finding the information you want.
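The misspelling idea can be illustrated with a minimal sketch. It is not OrcaTec's method: it assumes a hypothetical vocabulary list taken from a collection's word index and uses the similarity matcher in Python's standard `difflib` module to suggest likely variants of a keyword.

```python
import difflib

def spelling_variants(keyword, vocabulary, cutoff=0.8):
    """Return vocabulary words that are close in spelling to `keyword`."""
    keyword = keyword.lower()
    candidates = difflib.get_close_matches(keyword, vocabulary, n=10, cutoff=cutoff)
    # Drop the keyword itself; only the variants are of interest.
    return [word for word in candidates if word != keyword]

# Hypothetical vocabulary; in practice this would come from the document index.
vocabulary = ["contract", "contarct", "contrct", "contact", "court", "invoice"]
variants = spelling_variants("contract", vocabulary)
print(variants)
```

A similarity matcher of this kind can also return legitimate words that happen to be spelled similarly (here, possibly "contact"), so the output is best treated as a list of suggestions for a person to review rather than as automatic search expansions.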
A few years ago, Judge Peck talked about the use of keyword searching as a game of Go Fish. There’s a lot of truth in that because it’s nearly impossible to guess exactly the right words that people will use. We use two linguistic terms to describe this: synonymy and polysemy. Synonymy refers to the fact that there are many ways of saying the same thing, and polysemy refers to the fact that the same word can have many different meanings. That’s why keyword searching is both over-inclusive, because people use a particular word in many different ways, often without noticing it, and under-inclusive, because they can say the same thing in many ways. If you don’t have the means to identify those different ways, it’s very difficult to follow the path of creative thinking, which is essential when crafting your litigation strategy.
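A small worked example may help show how polysemy and synonymy translate into over- and under-inclusive keyword results. The four documents and their relevance labels below are invented for illustration; precision measures how many hits are relevant, and recall measures how many relevant documents were hit.

```python
def precision_recall(hits, relevant):
    """Compute precision and recall for a set of retrieved document ids."""
    true_hits = hits & relevant
    precision = len(true_hits) / len(hits) if hits else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical mini-collection.
docs = {
    1: "the court granted the motion to dismiss",   # relevant
    2: "the tribunal denied the motion",            # relevant, but uses a synonym
    3: "we booked the tennis court for friday",     # not relevant: other sense of "court"
    4: "lunch menu for friday",                     # not relevant
}
relevant = {1, 2}

# A plain keyword search for "court".
hits = {doc_id for doc_id, text in docs.items() if "court" in text.split()}
precision, recall = precision_recall(hits, relevant)
print(hits, precision, recall)
```

The search pulls in document 3 because of polysemy (over-inclusive) and misses document 2 because of synonymy (under-inclusive).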
Editor: How do you translate those approaches into technology that can address this challenge?
Roitblat: We build language models that represent the mathematical properties of the language used in the documents. They help us to identify how words are related to one another and where the meaning of those words comes from. Take our OrcaTec Concept Search tool. We look at how words are used in the company of other words, which helps us to understand the meaning. If I use the word “court,” you don’t necessarily know which “court” I mean. If I say, “court (blah, blah, blah) judge,” you know which meaning of “court” I’m using, as opposed to if I were to say, “court (blah, blah, blah) basketball.” The words around a word partially define it. Language models incorporate all of that and make it possible for you to identify relevant documents – not just by whether they have a keyword in them but by the meanings they have, as identified by our language model.
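The "court" example can be sketched in a few lines. This is only an illustration of the general idea that surrounding words help define a word, not a description of how Concept Search is built: it assumes simple whitespace tokenization, counts the words that appear near "court," and compares contexts with cosine similarity. The documents and function names are hypothetical.

```python
import math
from collections import Counter

def context_vector(tokens, target, window=10):
    """Count the words found within `window` positions of each `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical documents that both contain the keyword "court".
docs = {
    "legal": "the court heard the motion and the judge issued a ruling".split(),
    "sports": "the players left the court after the basketball game ended".split(),
}

# The query supplies context for the sense of "court" being sought.
query = "court judge ruling".split()
query_context = context_vector(query, "court")

scores = {
    name: cosine(query_context, context_vector(tokens, "court"))
    for name, tokens in docs.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

Both documents match the keyword, but only the legal one shares context with the query, so it scores higher.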
Concept Search not only provides these contextual matches to the user, but it also helps the user identify related terms that he or she might not think of. There are a number of ways that can happen. We have a capability called “Interesting Phrases” in which the computer identifies what’s statistically interesting in a document and then presents this to the user as possibly relevant information.
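The interview does not say how "Interesting Phrases" is computed. One common way to find statistically unusual phrases, shown here purely as an assumed stand-in, is to score each two-word phrase in a document by how often it appears there and how rare it is across the rest of the collection (a TF-IDF style score). The collection below is invented.

```python
import math
from collections import Counter

def bigrams(text):
    """Split text into lowercase two-word phrases."""
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]

def interesting_phrases(doc, collection, top_n=3):
    """Rank a document's bigrams by a TF-IDF style score against the collection."""
    doc_freq = Counter()
    for other in collection:
        doc_freq.update(set(bigrams(other)))  # number of documents containing each phrase
    counts = Counter(bigrams(doc))
    n_docs = len(collection)
    scores = {
        phrase: count * math.log((1 + n_docs) / (1 + doc_freq[phrase]))
        for phrase, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

# Hypothetical collection of short messages.
collection = [
    "please see the attached report for the quarterly numbers",
    "please see the attached invoice for the side letter and the side letter terms",
    "please see the attached agenda for the meeting",
]
top = interesting_phrases(collection[1], collection)
print(top)
```

Boilerplate such as "please see" appears in every message and scores zero, while "side letter" is frequent in one message and absent elsewhere, so it rises to the top.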
Editor: What are important attributes of technology that best help legal professionals address this challenge?
Roitblat: Visualization is critical. Words are wonderful, but words alone can sometimes obscure relationships and meanings. Our OrcaTec tools provide visualizations – graphs, maps, etc. – that enable you to see relationships between people and between documents and words that are difficult to see if you’re just looking at the words directly. Our tools allow you to take a step back, to view your information from a higher level of abstraction, just as viewing the Earth from a higher altitude lets you see the geographical features of the environment. You can see the forest, while on the ground your view may be obscured by all the trees.
Take social networks, which can be an important avenue for discovering who’s talking to whom and can therefore be used to identify new custodians. If certain individuals are appearing in the emails of a person of interest, then those individuals’ data can be collected in a relatively focused way. The search can then be expanded, if need be, in a highly defensible fashion because the e-discovery team will be able to easily describe how they decided which custodians they actually collected from and which custodians they actually produced.
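The custodian idea can be sketched as a simple count of who appears on messages with a person of interest. The message records, field names, and threshold below are hypothetical, and real email data would need address normalization that this sketch omits.

```python
from collections import Counter

def frequent_correspondents(messages, person, min_messages=2):
    """List people who share at least `min_messages` emails with `person`."""
    counts = Counter()
    for msg in messages:
        participants = {msg["from"], *msg["to"]}
        if person in participants:
            counts.update(participants - {person})
    return [(name, n) for name, n in counts.most_common() if n >= min_messages]

# Hypothetical email metadata.
messages = [
    {"from": "alice", "to": ["bob", "carol"]},
    {"from": "bob", "to": ["alice"]},
    {"from": "dave", "to": ["alice", "bob"]},
    {"from": "carol", "to": ["erin"]},
    {"from": "alice", "to": ["bob"]},
    {"from": "carol", "to": ["alice"]},
]
candidates = frequent_correspondents(messages, "alice")
print(candidates)
```

Recording the threshold and the resulting counts gives the team a simple, repeatable account of why each additional custodian was selected.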
Editor: What should be the goal of these enabling technologies?
Roitblat: To help lawyers be lawyers. These days there are so many obstructions to intelligent legal thought. Years ago, if you had a box of paper, you could sit down and go through it in about an hour or two and be done. These days, with millions and millions of documents, that’s just not possible. Legal technology should enable lawyers to get back to that scenario in which they actually had direct access to the information – when they could lay it all out on the table and look at it.
Of course, there was a stage when cases involved whole warehouses of paper. In one of the first cases I worked on, around 2000, 13 million paper pages of email were produced. One of the reasons they wanted it on paper was that they knew the other side didn’t have the resources to go through it all. The first question the receiving side asked was, “How are we going to sort these to date order?” The answer in those days was that you rented a warehouse with lots of folding tables and you wandered around sorting documents. With electronic data, 13 million pages is no longer a big deal: we have tools that can sort them in a matter of seconds.
Editor: If these tools are used early on in a case, might they help lawyers build strategy or even determine the direction of a case?
Roitblat: I think these tools are essential from a strategic perspective. As I’ve said, the lawyers on a case already know some of the players and custodians they need, but they don’t necessarily know with whom those people communicate. For example, emails are not always sent by the actual author of the email: people still have their secretaries send and read emails for them.
Furthermore, outside counsel may not know who in the organization knows what. An organizational chart may indicate that so-and-so is the expert on a subject, but if you look at the content of the information, it turns out someone else is really the expert – and that’s the person you should be talking to. All of our solutions are intended to make it easier for lawyers to get a handle on the information they need. They don’t push lawyers to do things: they enable them to do things.
Editor: What is your experience in developing these technologies?
Roitblat: I’ve been trying to make life easier for lawyers since about 2000. Unfortunately for the legal system, the volume of documents has grown faster than we can deal with it. In 2000, a gigabyte was a lot of data, and nowadays a gigabyte is probably not worth getting out of bed for. We’re dealing now in many terabytes of data. Basically, volumes have become so huge that they are on some level beyond the ability of people to really comprehend.
Our goal is to create technologies that make information visible that would otherwise be lost in a pile of papers – to help lawyers find the information they need to formulate the case. I can’t tell you how often I hear from lawyers, “Well, we started off thinking about it this way, then we found these documents, and now we’re thinking completely differently.” Under more traditional approaches, there’s no time to go back and re-review earlier documents that a lawyer didn’t know she needed. Exploratory analysis is really a knowledge process, which is why I think fairly senior people should be conducting it: it’s how they learn what’s really in the data.
Unfortunately, for the last 20 or 30 years, document review has been used as a kind of hazing for new lawyers, and so there’s a tendency for more senior lawyers to say, “I’ve already done this. I paid my dues, and I don’t want to do that again.” However, if using these tools gives you a sensible handle on what you’ve got, what you’re doing and where you’re headed, it’s worth the effort. By identifying evidence that’s relevant, technology can help you learn about the issues in the case on a deeper level, not just at the legal level.
Editor: And what is your experience in the practical application of these technologies?
Roitblat: People have been using them very effectively. At my former company, DolphinSearch, we introduced concept searching around 2000. It took about seven years for concept search to make its way into the mainstream such that every RFI asked about it. Predictive coding made a splash in 2010, and by the middle of 2011 it began to appear regularly in RFIs from companies.
Users are seeing the value in these tools as they are more frequently applied. We worked on one matter in which the team used our tools, and they went from being sued for half a billion dollars to actually getting half a billion dollars from the other side, thanks to evidence they found through this exploratory analysis.
Editor: Why do you think lawyers should consider direct participation in exploratory analysis?
Roitblat: It’s during exploratory analysis that lawyers actually learn what they have to work with – what the meaning of the documents is, where the evidence is, who the real sources of information are – rather than who they’ve been told the sources of information are. That’s when they can actually build their case. Eventually, they’ve got to be able to think strategically – as lawyers – about where they’re headed. The more information they have, the better informed they are about both what’s positive and what’s negative in their data set, the more they can reason about their case, and the more power they have to push their cases forward.
Herbert L. Roitblat, PhD, is a co-founder of OrcaTec LLC (CA), an award-winning professor of experimental psychology, and a widely recognized expert in search and retrieval technology, particularly in the area of eDiscovery. In addition to his scientific work, Dr. Roitblat has been writing extensively about the problems of dealing with massive amounts of electronic data and the emerging standards for dealing with those problems.