Editor: Mr. Carpenter, would you give us an overview of Recommind and the services that it offers its clients?
Carpenter: Recommind is an enterprise search, e-discovery and email management software company. All of our products use the same core platform, which is essentially a series of statistical algorithms that handle conceptual search and auto-categorization. We developed this patented technology to address the knowledge management and information management needs of our clients. The clients tend to be very large enterprises, those for whom the sophisticated and secure management of large volumes of data is critically important. The client group includes large pharmaceutical companies, banking and financial enterprises as well as professional services firms such as accounting and law firms.
Editor: What are the main risk factors inherent in email?
Carpenter: As everyone knows, the amount of information that we create and then share and store in email is immense and growing. This is not simply because more people are using email but because we are becoming increasingly reliant on email as a means of communication and as a collaborative tool. This development is taking place in the absence of any attempt at organization, whether at the time of creation, dissemination or storage, and the result is a large mass of information that, because it is not formatted, is not easily accessed or amenable to extraction. That is a serious problem for any enterprise, but particularly for one that operates in a litigious or heavily regulated environment. The immediate reaction of such an enterprise is often archiving, but unless accompanied by a careful editing and screening process that permits quick access and retrieval, archiving usually results in nothing more than the segregation of vast amounts of information of little use to anyone. There are several email collaboration tools that permit email to be dragged and dropped into various folders, but these tend to be rudimentary and pretty ineffective. Consequently, what exists today is a siloed approach to the creation of email-based information and its dissemination and storage.
Editor: I gather that email is often the main driver in escalating litigation costs.
Carpenter: Absolutely. In the first place, roughly 70 percent of the information likely to be subject to an investigation or lawsuit is housed in email, and secondly, this information is usually completely unorganized and unstructured. The ramifications are considerable.
Editor: What are the main concerns for corporations about email?
Carpenter: Putting email into a corporate context, the initial challenge is for the IT people. The questions here concern direct costs and extend to how email is to be managed when it is growing at such an exponential rate, or how the Exchange server is going to be kept up when expectations are for email availability on a 24/7 basis.
A second challenge has to do with storage. Very often corporations implement storage limitations which compel their employees to create a personal archive - a PST file - on their laptops. That creates all kinds of problems. Trying to access information for knowledge management purposes or in connection with a legal proceeding is all but impossible when the information exists in individual silos across the entire enterprise.
Editor: You are talking about files that are too big, unedited and filled with information which is often irrelevant?
Carpenter: By orders of magnitude, yes. In addition, there is no effective mechanism for de-duplicating this information, let alone retiring it. When people look at the storage of email, there is often a knee-jerk reaction that there is no problem because, in fact, the cost of storage is coming down. That is a very misleading response because the reduction has to do with direct costs - the price per terabyte of storage - not ancillary costs, including power consumption, personnel costs (including health care costs) and expenses associated with "going green." All of these developments converge to make dealing with email-based information for litigation and regulatory compliance purposes an extraordinarily expensive proposition.
Editor: You mentioned an email archive. Can you be more specific as to why that may not be sufficient?
Carpenter: An email archive is a good way of taking online data and making it semi-online or semi-offline. For business continuity or disaster recovery that makes sense. But it is not an effective way for a company to store information that can then be recaptured.
The first problem is that the archive serves to keep information around well past its useful life. It also does not differentiate between information which has a current value and information which should have been retired much earlier. It all sits in one large bucket or, possibly, in a few large buckets. That makes it extremely difficult to either delete obsolete information or extract information that continues to be of value.
In addition, an email archive is not effective in facilitating collaboration. That is not what it is built to do.
Editor: Some companies are instituting short-term deletion policies, e.g., 180 days. Is this the answer?
Carpenter: A deletion or offloading policy makes good sense, but I think it should be much shorter than 180 days. Something between 30 and 60 days is most appropriate. The obvious question, of course, is to what application and location do you offload. For all the reasons I have cited, you do not wish to offload to an email archive. We recommend offloading to something that is user friendly, permits collaboration and makes retrieval easy. A system of folders stored off the Exchange server is one option. Anyone with the appropriate access rights to a particular folder can get to it, and if the system is properly organized the information stored is easily available. These sorts of automated e-filing systems are the future, in my view, and they are being driven by the real costs of information risk, of which email is the principal culprit. Consulting firms and law firms, which are dependent on customer and client files, are increasingly organizing their information, and particularly their email-based information, in this automated fashion. This is a development that is gaining momentum.
Editor: Why have organizations been so slow in getting their information houses in order?
Carpenter: It is something of an oversimplification, but a great many organizations have been focused on the direct costs of storage. As I mentioned, the incremental cost of a terabyte of storage has gone down dramatically. But the indirect costs have gone up just as dramatically at precisely the same time. These include energy consumption, personnel costs, datacenter space, redundancy space, and so on. Only recently have organizations begun to realize the extent of the indirect costs associated with the ways in which their information has been stored to date.
Editor: How about the indirect costs of compliance and litigation?
Carpenter: We spoke earlier about the importance of retiring data that is not needed. Most people think of this in the context of getting their house in order, which is certainly important, but there are some rather large costs associated with not making this effort. Think of a large enterprise faced with investigations and lawsuits on an almost daily basis. In any given proceeding, the company may need to turn over 100 gigabytes of data in response to the other side's demands. If they have not gone through the data and retired what is irrelevant and organized what is relevant in some fashion, they could easily find themselves turning over a terabyte of information. Now, if the cost of going through a gigabyte of information during the e-discovery process is roughly $2,000, the cost associated with 100 gigabytes or a thousand - a terabyte - is not inconsiderable. A couple of cases of this magnitude may entail a very serious outlay of money, and I hasten to add that even if the review work is outsourced to, say, India or the Philippines, the costs are still going to be very significant.
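The arithmetic behind this example can be sketched in a few lines. This is only an illustration: it takes the rough $2,000-per-gigabyte review cost mentioned above at face value, treats a terabyte as 1,000 gigabytes as the answer does, and the function name `review_cost` is hypothetical.

```python
# Assumed figure from the answer above: roughly $2,000 to review one gigabyte.
REVIEW_COST_PER_GB = 2_000  # US dollars; an estimate, not a quoted price


def review_cost(gigabytes: int, cost_per_gb: int = REVIEW_COST_PER_GB) -> int:
    """Return the estimated e-discovery review cost in dollars."""
    return gigabytes * cost_per_gb


organized = review_cost(100)      # data already culled and organized
unorganized = review_cost(1_000)  # a terabyte, taken as 1,000 gigabytes

print(f"100 GB: ${organized:,}")
print(f"1 TB:   ${unorganized:,}")
print(f"Extra cost of not retiring data: ${unorganized - organized:,}")
```

Under these assumptions, a single proceeding costs about $200,000 to review at 100 gigabytes and about $2 million at a terabyte, a difference of $1.8 million per case.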
Editor: What steps are proactive organizations taking to address data management?
Carpenter: There are a variety of tools that can be deployed once the organization is into the compliance process or an e-discovery undertaking, and these generally serve to mitigate the costs involved. However, the single most important thing the organization can do to address its data management issues is to act as early as possible. And that does not mean to react as early as possible. When information is created, for example, it should be stored in a fashion that will enable it to be retrieved by any employee with an appropriate need to have such access.
Among other things, that ability to access information serves to dramatically enhance the productivity level of an organization's employees. When information is effectively organized at the point of creation or dissemination, that step serves to support the intelligent de-duplication of data which, in turn, can reduce the storage footprint needed for any given bit of data by up to 80 percent. In addition, when information is intelligently organized and filed at the time of creation - and this is particularly relevant to email - its retrieval, whether for knowledge management, e-discovery or compliance purposes, is much faster, more accurate and more efficient, all of which saves the organization money through the entire life cycle of the information.
Proactive organizations understand that any solution to information management must be scalable if it is to be effective. In order to keep up with the amount of data that is being created, some degree of automation must be put into place. The solution must be easy to use and deliver results that are both accurate and relevant. The solution must enhance the current workflow by giving employees a "what's in it for me" incentive. What works is a system that is scalable to the organization, uniform across the spectrum of employees entitled to have access to the system, secure, user-friendly and relevant - capable of producing the information in the manner for which it was designed.
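For readers unfamiliar with de-duplication, the basic idea can be sketched as follows. This is a minimal exact-match approach based on hashing, far cruder than the "intelligent" de-duplication described above and not a description of any particular product; the message fields and the function names `fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib


def fingerprint(message: dict) -> str:
    """Hash the fields taken to define 'the same email' (a hypothetical choice)."""
    parts = (message["sender"], message["subject"], message["body"])
    # Normalize case and surrounding whitespace so trivial variants match.
    normalized = "\n".join(p.strip().lower() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(messages: list[dict]) -> list[dict]:
    """Keep the first copy of each distinct message, preserving order."""
    seen = set()
    unique = []
    for message in messages:
        key = fingerprint(message)
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique


mailbox = [
    {"sender": "a@example.com", "subject": "Q3 report", "body": "Draft attached."},
    {"sender": "a@example.com", "subject": "Q3 report", "body": "Draft attached."},
    {"sender": "A@example.com", "subject": "q3 report ", "body": "Draft attached."},
    {"sender": "b@example.com", "subject": "Lunch", "body": "Noon?"},
]

unique = deduplicate(mailbox)
print(f"{len(mailbox)} stored copies reduce to {len(unique)} distinct messages")
```

In this toy mailbox, three copies of the same message collapse into one, so four stored items become two. The savings in a real mail store depend on how many copies of each message and attachment are actually held.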
Editor: The question of uniformity across the organization sounds like an important feature.
Carpenter: It is. I spoke about collaboration earlier. However a company's emails are organized - by customer, by product, by a particular sales campaign - the relevant emails should be in a folder or series of folders that are accessible to anyone concerned with the subject matter, irrespective of whether that person originated or received those emails. This ability to collaborate within the email environment and break down those information silos is now attracting the attention it deserves.