It is hardly going out on a limb to say that the handling of electronic data discovery is fraught with complexity. That complexity is likely to increase when one considers leading-edge issues such as corporate retention policies, data residing in remote or obscure places, information lodged in document management systems, or the implications of the Sarbanes-Oxley Act. But while these issues may loom, it's the nuts-and-bolts basics of handling electronic discovery that law firms wrestle with on a daily basis - how data is processed and delivered to support the work attorneys must undertake to review and produce that data.
The electronic discovery service industry, still in its infancy, is struggling with the basics as well. So it's no surprise that some law firms remain uncomfortable with the basic issues of processing electronic discovery, or that from one vendor to the next they encounter a widely varying grasp of those same issues.
Considering this, law firms need ways to exercise greater control over the handling of their electronic discovery. Establishing that control begins before a single byte of data is pushed through any process at the vendor site. This is when a firm gets maximum mileage out of being proactive in its discussions with the vendor about nitty-gritty processing issues. The goal during this initial discussion phase is to demystify the vendor's "black box" by asking the right kind of questions. The following questions can be particularly useful.
Completeness Deserves Pride-Of-Place
Firms should ensure that the vendor has handled every file. How do you monitor a vendor's compliance with its commitments in the area of completeness? By taking advantage of reports. It pays to find out what reports a vendor can provide, and when it can provide them. Then, at the agreed-upon intervals, hold the vendor to its reporting commitments.
Here are some useful report expectations:
Upfront: A vendor should be able to provide a snapshot of the content of the data collection - in essence, a "preflight" report generated before any conversion or outputting is undertaken. A good preflight should tell you:
The number and identity of e-mail boxes
The loose file count
The number of e-mail messages
The variety of file types encountered as e-mail attachments or loose files
The count for each file type
The number of duplicate e-mail messages and loose files encountered
With this type of report, a firm can get a true handle on what's about to be dumped into the vendor's processing hopper - before that happens. Armed with this information, a firm can:
formulate cost projections for its client
plan for adequate staffing
anticipate handling issues for proprietary or special files
make time- and cost-saving decisions about the processing stage
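To make the preflight idea concrete, here is a minimal sketch of what such a snapshot might compute for a folder of loose files. It is an illustration only, not any vendor's actual method: the function name preflight and the report keys are hypothetical, duplicates are assumed to mean byte-for-byte identical files (detected with a SHA-256 hash), and e-mail boxes are left out entirely. File types are tallied by extension here purely for brevity, which is exactly the shortcut a production process should not rely on.

```python
import hashlib
from collections import Counter
from pathlib import Path


def preflight(root):
    """Summarize a folder of loose files before any processing (hypothetical helper)."""
    ext_counts = Counter()
    seen_hashes = set()
    total = 0
    duplicates = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        total += 1
        # Tally by lower-cased extension; files without one are grouped together.
        ext_counts[path.suffix.lower() or "(none)"] += 1
        # Assumption: a duplicate is a file whose bytes match an earlier file.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            duplicates += 1
        else:
            seen_hashes.add(digest)
    return {
        "loose_file_count": total,
        "count_by_extension": dict(ext_counts),
        "duplicate_files": duplicates,
    }
```

Even a rough tally like this gives a firm numbers to compare against the vendor's own preflight report before processing begins.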
Post-processing: The firm should also require the vendor to submit a post-processing report that provides a full accounting of files. Within the report, any files that have not been processed should be subcategorized as to reason (corruption, system files, etc.).
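A "full accounting" can be checked mechanically. The sketch below assumes the firm holds a list of file identifiers it delivered and the vendor's report supplies a list of processed identifiers plus a mapping of unprocessed identifiers to reasons. The function name reconcile and the shape of these inputs are assumptions made for illustration, not a standard report format.

```python
from collections import Counter


def reconcile(received, processed, exceptions):
    """Compare delivered files against a vendor's post-processing report.

    received   -- identifiers of every file handed to the vendor
    processed  -- identifiers the vendor reports as processed
    exceptions -- dict mapping unprocessed identifiers to a reason string
    """
    received = set(received)
    processed = set(processed)
    # Anything neither processed nor explained is a gap in the accounting.
    unaccounted = received - processed - set(exceptions)
    return {
        "received": len(received),
        "processed": len(processed & received),
        "exceptions_by_reason": dict(Counter(exceptions.values())),
        "unaccounted": sorted(unaccounted),
    }
```

A non-empty "unaccounted" list is the signal to go back to the vendor before review begins.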
Avoid Taking File Types At Face Value
In a typical electronic discovery workflow, files are processed based on their "type" (e.g., Excel spreadsheet, Word document, Acrobat PDF, plain text, etc.). For example, an Excel spreadsheet entails myriad processing issues that are distinct to its type, and proper handling of that file depends on a vendor's ability to recognize that it's an Excel spreadsheet, and to then process it correctly as such.
How does the vendor determine a file's type? It is important that vendors avoid making that determination based solely on the file's extension (e.g., XLS for Excel spreadsheets, DOC for Word documents, PDF for Acrobat, and TXT for text files), because the extension is not a reliable indicator of the file's type. Word documents, for instance, do not have exclusivity on the DOC extension, so it is presumptuous to handle every file with a DOC extension as if it were a Word document. Perhaps the most damaging effect of taking file types at face value occurs when all "exe" files are treated as program files and excluded from processing. This ignores the fact that some collections include "self-extracting" compressed files, which also carry the "exe" extension. If a vendor drops all "exe" files from the process out of hand, the contents of self-extracting compressed files will never make their way to the delivered product. The corrective: a firm should establish that its vendor determines a file's type by inspecting the actual header contents of the file.
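Header inspection can be illustrated with a short sketch. It checks the first bytes of a file against a handful of well-known signatures and ignores the extension altogether. The function name sniff_type and the labels it returns are hypothetical, the signature table is deliberately tiny, and the self-extracting check covers only ZIP-based archives (using the standard library's zipfile.is_zipfile); a real process would need a far larger table and deeper inspection.

```python
import zipfile

# A few well-known leading signatures ("magic bytes"); far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 compound file (legacy Office)"),
    (b"PK\x03\x04", "zip container"),
    (b"MZ", "windows executable"),
]


def sniff_type(path):
    """Guess a file's type from its header bytes, ignoring its extension."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            # An executable that also parses as a ZIP archive is treated as a
            # self-extracting archive whose contents still need processing.
            if label == "windows executable" and zipfile.is_zipfile(path):
                return "self-extracting zip"
            return label
    return "unknown"
```

Under this approach a PDF saved with a DOC extension is still reported as a PDF, and an "exe" that wraps a ZIP archive is flagged for extraction rather than dropped.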
Get Control Of Microsoft Office File Types
Of course, the Microsoft Office Suite file types (Word, Excel, PowerPoint) predominate in most collections. These files present a particular set of challenges that, coupled with their sheer abundance, provide good reason for a lot of interchange between client and vendor.
What options are available for the processing of Office types? With each of these file types, there can be a dichotomy between the public or printed aspect of the file and its hidden content. For example, tracked changes, revision history, comments, or speaker's notes may lie beneath the face of these documents, only to be revealed upon beckoning. Depending on the matter, a firm may or may not want these hidden features output along with the primary content of the file. Accordingly, it's crucial that the chosen vendor be able to provide the "reveal or not to reveal" option.
With Excel spreadsheets, there's even more to anticipate.
Does the vendor provide the option of formatting or not formatting the spreadsheet during processing? That is, can the spreadsheet be rendered either as the last user formatted it for printing or, instead, with all of its content revealed?
Can the vendor ensure blank-page elimination? Processing spreadsheets for electronic discovery can result in the outputting of a vast quantity of blank pages, which increases the size of the output collection and the cost of the project. Vendors should be able to avoid this condition, and to avoid it accurately.
How does the vendor maintain legibility in the output? Proper handling of Excel spreadsheets carries the responsibility of determining whether the output is likely to present a font size too small to read, and then juggling the font setting, scale-to-fit, and page size options to attain output that is comfortable for attorneys to read in a heads-down review setting.
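The legibility trade-off can be reduced to simple arithmetic. The sketch below assumes a sheet's printable content has a known width in points, that scale-to-fit shrinks the font in proportion to the width reduction, and that margins can be ignored. The function name choose_layout, the 8-point minimum, and the short list of candidate page layouts are all assumptions chosen for illustration.

```python
# Candidate page widths in points (1 inch = 72 points), smallest first.
PAGE_WIDTHS_PT = [
    ("letter portrait", 612),     # 8.5 in
    ("letter landscape", 792),    # 11 in
    ("legal landscape", 1008),    # 14 in
    ("tabloid landscape", 1224),  # 17 in
]


def choose_layout(content_width_pt, font_pt, min_font_pt=8.0):
    """Pick the smallest layout that keeps the scaled font legible.

    Returns (layout name, scale factor, effective font size), or None when
    no candidate layout keeps the font at or above min_font_pt.
    """
    for name, page_width in PAGE_WIDTHS_PT:
        # Scale-to-fit only ever shrinks content; it never enlarges it.
        scale = min(1.0, page_width / content_width_pt)
        effective_font = font_pt * scale
        if effective_font >= min_font_pt:
            return name, round(scale, 3), round(effective_font, 2)
    return None
```

A None result would indicate that shrinking alone cannot keep the sheet readable, so the output would have to be split across pages instead.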
Anticipate Issues With Lotus Notes
Lotus Notes presents some interesting complexities for electronic discovery vendors, and it makes sense to understand how a vendor will handle those complexities. For example:
If the vendor's process depends on first converting a Lotus Notes e-mail collection to Microsoft Outlook PST, how does the vendor preserve those Notes folders that don't map to standard Outlook folders? Can the vendor account for all Notes objects in the PST output?
Can the vendor process Lotus Script documents so they output as the document was intended to appear, rather than in code form?
Does the vendor know how to process attachment repository databases?
The foregoing questions, while touching upon just some of the issues surrounding the processing of Lotus Notes, should prompt answers that will give you a good feel for your vendor's acumen with Notes.
All in all, it pays to introduce some transparency into that "black box." For one thing, you will attain better control over how your data is handled. For another, that increased control will have a positive, sweeping impact on every other litigation management task that touches the data - whether review, production, or ongoing research and retrieval.
Many firms can attest to being disillusioned with the product they receive even when dealing with established vendors. The caveat litigator is this: Leave nothing to chance. Don't assume that a vendor will deliver the best possible product without your having stipulated the requirements for that product. There is no standard in this industry - except that defined by the client.
Chris Hansen is a Senior Consultant with Alpha Systems in Huntingdon Valley, Pennsylvania. Questions about this article can be addressed to him at firstname.lastname@example.org. A version of this article first appeared in the New York Law Journal on February 2, 2004.