At every phase of the electronic discovery process, counsel will inevitably encounter the proverbial fork in the road. Making the wrong decision can have many repercussions, from missed deadlines and higher-than-expected electronic discovery fees to a changed outcome in the case. The processing phase of an engagement can be particularly harrowing for those not fully informed about the potential issues that may arise. By taking a proactive approach during this phase, counsel can eliminate much of the risk and complexity often associated with data processing.
Along with the fees associated with attorney document review, data processing is among the most expensive components of an electronic discovery engagement. The data processing phase ensures that all documents are processed in a manner that will allow for a well-managed attorney review. Data processing encompasses three phases, each with its own set of potential challenges. The initial phase, discovery, is when data is enumerated and indexed. The next phase involves extracting and processing the data. In the final phase, data is exported to a desired review tool format.
The discovery phase is often considered the most critical. During this phase, every file that is to be processed must be properly accounted for. In matters involving e-mail, this phase entails extracting every message from the PST archive and every attachment, such as a Word document or Excel file, from those messages. Parent-child relationships are established during this process: the original e-mail is labeled as the parent and each attached file as a child. This facilitates the attorney review of e-mails by establishing the origin of each file.
During this phase an index is also built for every file, and all searching is performed against that index. Indexing significantly increases the efficiency with which files can be searched. When an index is produced, the raw text of all files is compiled into one comprehensive text index that can be rapidly searched. Without an index, the time and cost of performing keyword searches across massive numbers of files can be overly burdensome. A search run against an index can take seconds, whereas the same search performed against the actual files could take several hours.
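The indexing idea described above can be illustrated with a minimal sketch of an inverted index, which is one common way to build a searchable text index. The function names, document IDs and sample text are hypothetical, and commercial processing tools use far more elaborate indexing than this.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical extracted text, keyed by document ID.
documents = {
    "DOC001": "Quarterly revenue forecast attached for review",
    "DOC002": "Lunch on Friday?",
    "DOC003": "Revised forecast and revenue notes",
}
index = build_index(documents)
hits = search(index, "revenue forecast")
```

The index is built once, and each keyword search then becomes a lookup and set intersection rather than a fresh pass over every file, which is where the time savings come from.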
The second phase, the extraction and processing of data, compartmentalizes the component parts of each file into metadata fields. In essence, e-mail files are broken down into fields that include sender, recipient(s), subject and body of the e-mail. Any e-mail attachment is treated as a separate file and independently compartmentalized into its own set of metadata fields.
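As a rough illustration of this compartmentalization, the sketch below uses Python's standard email package to break a single plain-text message into sender, recipient, subject and body fields. The sample message, the field names and the extract_fields helper are assumptions made for the example. A real engagement would work from PST archives with specialized tooling rather than raw message text.

```python
from email import message_from_string
from email.policy import default
from email.utils import getaddresses

# Hypothetical raw message; real data would come out of a PST archive.
RAW = """\
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Q3 forecast
Content-Type: text/plain

Please see the numbers below.
"""

def extract_fields(raw_message):
    """Split one e-mail into a dictionary of metadata fields."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    recipients = [addr for _, addr in getaddresses([str(msg["To"])])]
    return {
        "sender": str(msg["From"]),
        "recipients": recipients,
        "subject": str(msg["Subject"]),
        "body": body.get_content().strip() if body else "",
    }

fields = extract_fields(RAW)
```

Each attachment would be passed through its own extraction step and receive its own set of fields, as the paragraph above describes.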
The file relationships established during processing can be quite complex. For example, if an e-mail has a zip file attachment, the zip file is considered a child of the parent e-mail. However, because a zip file itself contains files, it has children of its own, which become grandchildren of the original parent e-mail.
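The parent, child and grandchild relationships in this example can be modeled as a simple tree. The sketch below is only one possible representation. The Item class, the document IDs and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Item:
    doc_id: str
    name: str
    parent_id: Optional[str] = None
    children: List["Item"] = field(default_factory=list)

    def attach(self, child):
        """Record child as a direct child of this item and return it."""
        child.parent_id = self.doc_id
        self.children.append(child)
        return child

def flatten(item, depth=0):
    """Yield (depth, doc_id, parent_id) for an item and all of its descendants."""
    yield depth, item.doc_id, item.parent_id
    for child in item.children:
        yield from flatten(child, depth + 1)

# Parent e-mail with a zip attachment that itself holds two files.
email = Item("DOC001", "status update.msg")
archive = email.attach(Item("DOC002", "reports.zip"))
archive.attach(Item("DOC003", "budget.xlsx"))
archive.attach(Item("DOC004", "memo.docx"))

rows = list(flatten(email))
```

Here the e-mail sits at depth 0, the zip file at depth 1 and the two files inside the zip at depth 2, so a reviewer can always trace a grandchild back to the e-mail it arrived with.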
During the processing phase, TIFF images are generated for each document. Typically, TIFF images are created on a per-page basis; however, they can also be generated at the document level. Obstacles can arise when generating TIFF images for specific document types, such as Excel or PowerPoint. For example, the reviewer must determine the scale at which to print the documents, whether to print the column and row headers in Excel, and whether to print the notes and comments sections of PowerPoint presentations. These options help determine how the TIFFs will look for review and production.
The final issue that must be addressed with TIFF images is quality control. It is in a company's best interest to have its electronic discovery provider perform a quality control review before turning over TIFF images for attorney review. Due to the constraints of existing technologies, it is typical for a percentage of TIFF images not to render properly. This problem most frequently arises with Excel spreadsheets. In such instances, a spreadsheet may print without all of the columns and rows that appeared in the original document, or a large, single-sheet Excel file may print across many pages, making the review exceedingly difficult.
A native file review differs in an important way that can work in counsel's favor. With a native file review, TIFF images are not generated and a quality control review of the images is not required. Because the native file is the original document and can be accepted at face value, processing occurs at a much quicker rate.
In the last phase of data processing, data is exported to a desired review tool format. If the review involves native files, then the export process will generate a single load file that contains a record entry for every native file and all of its associated metadata. The load file also contains document relationships such as parent-child and a link to the native file itself. Next, the native files are gathered and built into a structure that is compatible with the load file generated so that when the native file link is selected during review, the proper native file appears.
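To make the export step more concrete, here is a minimal sketch of a native-file load file: one record per document, carrying its metadata, its parent and a link to the native file. The pipe-delimited layout, the field names and the NATIVES/ folder are hypothetical. Each review tool defines its own load file format, and the provider would export to that specification.

```python
import csv
import io

# Hypothetical column layout; a real review tool dictates its own fields.
FIELDS = ["DocID", "ParentID", "Sender", "Subject", "NativeLink"]

def build_native_load_file(records):
    """Return load file text with one delimited row per native file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow({
            "DocID": record["DocID"],
            "ParentID": record["ParentID"],
            "Sender": record["Sender"],
            "Subject": record["Subject"],
            # The link must match where the native file is actually stored.
            "NativeLink": f"NATIVES/{record['DocID']}{record['Extension']}",
        })
    return buffer.getvalue()

records = [
    {"DocID": "DOC001", "ParentID": "", "Sender": "alice@example.com",
     "Subject": "Q3 forecast", "Extension": ".msg"},
    {"DocID": "DOC002", "ParentID": "DOC001", "Sender": "",
     "Subject": "", "Extension": ".xlsx"},
]
load_file = build_native_load_file(records)
```

The second record's ParentID points back to the first, which is how the parent-child relationship travels with the export.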
When the reviewer opts to use TIFF images rather than native files, two load files are generated. The first load file contains a record entry for each document and its associated metadata as well as parent-child relationships. A second load file is generated that links all TIFF images to each record in the first load file. These two load files are then delivered, along with all of the associated TIFF images, to the reviewers so that the files can be imported into a selected review application.
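The second load file in a TIFF-based export can be thought of as a cross-reference from page images back to document records. The sketch below assumes per-page TIFFs and a hypothetical comma-separated layout (image key, document ID, image path, first-page flag). Actual image load file formats vary by review application.

```python
def build_image_cross_reference(page_counts):
    """Return one line per TIFF page, tying each image to its document record."""
    lines = []
    for doc_id, pages in page_counts.items():
        for page in range(1, pages + 1):
            image_key = f"{doc_id}_{page:04d}"
            # Flag the first page so the review tool knows where a document begins.
            first_page = "Y" if page == 1 else ""
            lines.append(
                f"{image_key},{doc_id},IMAGES/{image_key}.tif,{first_page}"
            )
    return lines

# Hypothetical page counts per document ID from the first load file.
page_counts = {"DOC001": 2, "DOC002": 1}
cross_reference = build_image_cross_reference(page_counts)
```

Because every line carries the document ID, the review application can join the images to the metadata records delivered in the first load file.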
Processing electronic data can be an extremely expensive task; however, with a thoughtful strategy in place from the outset, counsel can effectively manage processing costs. Expenses for this phase can be mitigated by instituting a few simple measures: determining a relevant date range for the data collection; establishing a concise search term list to limit the production of irrelevant documents; and developing deduplication strategies.
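Two of these measures, date-range filtering and deduplication, are easy to sketch. The example below assumes that duplicates are identified by hashing file content with SHA-256, which is only one possible deduplication strategy. The sample documents, dates and helper names are hypothetical.

```python
import hashlib
from datetime import date

def in_date_range(doc, start, end):
    """Keep only documents sent within the agreed date range (inclusive)."""
    return start <= doc["sent"] <= end

def deduplicate(docs):
    """Keep the first copy of each document, comparing by content hash."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc["content"]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = [
    {"id": "DOC001", "sent": date(2006, 3, 1), "content": b"Q1 results"},
    {"id": "DOC002", "sent": date(2006, 3, 2), "content": b"Q1 results"},
    {"id": "DOC003", "sent": date(2004, 1, 15), "content": b"Old memo"},
    {"id": "DOC004", "sent": date(2006, 5, 9), "content": b"Board minutes"},
]
start, end = date(2006, 1, 1), date(2006, 12, 31)
in_range = [d for d in docs if in_date_range(d, start, end)]
kept = deduplicate(in_range)
kept_ids = [d["id"] for d in kept]
```

In this sample, one document falls outside the date range and one is an exact copy of another, so only two of the four documents move forward to processing and review.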
Jerry F. Barbanel is the Executive Vice President in charge of IT Risk and Litigation Consulting for the Financial Advisory and Litigation Consulting Services practice at Aon Consulting. Mr. Barbanel can be reached at (201) 966-3494. Thomas W. Avery is a Senior Director in charge of the Electronic Discovery group at Aon Consulting. Mr. Avery can be reached at (949) 608-6424.