Workshare Offers Latest Advancements In Metadata Removal

Monday, January 3, 2011 - 01:00

The following report is based on a webinar with the above title presented December 13, 2010 by Workshare. The presenter was Chia Ling, Product Manager for Workshare's server solutions, including Protect Server, Compare Server and OCR Server.

This webinar, "Latest Advancements to Avoid Metadata in Office Documents," addresses the identification and removal of metadata from Microsoft Office documents and how Workshare metadata management tools can provide automated and efficient solutions to legal counsel and corporations. Our theme for this presentation is the Art of Metadata.

Know Thy Metadata, Know Thy Enemy. A Thousand Documents, A Thousand Days Headache Free.

In general, metadata means "data about data," and it comprises information stored within electronic documents, both visible and hidden. When documents are sent electronically, the embedded metadata may not be intended for viewing by the recipient. The following will discuss metadata in Microsoft Office documents, including track changes, comments, properties and macros.

Starting with Microsoft Word 2010 ("Word"), simply right-click the document and select "properties" to view statistics, such as when the document was created, modified and last accessed. The "details" tab shows more interesting metadata: author names or a document title that may differ from the file name. The creation date could be years before subsequent authors use the document - in itself a potentially unintended divulgence. Thus, a document may contain information and hidden text of which you may be unaware.
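The same properties can be read straight from the file, without opening Word. The following is a minimal sketch in Python, assuming the document is saved in the zip-based .docx format that Word 2010 uses by default and that the core properties sit in the conventional part docProps/core.xml. The function name read_core_properties is illustrative only and is not part of any Workshare or Microsoft tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Report key -> element to look up (a small selection, not the full list).
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(path):
    """Return selected core properties of a .docx, .xlsx or .pptx file.

    Hypothetical helper: assumes the conventional part name
    docProps/core.xml. Returns an empty dict if that part is absent.
    """
    with zipfile.ZipFile(path) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            props[key] = node.text
    return props
```

Calling read_core_properties on a file about to be sent shows whether an old creation date or a previous author's name would travel with it.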

There are numerous forms of visible metadata in a Word document, such as those visible in the "track changes mode." Depending on the "view" selected, a document displays vastly different information. Selecting "final" version hides track changes or comments, though they remain in the document. Selecting "final markup" view displays edits, including text that was purposefully deleted. This is where problems arise, because all viewing options are available unless specifically eliminated.

Using "final" view, reviewers may believe that there is nothing left but what they intend to send; however, recipients can switch to "final markup" and easily view deletions, comments and edits. Senders must be aware of visible metadata and ensure that all unintended information is identified and completely removed.
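To see why the "final" view offers no protection, it helps to look at what is stored in the file. The sketch below, in Python, assumes a .docx file in which tracked insertions and deletions are recorded as w:ins and w:del elements in word/document.xml and comments are kept in word/comments.xml, as in the Office Open XML format. The function name find_revisions and the report layout are illustrative assumptions, not a Workshare or Microsoft API.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text_of(node, tag):
    """Concatenate the text of all <w:tag> descendants of node."""
    return "".join(t.text or "" for t in node.iter(f"{{{W}}}{tag}"))


def find_revisions(path):
    """List tracked insertions, tracked deletions and comments in a .docx.

    Hypothetical helper. Each entry is an (author, text) tuple. Revisions
    that only mark formatting or paragraph marks appear with empty text.
    """
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        body = ET.fromstring(zf.read("word/document.xml"))
        comments = None
        if "word/comments.xml" in names:
            comments = ET.fromstring(zf.read("word/comments.xml"))

    report = {"insertions": [], "deletions": [], "comments": []}
    for ins in body.iter(f"{{{W}}}ins"):
        report["insertions"].append((ins.get(f"{{{W}}}author"), _text_of(ins, "t")))
    for dele in body.iter(f"{{{W}}}del"):
        # Deleted text is kept in <w:delText>, not <w:t>.
        report["deletions"].append((dele.get(f"{{{W}}}author"), _text_of(dele, "delText")))
    if comments is not None:
        for com in comments.iter(f"{{{W}}}comment"):
            report["comments"].append((com.get(f"{{{W}}}author"), _text_of(com, "t")))
    return report
```

If any of the three lists is non-empty, the recipient can recover that content regardless of which view the sender last used.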

Hidden metadata types are more difficult to track, such as text that highlights in gray when the cursor touches it, though otherwise it appears identical to surrounding text. This indicates that a "field code" was inserted, perhaps containing confidential information. Field codes are easy to remove, but first you must identify them and understand the full measure of their existence within the document. Simple deletion does not always solve this problem. For example, the underlying macro may preserve confidential data and can be accessed freely. Third-party templates and custom templates created by consulting firms may contain hidden metadata, such as macros, that store privileged information. While most macros are harmless, this discussion underscores the need to understand your document and remove unwanted metadata at all levels. Most importantly, it shows the ease with which metadata can be discovered.
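Field codes and macros can be located the same way. The sketch below, in Python, assumes a .docx or macro-enabled .docm file in which simple fields appear as w:fldSimple elements, complex fields carry their instructions in w:instrText elements, and any VBA macros are stored in a part named vbaProject.bin. The function name find_fields_and_macros is an illustrative assumption. It only reports what is present and does not remove anything.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_fields_and_macros(path):
    """Report field instructions and the presence of a VBA project.

    Hypothetical helper. Instructions of complex fields may be split
    across several runs, so they are listed as stored, fragment by fragment.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        body = ET.fromstring(zf.read("word/document.xml"))

    # Simple fields keep the instruction in the w:instr attribute.
    instructions = [
        node.get(f"{{{W}}}instr", "").strip()
        for node in body.iter(f"{{{W}}}fldSimple")
    ]
    # Complex fields keep it as text inside <w:instrText>.
    instructions += [
        (node.text or "").strip() for node in body.iter(f"{{{W}}}instrText")
    ]
    return {
        "field_instructions": [i for i in instructions if i],
        "has_vba_project": any(n.endswith("vbaProject.bin") for n in names),
    }
```

An instruction such as DOCPROPERTY or INCLUDETEXT in the result is a prompt to check where the field draws its value from before the document leaves the firm.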

Further, properties may be stored as metadata by a document management system, third-party document repository or content management system. Such systems often store custom properties to track documents as they leave the repository, including metadata that you may not want to send.
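Custom properties written by a repository can be listed in the same fashion. This is a minimal sketch in Python, assuming an Office Open XML file whose custom properties are stored in the conventional part docProps/custom.xml. The function name read_custom_properties is illustrative only, and which property names a given document management system writes is not something this sketch can know.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the custom-properties part in an Office Open XML package.
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(path):
    """Return {name: value-as-text} for the custom properties of a file.

    Hypothetical helper: assumes the conventional part name
    docProps/custom.xml. Returns an empty dict if that part is absent.
    """
    with zipfile.ZipFile(path) as zf:
        if "docProps/custom.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/custom.xml"))
    props = {}
    for prop in root.iter(f"{{{CUSTOM_NS}}}property"):
        # Each property holds one typed child element carrying the value.
        value = next(iter(prop), None)
        props[prop.get("name")] = value.text if value is not None else None
    return props
```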

There are some easy ways to remove unwanted metadata. For example, you can "accept all changes" or "reject all changes" and then save the file, permanently removing all track changes. Embedded comments can be deleted individually. Word 2010 offers automated options, such as "inspect document," but the software is unclear as to what it considers hidden and certain macros may remain. In short, there is no software-driven option that completely eliminates the need for manual review.

Shifting focus to PowerPoint ("PP"), there are some unique metadata types, such as speaker notes and hidden slides. Speaker notes are meant to assist the presenter but are not necessarily to be shared. Slides can be hidden when they are irrelevant to a certain audience, and you may create extra slides to buy more time or to offer additional information to one audience versus another. While hidden slides are shown only in edit mode, they are there just the same and can be discovered. Again, unwanted data can be removed manually, and PP also has the "inspect document" option, but the same limitations apply and thorough review is never completely circumvented.
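Hidden slides and speaker notes are just as easy to find in the file itself. The sketch below, in Python, assumes a .pptx file in which slides are stored as ppt/slides/slideN.xml, a hidden slide is marked with show="0" on its root element, and speaker notes are stored as ppt/notesSlides/notesSlideN.xml. The function name inspect_pptx is an illustrative assumption. Results are keyed by part name, because matching a notes part to its slide requires reading the package relationships, which this sketch does not do.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace, used for text runs in slides and notes.
A = "http://schemas.openxmlformats.org/drawingml/2006/main"


def inspect_pptx(path):
    """Report hidden slides and non-empty speaker notes in a .pptx.

    Hypothetical helper relying on PowerPoint's usual part names.
    """
    hidden, notes = [], {}
    with zipfile.ZipFile(path) as zf:
        for name in sorted(zf.namelist()):
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name):
                root = ET.fromstring(zf.read(name))
                # A slide is hidden when its root carries show="0".
                if root.get("show") in ("0", "false"):
                    hidden.append(name)
            elif re.fullmatch(r"ppt/notesSlides/notesSlide\d+\.xml", name):
                root = ET.fromstring(zf.read(name))
                # Read text from ordinary runs only, which skips the
                # slide-number field that notes pages usually contain.
                text = "".join(
                    t.text or ""
                    for run in root.iter(f"{{{A}}}r")
                    for t in run.findall(f"{{{A}}}t")
                )
                if text.strip():
                    notes[name] = text
    return {"hidden_slides": hidden, "speaker_notes": notes}
```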

Like Word and PP, Excel has common metadata types - "document properties" and the ability to create comments - but Excel also has unique metadata types, such as hidden worksheets, columns and rows. I will explore how these latter items manifest in an Excel spreadsheet. Small indicators, like red triangles, denote that there is a comment, one you may not want to share.

Another way to detect hidden information is to note breaks in the alphabetical presentation of columns or numerical presentation of rows. Obviously, if column "D" is immediately followed by column "F," then column "E" is hidden, and it is easy to unhide this column. Just as with PP, hiding is not the same as deleting. Some hidden information, such as calculations, is purely practical and safe for sending along, but be aware and make a choice about including this information in final versions.

Entire worksheets can be hidden by one user and then restored and renamed by subsequent users, giving the impression that they were there all along. Thus, it is important to understand the ways that metadata are used in Excel, just as in Word and PP, in order to work effectively and securely with these documents.
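Rather than scanning for gaps in column letters by eye, the hidden items can be listed from the file. The sketch below, in Python, assumes an .xlsx file in which hidden worksheets carry state="hidden" or state="veryHidden" in xl/workbook.xml, hidden columns and rows carry hidden="1" in the worksheet parts, and comments are stored in parts named xl/commentsN.xml. The function names inspect_xlsx and col_letter are illustrative assumptions. Columns and rows are reported per worksheet part name rather than per sheet name, because mapping one to the other requires the package relationships.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
TRUE = ("1", "true")


def col_letter(n):
    """Convert a 1-based column index to its letter name (5 -> 'E')."""
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(65 + rem) + letters
    return letters


def inspect_xlsx(path):
    """Report hidden sheets, columns and rows, and comment parts, in an .xlsx.

    Hypothetical helper relying on Excel's usual part names.
    """
    report = {"hidden_sheets": [], "hidden_columns": {},
              "hidden_rows": {}, "comment_parts": []}
    with zipfile.ZipFile(path) as zf:
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
        for sheet in workbook.iter(f"{{{MAIN}}}sheet"):
            if sheet.get("state") in ("hidden", "veryHidden"):
                report["hidden_sheets"].append(sheet.get("name"))
        for name in sorted(zf.namelist()):
            if re.fullmatch(r"xl/comments\d+\.xml", name):
                report["comment_parts"].append(name)
            elif re.fullmatch(r"xl/worksheets/sheet\d+\.xml", name):
                ws = ET.fromstring(zf.read(name))
                cols = []
                for col in ws.iter(f"{{{MAIN}}}col"):
                    if col.get("hidden") in TRUE:
                        # One <col> element may span a range of columns.
                        first, last = int(col.get("min")), int(col.get("max"))
                        cols.extend(col_letter(i) for i in range(first, last + 1))
                rows = [int(row.get("r")) for row in ws.iter(f"{{{MAIN}}}row")
                        if row.get("hidden") in TRUE and row.get("r")]
                if cols:
                    report["hidden_columns"][name] = cols
                if rows:
                    report["hidden_rows"][name] = rows
    return report
```

A "veryHidden" sheet deserves particular attention, since it does not appear in Excel's ordinary unhide dialog and is easy to overlook in manual review.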

Whether you work for a legal organization or a corporate firm, metadata is becoming more and more important. The U.S. FRCP defines what is admissible in a court of law and specifically allows electronic documents. Metadata can be submitted in e-discovery, including comments and other hidden data; moreover, this is a global development. Many countries are enacting rules and regulations around metadata types, and international organizations need to be aware of this issue in their worldwide operations. For example, Australia's Practice Note CM 6 and Practice Direction 31 of the UK's Civil Procedure Rules provide specific guidance on metadata. I caution you not to assume that none of this applies to your organization, as the following examples will illustrate.

None of these cases involved a court of law, but they were public matters of an embarrassing nature. One of my favorite examples is Google and involves their policy of not providing financial guidance in presentations to the finance industry. In 2006-2007, Google provided a PP document in advance of a live financial presentation, and the speaker notes contained financial guidance information. Even though the live presentation did not contain such guidance, Google had to make a rather delicate public statement about how financial guidance contained in the PP document did not expose anything they would not have shared through the presentation. This embarrassing moment for Google easily could have been avoided.

Another example of dangerous metadata leakage involves the 2008 Barclays buyout of Lehman Brothers. The legal firm for Barclays worked on an Excel document in which hidden columns and rows contained contracts of which Barclays was unaware. After the document was submitted, someone exposed the hidden information and questioned whether this was part of the buyout. Of course, Barclays asserted they did not want the hidden contracts, but the exposure created many headaches and a very testy time for all.

What does metadata mean to us today? Electronic communications, such as email, are now being sent via mobile devices as well as from traditional desktop and laptop computers. Managing metadata manually cannot be accomplished via mobile devices, which present special challenges and are ubiquitous. Microsoft Windows has a new operating system for mobile devices. Google has a new phone and operating system, and they will offer tablets for the Android operating system. Tablets are replacing laptops and desktops, further evidencing the trend toward increased mobility. While the key function for these devices is sending email, they do not have software necessary to address metadata, and it is difficult to keep pace with technology to protect intellectual property and other non-public information. Some metadata are harmless; therefore, the key is to know your document and deal with it accordingly.

To address this issue, I will present best practices that are enforceable. While legal firms and corporate legal departments are champions for protection on the metadata front, this issue has much broader reach. It affected Google's finance team when they created a presentation for a public company. It affects bankers in managing acquisitions or buyouts, as we saw in the Barclays example. There are many ways to send information, including computers and an ever-evolving universe of mobile devices. The risks of metadata will only increase with time.

Workshare recently announced the launch of our version 2.0 Protect Server solution, which can implement best practices across all desktops, mobile devices and web users.

Imagine that I am a tablet user who wants to email a Word document. I address the email, attach the document and send it off. I do not have a client application, and I simply did not have time to remove metadata manually. Luckily, I have Workshare Protect Server in my environment, which sends back a clean report specifying what metadata was detected and then removed from the Word document. Protect Server enables mobile users to maintain the natural process of sending documents via email, thereby increasing efficiency and protecting sensitive information.

Protect Server also can convert a Word document to PDF automatically before sending the email. When the document review process is complete, the sender may want to create a PDF for emailing purposes. Protect Server allows the selection of a unique profile for this document type, specifying the kind of metadata cleaning desired and instructing the system to convert to a PDF. The recipient will receive this PDF, not the Word document originally attached. In this way, it is easy to create an email, enter a profile address and even specify conversion to PDF, all within a web application. There is no need to install desktop or client software - it is handled on the server.

Thank you for this opportunity to discuss how to protect desktop and mobile device users from unintended disclosure of metadata in electronic communications. Workshare provides solutions for all types of users, including Workshare's Protect Client application and applications for mobile device users as well as for corporate webmail users.

The following links provide additional reading and further discussion of case studies as well as ideas for how you can start planning for metadata management in your firm, all on Workshare's website:

FRCP and Metadata: Avoiding the Lurking e-Discovery Disaster:

A Guide to Managing Metadata in Today's Law Firms:

Case Study: Workshare Protect Server Provides McMillan Automated Metadata Removal:

Meeting Regulatory Challenges: Metadata in Court Submissions:

In closing, metadata poses a significant risk to organizations. In February 2010, a warning from the U.S. District Court for the Western District of Pennsylvania highlighted the need to educate people on the dangers of metadata. Because metadata can almost always be recovered, the way to minimize risk is to ensure that sensitive information is actually removed from the original document, not just visually hidden or made illegible. Workshare's 18,000 customers provide valuable input when we build and improve our solutions for legal teams. Moving beyond the desktop and providing a server-based metadata removal solution was a direct result of the input from corporate counsel to protect the entire organization against the dangers of metadata.

Please email the presenter with questions about this webinar.