Distributing Word Documents with a 'locating beacon'
30 Aug. 2000
The Privacy Foundation has discovered that it is possible to add "Web bugs" to Microsoft Word documents. A "Web bug" could allow an author to track where a document is being read and how often. In addition, the author can watch how a "bugged" document is passed from one person to another or from one organization to another.
Some possible uses of Web bugs in Word documents include:
* Detecting and tracking leaks of confidential documents from a company.
* Tracking possible copyright infringement of newsletters and reports.
* Monitoring the distribution of a press release.
* Tracking the quoting of text when it is copied from one Word document to a new document.
These Web bugs are made possible by Microsoft Word's ability to link a document to an image file located on a remote Web server. Because only the URL of the Web bug is stored in a document and not the actual image, Microsoft Word must fetch the image from a Web server each and every time the document is opened. This image-linking feature puts a remote server in a position to monitor when and where a document file is being opened. The server knows the IP address and host name of the computer that is opening the document. A host name will typically include a company name if a computer is located at a business. The host name of a home computer usually includes the name of the user's Internet Service Provider (ISP).
An additional issue, and one that could magnify the potential surveillance, is that Web bugs in Word documents can also read and write browser cookies belonging to Internet Explorer. Cookies could allow an author to match the reader of a Word document to that reader's visits to the author's Web site.
Web bugs are used extensively today by Internet advertising companies on Web pages and in HTML-based email messages for tracking. They are typically 1-by-1 pixel in size, which makes them invisible on the screen and disguises the fact that they are used for tracking.
Although the Privacy Foundation has found no evidence that Web bugs are being used in Word documents today, there is little to prevent their use.
Short of removing the feature that allows linking to Web images in Microsoft Word, there does not appear to be a good preventative solution. However, the Privacy Foundation has recommended to Microsoft that cookies be disabled in Microsoft Word through a software patch.
In addition to Word documents, Web bugs can also be used in Excel 2000 and PowerPoint 2000 documents.
Microsoft Word from the beginning has supported the ability to include picture files in Word documents. Originally the picture files would reside on the local hard drive and then be copied into a document as part of the Word .DOC file. However, beginning with Word 97, Microsoft provided the ability to copy images from the Internet. All that is required to use this feature is to know the URL (Web address) of the image. Besides copying the Web image into the document, Word also allows the Web image to be linked to the document via its URL. Linking to the image results in smaller Word document files because only a URL needs to be stored in the file instead of the entire image. When a document contains a linked Web image, Word will automatically fetch the image each time the document is opened. This is necessary to display the image on the screen or to print it out as part of the document.
Because a linked Web image must be fetched from a remote Web server, the server is in a position to track when a Word document is opened and possibly by whom. Furthermore, it is possible to include an image in a Word document solely for the purpose of tracking. Such an image is called a Web Bug. Web bugs today are already used extensively by Internet marketing companies on Web pages and embedded in HTML email messages.
When a Web bug is embedded in a Word document, the following information is sent to the remote Web server when the document containing the bug is opened:
* The full URL of the Web bug image
* The IP address and the host name of the computer requesting the Web bug
* A Web browser cookie (optional)
This information is typically saved in an ordinary log file by Web server software.
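As a rough illustration of what such a log entry exposes, the following Python sketch pulls the client address, the request path, and any query parameters out of a single log line. It assumes the server writes the widely used Common Log Format; the sample line, the file name bug.gif, and the parameter name doc are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_hit(line):
    """Return client address, timestamp, path, and query parameters of one log line."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # not a Common Log Format line
    target = urlsplit(match.group("target"))
    return {
        "client": match.group("host"),
        "time": match.group("time"),
        "path": target.path,
        "params": parse_qs(target.query),
    }

# Hypothetical log entry; 192.0.2.10 is an address reserved for documentation.
sample = '192.0.2.10 - - [30/Aug/2000:10:15:00 -0600] "GET /bug.gif?doc=1234 HTTP/1.0" 200 43'
hit = parse_hit(sample)
print(hit["client"], hit["path"], hit["params"])
```

Cookies do not appear in the Common Log Format, so a server would need an extended log configuration to record them.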
Because the author of the document has control of the URL of the Web bug, they can put whatever information they choose in this URL. For example, a URL might contain a unique document ID number or the name of the person to whom the document was originally sent.
These tracking abilities might be used in any number of ways. In most cases, the reader of a particular document will not know that the document is bugged, or that the Web bug is surreptitiously sending identifying information back through the Internet.
One example of this tracking ability is to monitor the path of a confidential document, either within or beyond a company's computer network. The confidential document could be "bugged" to "phone home" each time it is opened. If the company's Web server ever received a "server hit" for the bug from an IP address outside the organization, then the company could learn immediately about the leak. Because the server log would include the host name of the computer where the document was opened, the company could tell whether the organization that received the leaked document was a competitor or a media outlet.
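The check for hits from outside the organization could be sketched in Python along the following lines. The internal address ranges listed here are hypothetical placeholders; a real organization would substitute its own networks.

```python
import ipaddress

# Hypothetical internal address ranges for the organization.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_external(client_ip):
    """True if the address lies outside every internal network."""
    address = ipaddress.ip_address(client_ip)
    return not any(address in network for network in INTERNAL_NETWORKS)

def external_hits(client_ips):
    """Keep only the hits that came from outside the organization."""
    return [ip for ip in client_ips if is_external(ip)]

# Addresses as they might appear in the server log (illustrative only).
print(external_hits(["10.1.2.3", "203.0.113.7"]))
```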
All original copies of a confidential document could also be numbered so that a company could track the source of a leak. A unique serial number could be encoded in the query string of the Web bug URL. If the document is leaked, the server hit for the Web bug will indicate which copy was leaked.
A serial number could be added to a Web bug in a document either manually - right before a copy of a document is saved - or automatically through a simple utility program. The utility program would scan a document for the Web bug URL and add a serial number in the query string. A Perl script of less than 20 lines of code could easily be written to do this sort of serialization.
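A minimal sketch of such a serialization step is shown here in Python rather than Perl. It assumes the document content is available as a text string; it does not read or write the binary .DOC format. The URL and the parameter name serial are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_serial(bug_url, serial, param="serial"):
    """Return bug_url with the serial number set in its query string."""
    parts = urlsplit(bug_url)
    # Drop any earlier serial so each copy carries exactly one.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, str(serial)))
    return urlunsplit(parts._replace(query=urlencode(query)))

def serialize_document_text(text, bug_url, serial):
    """Replace every occurrence of bug_url in text with its serialized form."""
    return text.replace(bug_url, add_serial(bug_url, serial))

# Hypothetical Web bug URL.
print(add_serial("http://www.example.com/bug.gif?doc=report", 17))
```

Each outgoing copy would be passed through serialize_document_text with a different serial value before it is sent.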
Another use of Web bugs in Word documents is to detect copyright infringement. For example, a publishing company could "bug" all outgoing copies of its newsletter. The Web bugs in a newsletter could contain unique customer ID numbers to detect how widely an individual newsletter is copied and distributed.
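One way to estimate how widely a copy has spread is to count the distinct client addresses that fetched the Web bug for each customer ID. The Python sketch below assumes the (address, customer ID) pairs have already been extracted from the server log; the data and the threshold are illustrative only.

```python
from collections import defaultdict

def readers_per_customer(hits):
    """Count distinct client addresses seen for each customer ID.

    hits: iterable of (client_address, customer_id) pairs from the server log.
    """
    readers = defaultdict(set)
    for client, customer_id in hits:
        readers[customer_id].add(client)
    return {cid: len(clients) for cid, clients in readers.items()}

def widely_copied(hits, threshold=3):
    """Customer IDs whose copy was opened from more than threshold addresses."""
    counts = readers_per_customer(hits)
    return sorted(cid for cid, count in counts.items() if count > threshold)

# Hypothetical hits: customer "A17" opened from two addresses, "B42" from one.
sample_hits = [
    ("192.0.2.1", "A17"),
    ("192.0.2.1", "A17"),
    ("198.51.100.4", "A17"),
    ("203.0.113.9", "B42"),
]
print(readers_per_customer(sample_hits))
```

A distinct address is only a rough proxy for a distinct reader, because one person may open the newsletter from several computers and several people may share one address.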
A third possible use of Web bugs is for market research purposes. For example, a company could place Web bugs in a press release distributed as a Word document. The server log hits for the Web bugs would then tell the company what organizations have actually viewed the press release. The company could also observe how a press release is passed along within an organization, or to other organizations.
In an academic setting, Web bugs might be used to detect plagiarism. A document could be bugged before it is distributed. An invisible Web bug could be placed within each paragraph in the document. If text were to be cut and pasted from the document, it is likely that a Web bug would be picked up also and copied into the new document.
To place a Web bug in a Word document is relatively simple. These are the steps in Word 2000:
1. Select the Insert | Picture | From File... menu command
2. Type in the URL of the Web Bug in the "File Name" field of the Insert Picture dialog box.
3. Select the "Link to File" option of the "Insert" button.
Access to the sender's server logs is required to monitor the movement of such Web bugs.
The Privacy Foundation ran simple experiments with Excel and PowerPoint files and found that these files can also be "bugged" in Office 2000. The Privacy Foundation continues to investigate this issue with regard to other software programs.
The demonstration Word document contains a visible Web bug. When the document is opened, the Web bug will show the host name of the computer that fetched the image. In addition, a non-identifying Web browser cookie will be set on your computer. The cookie is non-identifying because everyone gets the same cookie value, which is a simple test string.
Demonstrations of "bugged" Excel and PowerPoint files are also available for download from the Privacy Center Web site:
The use of Web bugs in Word does point to a more general problem. Any file format that supports automatic linking to Web pages or images could lead to the same problem. Software engineers should take this privacy issue into consideration when designing new file formats.
This issue is potentially critical for music file formats such as MP3 files where piracy concerns are high. For example, it is easy to imagine an extended MP3 file format that supports embedded HTML for showing song credits, cover artwork, lyrics, and so on. The embedded HTML with embedded Web bugs could also be used to track how many times a song is played and by which computer, identified by its IP address.
Vendor Contact and Response:
Regarding the potential use of Web bugs to track Word documents, Microsoft said that there is no evidence that such activities are occurring.
Short of getting rid of the ability to link to Web images from Word documents, there really is no way to prevent Word documents from being tracked using Web bugs. Because this linking ability is a useful feature, the Privacy Foundation does not recommend its removal.
However, the Foundation does believe that Web browser cookies should be disabled inside of Word documents. There appears to be very little need for cookies outside of a Web browser. In general, the Foundation believes that cookies should be disabled by default any time Internet Explorer is reused inside of other applications such as Word, Excel, or Outlook. The Foundation would like to see Microsoft make this change in the next release of Internet Explorer.
Users concerned about being tracked can use a program such as ZoneAlarm (www.zonelabs.com) to warn about Web bugs in Word documents. ZoneAlarm monitors all software and warns if an unauthorized program is attempting to access the Internet. ZoneAlarm is designed to catch Trojan horses and spyware. However, because Word typically does not access the Internet, ZoneAlarm can also be used to catch "bugged" Word documents.
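Because a linked image is stored in the document as a URL, a cautious reader could also look for Web addresses in a file before opening it. The Python sketch below is a crude heuristic, not a parser for the Word file format. It assumes that URLs are stored in the file as plain ASCII or UTF-16LE text, which may not hold for every document, and the sample bytes are fabricated for illustration.

```python
import re

# URL stored as 8-bit text.
URL_ASCII = re.compile(rb"https?://[\x21-\x7e]+")
# URL stored as UTF-16LE text (each ASCII character followed by a zero byte).
URL_UTF16 = re.compile(
    rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:[\x21-\x7e]\x00)+"
)

def find_urls(data):
    """Return the sorted set of URLs found in the raw bytes of a file."""
    urls = {m.group().decode("ascii") for m in URL_ASCII.finditer(data)}
    urls |= {m.group().decode("utf-16-le") for m in URL_UTF16.finditer(data)}
    return sorted(urls)

# Fabricated bytes standing in for a document file.
sample_data = (
    b"\x00\x01junk http://www.example.com/bug.gif?id=7\x00more"
    + "http://tracker.example.net/a.gif".encode("utf-16-le")
    + b"\x00\x00"
)
print(find_urls(sample_data))
```

For a real file, the bytes could be read with open(path, "rb").read(). A URL found this way is not necessarily a Web bug, since ordinary hyperlinks are stored as URLs too.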