Once we have created an important and valuable document, our first concern is keeping the information it contains confidential. The first idea that comes to mind is to use one of the extensively advertised "Microsoft solutions". However, the numerous advertising booklets say almost nothing about one important protection parameter: cryptographic stability. In plainer language, cryptographic stability means that the information stored in a file cannot be recovered in a reasonable time without the key. In this article, we'll try to analyze the cryptographic stability of the protection provided in Microsoft Office (a software product used worldwide).
Consider the key principles of protecting information from unauthorized access. The input data for this task are the text of the document we wish to protect and a secret key (password), which should be the only way to recover the content of the document in unencrypted form.
There are a few ways to accomplish this task:
1. Storing the key along with the document; when someone attempts to open the document, the program checks whether the key entered is the same as the stored one. If the key doesn't match, the program blocks further processing of the document. The low cryptographic stability of this method is obvious. First, one can retrieve the key by viewing the document in binary form. Second, an intruder might be able to modify the program code, bypassing the password check altogether.
2. Storing a key hash in the document. "A hash function is a function, mathematical or otherwise, that takes a variable-length input string (called a pre-image) and converts it to a fixed length (generally smaller) output string (called a hash value)." (Schneier, Applied Cryptography). When this method is employed, the key entered by a user is transformed into a fixed-length data string that is used to verify the key but cannot be used to retrieve the original key. This protection method is not secure enough for the following reason: an intruder who knows the hash-function algorithm can generate a hash for a key of his own choosing and replace the original hash in the document with it. Alternatively, as with the first method, an intruder can modify the program code to ignore the protection and access the data.
3. Using a key to encrypt the document with a certain algorithm. Here, the stability of the protection depends only on the stability of the algorithm and on the length of the key.
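The weakness of the second method can be illustrated with a toy model. The sketch below is not any Office file format: the dictionary layout, the function names, and the use of SHA-256 are all assumptions made for illustration. It only shows that when the text is stored unencrypted next to a key hash, whoever can edit the file can swap in a hash of their own.

```python
import hashlib


def make_hash(password: str) -> str:
    # SHA-256 stands in for whatever hash the application uses (assumption).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def protect(text: str, password: str) -> dict:
    # Toy "file": the text is stored in the clear next to the key hash.
    return {"key_hash": make_hash(password), "text": text}


def open_document(doc: dict, password: str) -> str:
    # The program only compares hashes; the text itself is never encrypted.
    if make_hash(password) != doc["key_hash"]:
        raise PermissionError("wrong password")
    return doc["text"]


doc = protect("quarterly figures", "fO7#s!kP4x*a")

# The intruder does not know the password but can edit the file,
# so the stored hash is simply replaced with one for a known password.
doc["key_hash"] = make_hash("intruder")
recovered = open_document(doc, "intruder")
print(recovered)
```

With the third method this substitution achieves nothing, because the text would still have to be decrypted with the real key.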
All three of these methods are used in Microsoft Office, but no information on their cryptographic stability is provided. Microsoft just gives the following warning: "Caution! Lost password is impossible to retrieve!" and recommends keeping the password list in a safe location. Following this advice, however, does nothing for the stability of the protection itself. Microsoft Office documents can be protected at any of several protection levels. Let's examine each one in detail.
In a Word document, a password can be set for any of the following functions: opening the file, permission to save the file, and protecting the document from changes. (There is also a password for protecting the source text of VBA macros, which will not be dealt with in this part of the advisory.)
We will start by analyzing the password protection of the "Save" option (which uses the first protection method). Let's set this password and then open the document in a hex editor.
Obviously, simply viewing the document is enough to retrieve the password! It is stored in the document in Unicode encoding (that's why you notice an "extra" 0x00 byte after each character). I wouldn't advise using this method to protect a document from changes, as an intruder could retrieve the password by viewing the document in binary mode.
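To see where the "extra" 0x00 bytes come from, here is a minimal sketch, assuming the password is stored as plain UTF-16LE text. The password "secret", the surrounding bytes, and the helper name are hypothetical, and the scan is a generic string search, not a parser for the Word file format.

```python
import re


def find_utf16_strings(blob: bytes, min_chars: int = 4) -> list[str]:
    # Look for runs of printable ASCII characters, each followed by 0x00,
    # which is how such text appears in UTF-16LE.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(blob)]


stored = "secret".encode("utf-16-le")
print(stored.hex(" "))  # 73 00 65 00 63 00 72 00 65 00 74 00

# Hypothetical file fragment: binary data around the stored password.
blob = b"\x01\x02\xff" + stored + b"\xfe\x10"
found = find_utf16_strings(blob)
print(found)  # ['secret']
```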
An example of the second method is the password that protects a document from changes, set through the "Tools - Set Protection" menu item. This protection uses a 32-bit hash, so a short password can be retrieved by trying all possible combinations. Moreover, irrespective of the password length, it is possible to replace the password (the hash) or simply disable it.
The password protection used for opening a file ("Tools - Options - Save - File open password") is the most stable (secure) one. This is a typical example of the third protection method. When this password is set, the entire Word document, including part of the auxiliary information, is encrypted with the RC4 algorithm. A 128-bit hash formed with the MD5 algorithm is used for password verification. The encryption key is 40 bits long, in line with the regulations of many countries that do not allow the use of longer keys. Note that it is possible to decrypt the document even if the password is unknown: it is enough to know the relevant 40-bit key, and the only known method for finding it is brute force (checking all possible combinations). Trying all possible keys takes about half a month on a 900 MHz Pentium III. With more powerful multi-processor machines, or with a computer cluster, it is possible to decrypt a document in a reasonable time frame. Some Microsoft Word releases use an even less stable encryption algorithm. For example, in the French release, the password can be retrieved in a few seconds by direct decoding.
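The brute-force figures above can be checked with simple arithmetic. The sketch below takes "about half a month" to mean 15 days (an assumption) and derives the implied key-testing rate; the cluster size in the usage line is likewise a made-up example, and perfect scaling across machines is assumed.

```python
KEY_BITS = 40
SECONDS_PER_DAY = 86_400
DAYS_ON_ONE_MACHINE = 15  # "about half a month", taken as 15 days (assumption)

keyspace = 2 ** KEY_BITS  # 1,099,511,627,776 possible keys

# Rate implied by the figure quoted for a single Pentium III 900 MHz.
keys_per_second = keyspace / (DAYS_ON_ONE_MACHINE * SECONDS_PER_DAY)


def days_to_exhaust(machines: int) -> float:
    # Worst-case time to try every key, assuming the work splits evenly.
    return keyspace / (keys_per_second * machines) / SECONDS_PER_DAY


print(f"{keys_per_second:,.0f} keys per second on one machine")
print(f"{days_to_exhaust(15):.1f} day(s) on a hypothetical 15-machine cluster")
```

Under these assumptions a single machine tests roughly 850,000 keys per second, and on average the right key turns up after searching half of the key space.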
Microsoft Excel supports the following passwords: document open, write protection, book, and sheet. The first two use the same protection methods as in Microsoft Word. We will consider the book and sheet passwords in detail.
When an Excel sheet is protected with a password, a 16-bit (two-byte) hash is generated. To verify a password, its hash is compared with the stored one. Obviously, since there are far more possible passwords than the 65,536 possible hash values, numerous passwords will match the same hash. This can be seen empirically: for example, protect a sheet with the password "test", and then try to unprotect it with the password "zzyw".
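The "test" / "zzyw" collision can be reproduced with the legacy sheet-protection hash as it is described in public documentation of the Excel file format (for example, the OpenOffice.org description of that format). The article itself does not spell out the algorithm, so treat the routine below as an assumption rather than Microsoft's own code; the function names are mine.

```python
def rotl15(value: int, count: int) -> int:
    # Rotate a value left within a 15-bit field.
    count %= 15
    return ((value << count) | (value >> (15 - count))) & 0x7FFF


def sheet_hash(password: str) -> int:
    # Each character is rotated left by its 1-based position, and the
    # results are XORed together with the length and a fixed constant.
    h = 0
    for position, byte in enumerate(password.encode("latin-1"), start=1):
        h ^= rotl15(byte, position)
    return h ^ len(password) ^ 0xCE4B


print(hex(sheet_hash("test")))  # 0xcbeb
print(hex(sheet_hash("zzyw")))  # 0xcbeb
```

Because the rotations stay inside a 15-bit field, far fewer than 65,536 distinct values are actually reachable for a given password length, which makes collisions even easier to find.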
Book protection is somewhat more sophisticated. The hash generation algorithm is the same as for sheet protection; however, the whole document is encrypted. This protection seems relatively stable at first sight, but a more detailed analysis reveals that the document is encrypted not with the entered password (or its hash), but with a fixed key stored in the MS Excel program code.
This key is generated from the password "VelvetSweatshop". What a nice joke by Microsoft! Try protecting an MS Excel document with this password (or using this password to open a document). The most surprising thing is that no password is then required to open the document.
Microsoft Office includes an embedded programming language called Visual Basic for Applications (VBA), which is used to create code that runs within Microsoft Office applications. A number of companies develop such programs for commercial distribution. Microsoft has provided macro developers with the means to protect a macro's source code from unauthorized viewing or modification.
However, the protection system provided by Microsoft has proved to be unstable. In Office 97, passwords are stored almost in their original form - only a very simple algorithm is used to encrypt them. In Office 2000, the CryptoAPI of recent Windows releases is used. A password hash is generated with the SHA algorithm provided by CryptoAPI and then encrypted with the same algorithm as in Office 97. So one can either simply read VBA macro passwords (in files created with an Office 97 program), or replace them with any other (using a hash generated by Office 2000).
Microsoft Outlook allows a user's personal data stored in *.pst files (Personal Storage Files) to be protected with a password. Protecting a user's personal information and personal correspondence is a very important factor to be taken into account when developing a general concept of information protection. However, as in the previous applications, Microsoft uses a very simple and insecure algorithm.
A password hash is generated using the CRC-32 algorithm (a 32-bit checksum). It has been proven mathematically that for any checksum, a matching input of six printable characters can be found. So password retrieval turns out to be a trivial task - for example, look at the "Advanced Outlook Password Recovery" software developed by ElcomSoft Co. Ltd. (http://www.elcomsoft.com/aolpr.html), which recovers Outlook passwords instantly. Of course, this password does not encrypt the user's personal information in Outlook in any way.
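A rough feel for why a 32-bit checksum is so weak comes from counting. The sketch below uses the standard CRC-32 from Python's zlib module as a stand-in; whether Outlook uses exactly these CRC parameters is an assumption, and the count of 95 printable ASCII characters is mine. The comparison is only a plausibility check, not the proof mentioned above.

```python
import zlib

PRINTABLE_ASCII = 95  # characters 0x20 through 0x7E


def password_checksum(password: str) -> int:
    # Standard CRC-32 as a stand-in for the stored password checksum.
    return zlib.crc32(password.encode("ascii")) & 0xFFFFFFFF


six_char_passwords = PRINTABLE_ASCII ** 6  # 735,091,890,625
possible_checksums = 2 ** 32               # 4,294,967,296

print(f"{password_checksum('fO7#s!kP4x*a'):08x}")
print(six_char_passwords // possible_checksums)  # 171
```

On average about 171 six-character printable passwords share each checksum value, so however long the original password is, a short equivalent one is likely to exist.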
Old Versions of Word and Excel
In Microsoft Word 2.0, 6.0 and 95 (7.0) and in Excel 4.0, 5.0 and 95 (7.0), Microsoft used an even weaker encryption algorithm. To encrypt a document, an exclusive OR (XOR) operation with a sequence derived from the password is applied. Since some (predictable) auxiliary information is encrypted too, this sequence can be recovered. So the file open password in these Word and Excel versions can be retrieved in a few milliseconds (look at Advanced Office95 Password Recovery - http://www.elcomsoft.com/ao95pr.html).
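The known-plaintext attack on XOR encryption can be sketched in a few lines. This is a simplified model, not the actual Office 95 scheme: the 16-byte period, the keystream bytes, and the predictable header are all hypothetical values chosen for illustration.

```python
from itertools import cycle

PERIOD = 16  # assumed length of the repeating XOR sequence


def xor_crypt(data: bytes, keystream: bytes) -> bytes:
    # XOR is its own inverse: the same call encrypts and decrypts.
    return bytes(b ^ k for b, k in zip(data, cycle(keystream)))


def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    # ciphertext XOR plaintext gives the keystream wherever the
    # plaintext is known; one full period is enough.
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))[:PERIOD]


keystream = bytes(range(0x10, 0x20))  # stands in for the password-derived sequence
header = b"HYPOTHETICAL-HDR"          # predictable auxiliary data, 16 bytes
body = b"Confidential: merger planned for Q3."

ciphertext = xor_crypt(header + body, keystream)

# The attacker knows only the ciphertext and the predictable header.
guessed = recover_keystream(ciphertext, header)
plaintext = xor_crypt(ciphertext, guessed)
print(plaintext[len(header):].decode("ascii"))
```

No search over passwords is needed at all, which is consistent with recovery times measured in milliseconds.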
Having read this text, many users will become unsure about entrusting their secrets to Microsoft software. What should they do? The answer is very simple - use other software products to protect confidential information. For example, use the respected, thoroughly tested software called Pretty Good Privacy (PGP). Its operation is based on a well-known mathematical problem - factoring a very large number into its prime factors. No efficient solution to this problem is known, and trying all possible combinations would take a very long time.
If you do decide to protect your document with a password (to set a file open password in Word or Excel), choose a relatively complicated one. Avoid using words from a dictionary or your name/surname as a password. Your password should consist of letters (both upper- and lower-case), numbers, and special symbols. You can also use symbols from your national alphabet. A secure password might look like this: "fO7#s!kP4x*a". However, please note that because the encryption key is only 40 bits long, however complicated the password, decrypting your document with today's powerful computers won't take longer than a few days.
Bruce Schneier, Applied Cryptography, Second Edition.