Hacking Excel — Part I

Henrik Massow
9 min read · Dec 4, 2019

Many of us use Excel more or less frequently. We all know the famous green file icon and the typical file extension *.xlsx.

However, few of us have given this file format much thought, right? And since Excel is just a tool for most of us, it’s perfectly fine not to think too much about the details of Excel files. Why should we? After all, we can also be good car drivers without knowing any details about the brake system or the engine.

But what if I told you that knowing how Excel files work can someday save your butt? For example, because we’ve forgotten a password, need to remove a password-secured worksheet protection, or simply because we have to be the bad guy for some reason.

Words of caution: I do not want to get into the usual hypocrisy when it comes to bypassing passwords and encryption. Let’s be honest: it’s almost never just about our own forgotten passwords. It is almost always about other people’s files and systems. Nevertheless, I ask everyone to behave ethically, to respect the will and work of others, and to obey the law. Just as you and I do not want our work to be stolen, destroyed or devalued, others do not want it either. Remember: “With great power comes great responsibility.”

Basics of XLSX files

An xlsx file is (loosely speaking) nothing more than a zip file. Really! Let’s try it! Take the following file.

Nothing really special here. We have two worksheets in our file. In the first one (“multiply_by_two”) there is a column (A) with numbers and in the column next to it (B), we inserted a formula which multiplies each value from column A by 2.

The second worksheet (“hello_world”) contains only two short sentences, one in bold style.

Now, save this file wherever you want.

I chose the desktop because it’s always a good place to store files.😆 A real IT freak can always be recognized by the fact that his desktop is littered with icons and files, like the breast of a Russian general with medals.😆

Ok, now let’s just rename the file, but only change the file extension from xlsx to zip. Ignore any warnings from your operating system that the file could be corrupted. This will definitely not be the case here.

The result should look like this.

By double-clicking we can open the file in a zip program of our choice. Of course, it is not mandatory to rename the file as we did, but the first time around it makes things easier. Usually you can just open an xlsx file directly in your zip program of choice, but sometimes these programs then try to compress the xlsx file into a new archive instead of opening it.

As you can see, there are actually a lot of subfolders and files in the zip file. These are usually xml files, but an xlsx file can contain other files as well.
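If you prefer a script to a zip program, the same listing can be produced with Python’s standard `zipfile` module, and no renaming is needed. This is only a minimal sketch: the helper name `list_parts` is made up for illustration, and the tiny archive built in memory is a stand-in so the example runs without a real workbook.

```python
import io
import zipfile


def list_parts(xlsx_bytes: bytes) -> list[str]:
    """Return the names of all files stored inside an xlsx (zip) container."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return archive.namelist()


# Stand-in archive built in memory (hypothetical content, not a real workbook).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

for name in list_parts(buffer.getvalue()):
    print(name)
```

For a real file you would pass the bytes of the workbook instead, for example the result of reading your xlsx file in binary mode.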

Let’s open the [Content_Types].xml file by double-clicking. Depending on the operating system settings, a program that can display XML files should launch and show us the file’s content (see below).

We do not have to dive deep inside it here. Ultimately, this file is a kind of table of contents. For example, we can see that the workbook contains two worksheets (marked in yellow).

Of much greater interest, however, is the folder worksheets in the folder xl.

There we can find a separate xml file for each worksheet. sheet1.xml looks like this.

The whole logic of how Excel stores everything in XML markup is quite complex and I will not discuss it in depth in this article. But what you can see very quickly is that Excel stores a lot of information about the worksheet in this xml file. For example, at the very top you can see which area of our sheet contains data at all (<dimension .../>).

In addition, we see that cell A1 exists in row 1 and has the value 1 (<v>1</v>). For the adjacent cell B1, Excel saved both the formula (<f>+A1*2</f>) and the last calculated result of the formula (2).

By the way, if you look at rows 3 and 4, you can see that the formulas are not explicitly stored there. Excel does not store the formula again each time, but shares it by reference (to row 2 in this case).
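To make the cell markup a bit more tangible, here is a small Python sketch that pulls values and formulas out of a worksheet part. The XML in `SAMPLE` is a hand-written approximation of what Excel produces, not a copy of the file above, and `read_cells` is a hypothetical helper name. The namespace URI is the standard SpreadsheetML one.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written sample (assumption): B2 defines a shared formula, B3 only references it.
SAMPLE = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <dimension ref="A1:B3"/>
  <sheetData>
    <row r="1"><c r="A1"><v>1</v></c><c r="B1"><f>+A1*2</f><v>2</v></c></row>
    <row r="2"><c r="A2"><v>2</v></c>
      <c r="B2"><f t="shared" ref="B2:B3" si="0">+A2*2</f><v>4</v></c></row>
    <row r="3"><c r="A3"><v>3</v></c><c r="B3"><f t="shared" si="0"/><v>6</v></c></row>
  </sheetData>
</worksheet>"""


def read_cells(sheet_xml: str) -> dict:
    """Map each cell reference to its formula text, shared-formula index and stored value."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iterfind(".//m:sheetData/m:row/m:c", NS):
        formula = cell.find("m:f", NS)
        value = cell.find("m:v", NS)
        cells[cell.get("r")] = {
            "formula": formula.text if formula is not None else None,
            "shared_index": formula.get("si") if formula is not None else None,
            "value": value.text if value is not None else None,
        }
    return cells


for ref, info in read_cells(SAMPLE).items():
    print(ref, info)
```

A cell that only references a shared formula shows up with an empty formula text but a `shared_index`, which is the "sharing by reference" described above.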

It gets even stranger if we open the xml file from our second worksheet. Remember, we wrote only two short sentences there.

But what is that? Where are our sentences? We saved everything, definitely! And why is there a “0” and a “1”? We definitely did not write “0” and “1” in these cells!

And while we’re at it, how does Excel actually know what our worksheets are called? That information is not in the xml file of the worksheet itself.

❓ ❓ ❓

The first question can be answered quickly. In folder xl there is a file called sharedStrings.xml. It holds the strings that are shared across the whole document.

So “0” and “1” are indices into this xml file, and Excel looks them up there to find the strings to be displayed.
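The lookup itself is simple enough to sketch in a few lines of Python. Both XML snippets below are hand-written stand-ins (the sentences are invented), and `load_shared_strings` and `resolve_sheet` are hypothetical helper names. The sketch assumes that string cells carry the attribute `t="s"`, which is how Excel marks them, and it simply joins all text runs of an entry, which is a simplification for rich text.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hand-written stand-ins for xl/sharedStrings.xml and a worksheet part.
SHARED = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t>Hello World!</t></si>
  <si><t>This sentence is bold.</t></si>
</sst>"""

SHEET = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1" t="s"><v>0</v></c></row>
    <row r="2"><c r="A2" s="1" t="s"><v>1</v></c></row>
  </sheetData>
</worksheet>"""


def load_shared_strings(xml_text: str) -> list[str]:
    """Return the shared strings in index order, joining all text runs per entry."""
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in si.iter(MAIN + "t"))
        for si in root.findall(MAIN + "si")
    ]


def resolve_sheet(sheet_xml: str, strings: list[str]) -> dict:
    """Map cell references to display text, replacing string indices by the strings."""
    resolved = {}
    for cell in ET.fromstring(sheet_xml).iter(MAIN + "c"):
        value = cell.find(MAIN + "v")
        if value is None:
            continue
        text = value.text
        if cell.get("t") == "s":  # the stored value is an index, not the text itself
            text = strings[int(text)]
        resolved[cell.get("r")] = text
    return resolved


print(resolve_sheet(SHEET, load_shared_strings(SHARED)))
```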

The second question is just as easy to answer, because of course there is an xml file for this information, too.
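In the files I know, the sheet names are listed in the `<sheets>` element of xl/workbook.xml; that this is the file meant here is my assumption. A minimal Python sketch for reading them could look like this. The XML is a hand-written approximation using the two sheet names from our example, and `sheet_names` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hand-written approximation of xl/workbook.xml for the example workbook.
WORKBOOK = """<workbook
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="multiply_by_two" sheetId="1" r:id="rId1"/>
    <sheet name="hello_world" sheetId="2" r:id="rId2"/>
  </sheets>
</workbook>"""


def sheet_names(workbook_xml: str) -> list[str]:
    """Return the worksheet names in the order the workbook lists them."""
    root = ET.fromstring(workbook_xml)
    return [sheet.get("name") for sheet in root.iter(MAIN + "sheet")]


print(sheet_names(WORKBOOK))
```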

What to do with this knowledge

(For) knowledge (itself) is power! -Francis Bacon

I think what you should take home so far is that Excel’s XLSX files are not just monolithic blocks of strange characters, but rather wrap a bunch of XML files and other resources. These (inner) files describe our Excel file down to the smallest detail. A lot of information about Excel files can be found here, even if someone tries to hide things in Excel (I’ll come back to this point soon).

It is not necessary to go into all the details of this standard. There are very detailed descriptions on the web. Furthermore, all the programming libraries that read, modify, and write Excel files are based on being able to read, modify, and write XML files. You can also learn a lot from the documentation of these libraries, even if you are not a programmer.

Most importantly, you have now found a way to investigate and manipulate Excel files with only a text editor. From now on, it’s up to you to think about the possibilities this offers.

So, let’s start thinking about some common scenarios…

Photo by Joanes Andueza on Unsplash

Worksheet protection

Let’s take a look at the following file. When we hand it out, any user can simply change anything: delete or paste formulas, formatting, values, columns …

Sometimes there may be scenarios in which we do not want users to be able to make such changes, either intentionally or accidentally. Most of the time that is a valid concern. At the same time, there are often people who think other users should not be able to see their ingenious formulas. This is bullshit…

In these cases Excel offers the possibility to activate a worksheet protection with a password. In our case, input should only be possible in the yellow field. How to set this up is explained by Microsoft, so I will not repeat it here.

The result is this.

Without the password you can’t see the formula, and you can’t change any of the cells except the one marked in yellow. Note: Although the formula is hidden, Excel still calculates correctly.

What if the formula interests us? What if we want to change something on this worksheet but we do not have the password? What if we want to check the algorithm? Well, let’s look at the xml file again, but this time open it in an editor, because we’ll have to make some changes…

Only two things are important in the xml file.

First, the “super secret” formula is visible and changeable here. 😈

Second, here we can find all the information about the current worksheet protection (algorithm, salt and password hash). So we can simply remove the <sheetProtection .../> tag, save the file, write the change back into our zip file and rename it back to xlsx.

Back in Excel, the worksheet protection should be gone.
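The manual steps above can also be scripted. The following Python sketch copies every entry of the archive into a new one and cuts the protection tag out of the worksheet parts on the way. It is a sketch under assumptions: the tag is expected to be self-closing (which is how I have seen Excel write it), the regular expression works on the raw bytes so the rest of the XML stays untouched, and `strip_sheet_protection` as well as the demo data are made up for illustration.

```python
import io
import re
import zipfile

# Assumption: <sheetProtection .../> is self-closing and has no child elements.
SHEET_PROTECTION = re.compile(rb"<sheetProtection\b[^>]*/>")


def strip_sheet_protection(xlsx_bytes: bytes) -> bytes:
    """Return a copy of the archive with the protection tag removed from all worksheets."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            data = source.read(name)
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                data = SHEET_PROTECTION.sub(b"", data)
            target.writestr(name, data)
    return output.getvalue()


# Demo with an in-memory stand-in archive (hash and salt values are dummies).
protected_sheet = (
    b'<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<sheetData><row r="1"><c r="A1"><v>1</v></c></row></sheetData>'
    b'<sheetProtection algorithmName="SHA-512" hashValue="aGFzaA==" '
    b'saltValue="c2FsdA==" spinCount="100000" sheet="1"/>'
    b"</worksheet>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("xl/worksheets/sheet1.xml", protected_sheet)

unlocked = strip_sheet_protection(demo.getvalue())
with zipfile.ZipFile(io.BytesIO(unlocked)) as archive:
    cleaned = archive.read("xl/worksheets/sheet1.xml")
print(b"sheetProtection" in cleaned)
```

Working on a copy in memory has the nice side effect that the original file is never modified; you decide yourself where to write the returned bytes.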

BTW, my attempts to replace the password in xml by another password hash and setting the salt to “” were not yet successful…

Workbook protection

Imagine the following Excel file.

The user can enter a car make, and in D2 the corresponding rate will be shown. To find the rate, we just use VLOOKUP (SVERWEIS in the German version) to find a match in sheet “cars”.

Stop! Where is the sheet “cars”? Where is our data, where is the lookup table?

As you can see here, this worksheet is hidden.

If I unhide it we can have a look at the lookup table. But the usual case here would be, as described above, to protect the input worksheet as much as possible from changes, hide the cars worksheet, and prevent the users from viewing the cars sheet.

The latter is achieved by applying not only worksheet protection but also workbook protection.

Since I only have a German version of Excel, you may need to check the online documentation to enable workbook protection in your language version.

In any case, the result is that users without password have no chance to view the lookup table in Excel.

As you can see, hide and unhide (Ausblenden, Einblenden) is greyed out.

So let’s hack this file! Open it in a zip program as described above and first take a look at the docProps/app.xml file.

In this file we can see that there are at least two worksheets and what they are called. We can also see in xl\worksheets\sheet2.xml which values are in our lookup table.

Unfortunately, the names of the car makes are only stored as references to xl\sharedStrings.xml, but at least we know that they are there!

If you have a little bit of programming experience, then you could read this file very easily with this knowledge. Without a password!
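Here is roughly what such a reader could look like in Python, using only the standard library. Note that neither the worksheet nor the workbook password plays any role. The function name `read_sheet`, the car makes and the rates are invented for the demo, and the archive is built in memory so the sketch runs on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
XMLNS = 'xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'


def read_sheet(xlsx_bytes: bytes, sheet_path: str) -> list[list]:
    """Return the rows of one worksheet part with shared-string indices resolved."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        strings = []
        if "xl/sharedStrings.xml" in archive.namelist():
            sst = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            strings = [
                "".join(t.text or "" for t in si.iter(MAIN + "t"))
                for si in sst.findall(MAIN + "si")
            ]
        sheet = ET.fromstring(archive.read(sheet_path))
    rows = []
    for row in sheet.iter(MAIN + "row"):
        values = []
        for cell in row.findall(MAIN + "c"):
            value = cell.find(MAIN + "v")
            text = value.text if value is not None else None
            if text is not None and cell.get("t") == "s":
                text = strings[int(text)]  # index into sharedStrings.xml
            values.append(text)
        rows.append(values)
    return rows


# Invented demo data standing in for the hidden lookup table.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "xl/sharedStrings.xml",
        f"<sst {XMLNS}><si><t>Audi</t></si><si><t>BMW</t></si></sst>",
    )
    archive.writestr(
        "xl/worksheets/sheet2.xml",
        f"<worksheet {XMLNS}><sheetData>"
        '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>1.5</v></c></row>'
        '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>2.0</v></c></row>'
        "</sheetData></worksheet>",
    )

for row in read_sheet(demo.getvalue(), "xl/worksheets/sheet2.xml"):
    print(row)
```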

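But of course we still want to remove the password, so that we can also work comfortably in Excel. Therefore, we now open xl\workbook.xml in an editor and search for the marked passage, the <workbookProtection .../> tag. As with the worksheet protection, we remove it, save the file, write the change back into the zip file and rename it back to xlsx.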

Back in Excel, we can easily display the hidden table and finally view, copy, modify the data …
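For completeness, the edit to workbook.xml can be sketched in Python as well. I am assuming here that the protection lives in a self-closing `<workbookProtection .../>` tag and that a hidden sheet is marked with `state="hidden"` on its `<sheet>` entry. The function `unlock_workbook_xml`, the sheet name "input" and the dummy hash values are made up for the example. Removing the `state` attribute is optional, since Excel lets us unhide the sheet once the protection is gone.

```python
import re

# Assumptions: self-closing protection tag; hidden sheets carry a state attribute.
WORKBOOK_PROTECTION = re.compile(r"<workbookProtection\b[^>]*/>")
HIDDEN_STATE = re.compile(r'(<sheet\b[^>]*?)\s+state="(?:hidden|veryHidden)"')


def unlock_workbook_xml(xml_text: str, unhide: bool = False) -> str:
    """Drop the workbook protection tag and, optionally, the hidden state of sheets."""
    xml_text = WORKBOOK_PROTECTION.sub("", xml_text)
    if unhide:
        xml_text = HIDDEN_STATE.sub(r"\1", xml_text)
    return xml_text


# Hand-written approximation of a protected xl/workbook.xml.
LOCKED = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    '<workbookProtection workbookAlgorithmName="SHA-512" workbookHashValue="aGFzaA==" '
    'workbookSaltValue="c2FsdA==" workbookSpinCount="100000" lockStructure="1"/>'
    '<sheets><sheet name="input" sheetId="1" r:id="rId1"/>'
    '<sheet name="cars" sheetId="2" state="hidden" r:id="rId2"/></sheets>'
    "</workbook>"
)

print(unlock_workbook_xml(LOCKED, unhide=True))
```

The resulting text then has to be written back into the archive under the same entry name, in the same way as for the worksheet protection.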

End of Part I

At this point, you may want to briefly review the points discussed. What you have learned so far is

  • how Excel files are structured
  • that worksheet and workbook protection do not actually protect the data
  • how you can help yourself if you forget a password. 😈

More coming soon in Part II.
