Alastair’s Place

Software development, Cocoa, Objective-C, life. Stuff like that.

Why Not Use a Spreadsheet?

This BBC news article reminded me that I wanted to write a short piece about spreadsheets, and in particular about an entirely non-obvious danger that spreadsheets pose to their users.

What do I mean? Well, in particular, calculators and spreadsheets may give different answers for the same calculations! That fact is surprising to many, even to people who should know that that is the case.

Why does this happen, and why should I care? OK, so the first thing you need to know is that the calculator on your desk probably represents numbers the way you think about them — i.e. in decimal. So, on your calculator, when you see 2.1 displayed on the screen, the number the calculator holds in its memory really is 2.1.

Your computer, on the other hand, prefers to use binary rather than decimal to store numbers, the reason being that manipulating binary numbers is hugely faster for a computer. Now, in binary, 2.1 is 10.0001100110011… a recurring fraction. As a result, when your spreadsheet shows you the number 2.1, it is lying. The number it has in its memory is not 2.1; it is very close to 2.1, but it is actually 2.0999999…

I don’t care, you say. Well, maybe you do, maybe you don’t. For instance, if you take your calculator and enter 0.1+0.1+0.1-0.3, you get the expected answer 0. If you do the same in a spreadsheet, it may show you 0, but it will actually have calculated something like 5.55×10-17. Similarly, on your calculator, 0.1×0.1-0.01 is 0, whereas on your computer, it is very probably around 1.73×10-18.

Worse, the chances are that the people who wrote your spreadsheet software knew that this problem existed, and so they try to hide it from you. Well-written binary floating point libraries will always attempt to find the shortest decimal that matches the binary representation they have, so you will often find that the answer looks the same on the screen.

At this point, unless you’re a pedant, you probably still believe that you don’t care — after all, the inaccuracy is very small. But let me convince you otherwise; imagine you are an examiner, marking an exam script. Further imagine that students have been told to present their answers correctly rounded to two decimal places. Set the cells in your spreadsheet to round to two decimal places (this is usually an option under the Format menu) and enter the following into a cell:

=3.013 * 5

You should see the correctly rounded answer, 15.07 (the actual answer is 15.065). Now let’s imagine we are also told to subtract 15 from it; enter

=3.013 * 5 - 15

The chances are quite good that your spreadsheet is now showing 0.06, and not 0.07. Entering the same thing on your calculator should verify that this is wrong(you’ll get 15.065, minus 15 = 0.065, rounded to 2 d.p. is 0.07).

If the exam board had provided a spreadsheet to its examiners to help with the marking, and it causes this kind of error, students are going to lose marks for writing the correct answer. That might make the difference between someone going to university and not; you might have messed up their entire life simply because you didn’t understand that spreadsheets do arithmetic in binary and not decimal.

How can we fix this problem? Well, computers can accurately represent integers, so you could just multiply everything by 1,000 and then divide at the end; i.e. enter

=(3013 * 5 - 15000) / 1000

which will correctly round to 0.07. Yes, that’s right, you get different answers in your spreadsheet from

=(3.013 * 5 - 15)

and

=(3013 * 5 - 15000) / 1000

and not only that, but the latter is more accurate in spite of having an extra calculation in it (welcome to floating point, by the way).

The best solution, of course, is to use something that does decimal arithmetic when you actually care about having an accurate decimal result.