Alastair’s Place

Software development, Cocoa, Objective-C, life. Stuff like that.

Yet More Anti-filesystem Rhetoric

Marco Tabini’s article on the MacWorld site is the latest to point out the “steadily escalating war against the filesystem”, in this instance waged by Apple, though Microsoft and others have been conducting operations in this arena too.

Marco unfortunately cites packages as an example of an anti-filesystem “thing” invented by Apple, which is wrong on two fronts: first, they really don’t imply anything about filesystems or otherwise, and second it was arguably NeXT that came up with the idea in this particular context, though even they couldn’t have claimed it as a unique innovation (the Acorn Archimedes, which pre-dates the NeXT machine by a year, also used folder-based applications).

History lessons aside, though, the anti-filesystem rhetoric is a mistake, founded on a whole pyramid of mistakes. The first and most important of these mistakes is the assumption that the end user is a total, gibbering idiot. Indeed, it is easy for any software developer to see how such an assumption might come about, given some of the queries we have to deal with day-in day-out, but fundamentally the idea of a hierarchical filing system is no more complicated than that of a filing cabinet. Would you be offended if I decided you were such an imbecile that it would be unthinkable that you could fathom the complexities of a filing cabinet? Yes, and rightly so.

I want to divide this post into two parts. In the second, we’ll examine why it is that users find the notion of the filesystem so confounding. But first, let’s consider the alternative model that is usually proposed.

The “Document-centric” Model

Those opposed to the notion of the filesystem like to talk about a “document-centric” interface. So, if I create a word-processor document in Pages on my iPad, or even in iCloud using Pages on my Mac, that document “lives” somehow “in” Pages.

At first glance, this seems a great idea. If you ask users (who are most certainly confused) where their documents are, they will often come up with explanations like “I saved it in Word”. And as any computer-literate person knows, woe betide you if you use Word on someone else’s computer to open a file that isn’t in whatever default location the Open dialog shows. There is a high probability that the result will be that said someone else will excoriate you for “fiddling” and “losing all their documents”.

So where’s the problem? Clearly it fits with user expectations as to how things behave, and that’s normally a good thing, right?

The problem is actually exposed rather neatly by the UI for selecting documents that has been adopted in some iOS software that uses this pattern. Earlier versions of the iWork applications, and also Omni Group’s software, used a file chooser that looked a bit like this:

If you have more files, you can swipe left and right to see them.

Unfortunately, this interface is only any use if you have a tiny number of documents, and so more recently both Apple and Omni have changed to a grid-based chooser, like this:

Looks great, right? Until you realise that now all my documents about animal husbandry are going to be mixed up with the letters I’ve sent to my bank, the copies of that report I wrote for work, etc.

What’s the logical method of fixing that, you ask? Well, Apple has already shown the way by adding groups to the iOS SpringBoard app (that’s the application chooser, for those who don’t know). Let’s take a look at that too:

Wow! What a great interface, right? Well, sure, it’s OK, though what you’ve really done here is invented a rubbish new version of a hierarchical filesystem.

Yours has exactly one level of hierarchy (so I can’t further group my animal documents by type), and probably a small limit on the number of items per group too. It is also a little sparse on the metadata front — we know the name of each document, its size and the date it was created, but a typical modern filesystem can manage quite a bit more data than that.

Oh, and to cap it all, every single application has to implement this functionality itself, and may implement it subtly differently. For instance, maybe it isn’t permissible to have two files with the same name? Perhaps there are restrictions on the lengths of filenames? Or on the sizes of files?

Additionally, because every application only has access to its own files (and exactly how they’re stored is in any case up to the application), it’s really hard for any other application to access your “My Horse” document, even if that’s what you as an end user want. You can, of course, use hacks like huge long URLs to pass data between applications, but that risks losing valuable metadata and may also create security holes in the user’s web browser in the process.

Summary: In order to fix the problems with your “document-centric” vision, you’ve been forced to reinvent the hierarchical filesystem. But your version is a bit rubbish compared to even the worst present-day filesystem.

Instead of re-inventing the filesystem in the name of getting rid of the filesystem, could we, perhaps, just use the filesystem?

So what’s wrong with the filesystem?

It’s easy to see that when people talk about wanting to “get rid of the filesystem”, what they really want to do is to remove the confusion that users seem to experience when presented with simple filesystem tasks on modern computers. Unfortunately, rather than examining the cause of this problem, too many designers and developers have jumped immediately for what they see as the solution.

So what is the cause? Well, back in 1989, I got my first 32-bit micro, an Atari ST (yes I said 32-bit, and yes, I mean 32-bit; only the PC was 16-bit… the Atari, Commodore and Apple machines of the era were all 32-bit from the outset). It had, like the Apple machines but not entirely like Commodore’s line, a ROM-based operating system, and so when you looked at your disks, the chances were fairly good that they were empty. That is, the entire area was yours to use as you pleased.

If you bought an application for your machine, it would come on disks, but most likely you’d have a favourite disk or disks and you’d just copy the application and any files it needed from its distribution disk to an appropriate place on your disk(s). Yes, there were some things that were fixed (e.g. the Atari range would run, on boot, anything in a folder called “Auto” in the root directory of the disk in their A: drive; they might also load a file called “desktop.inf” containing the GEM Desktop’s preferences, and so on). But the number of those things was small, and for the most part the disk was yours.

When I first got a hard disk, a huge monster with only 20MB of storage in total, the situation was very much the same. My C: drive was mine. Yes, by that time I had a multitude of programs in my “Auto” folder, as well as some “desk accessories”, and I might have had a few more config files, some fonts and so on lying around, but overwhelmingly the layout of my files and folders was my own. I knew where to find my documents on the finer points of train-spotting because I put them there.

The “filesystem is hard” problem started, I think, on the PC. I think it started under DOS, where some business applications shipped with relatively large numbers of files that needed to be copied into a directory in order to run (in contrast, even relatively large pieces of software on the Atari platform tended to consist of a couple of files). There were good reasons for this; DOS programmers weren’t being idiotic — they just had to deal with limited address space (640KB) and if they wanted to e.g. print something, well then they’d need drivers for every available printer, because DOS didn’t know how to do that. (Contrast: the Atari platform was a GUI with a virtualised graphics device interface, and so drawing to the printer was basically the same as drawing to the screen, though you needed to use a different graphics device.)

So, when you bought WordPerfect or Microsoft Word or similar for DOS, the chances were good that you’d have a few disks’ worth of files to install. You could have copied them yourself, but that’s a bit annoying, so to help you out, they’d ship with a program that would install the software for you.

With the advent of Windows, matters became worse. Windows was a large piece of software, and it had a new feature — dynamic linking — that its designers had enthusiastically adopted, breaking the API up into chunks and placing them in separate library files. Plus it was graphical, and so it needed fonts (bitmap fonts at first, TrueType later), its own printer and graphics drivers, its own networking drivers and so on and so on. Lots of files — in fact, so many that you might not want all of them installed in the precious space on your expensive hard disk. Ergo, an installer was required.

The upshot of these installers is that now you have large areas of the disk that you, the user, did not organise. Some of these areas may be fragile; if I rename “C:\WINDOWS” to “C:\MSWIN”, will it work? And what’s this “USER.DLL” file anyway? It looks big — do I need it? Can I delete “HPLJ4.SYS”? And so on.

Windows also made the filesystem less accessible by creating “Program Manager”. This was a way for Windows applications to show a single icon by which they could be started, without the end user having to know necessarily where on the hard disk the program file itself was installed. Arguably it was made necessary by the messy layout of the “C:\WINDOWS” folder, which contained a fair number of applications that shipped with Windows itself, but which was very definitely a fragile area where user tampering could cause trouble.

Additionally, the Windows “File Manager” was nothing like the interfaces provided on the Atari ST, Commodore Amiga and Apple Macintosh platforms. It had a two-column user interface reminiscent of its “MS-DOS Shell” predecessor; this interface is far from intuitive, and to many users File Manager would have been a total mystery. (Contrast: the Atari ST came with a disk that had a training program on it that taught the user to use a mouse, to create and navigate through folders and to copy, move and delete files.)

Given the ever larger number of files shipping with major software packages for Windows and the fact that File Manager and COMMAND.COM are fairly poor user interfaces for an unfamiliar user, the installer was here to stay. How many files did Word 2 install on your machine? Do you know? Do you know where? Most users didn’t, and most users didn’t care.

Some other unfortunate design choices at this point made matters much worse than they had to be. The Windows “Open” and “Save As” dialogs were similar in design to those on other systems, but because of the lack of a proper equivalent to the Macintosh Finder, the GEM Desktop or the Amiga Workbench and because of the reliance on installers, fewer and fewer users had ever really seen the filesystem. Mostly they’d typed in a few arcane commands, or even booted their machine from a disk that installed Windows automatically, and then inserted some disks that installed Microsoft Office, and that was that. As a result, when presented with these dialogs, users often didn’t know what they were looking at. Worse, they would often default to idiotic locations, like the “C:\WINDOWS” directory, or the install directory for the application in question, with the result that, since the only thing in the box that the user understood was the file name field, many users would save all their documents in the Windows folder. Or the “C:\WORD” folder. And so on.

Now, Windows 95 made some substantial improvements, adding Windows Explorer (and no, I do not mean the File Manager interface, I mean the entire desktop environment), and removing Program Manager (or, perhaps more accurately, replacing it with the Start menu). Unfortunately, a lot of users came from Windows 3, and so were already used to not knowing about the filesystem; a lot of developers carried on shipping software with large numbers of files, using installers; and Microsoft contrived to make the confusion worse by adding the “Program Files” folder and in OSR2, the “My Documents” folder, contributing further to the impression that the user’s disk should be organised more for the convenience of software developers than for their own purposes.

Rather than completely blaming Microsoft, let us at this point look at Mac OS X, which didn’t inherit filesystem problems from Microsoft, but instead has borrowed them from UNIX.

The original Mac OS was very much like the Atari ST and Commodore Amiga systems, in that the user was very aware of the organisation of data on his or her disks. As with all systems, over time, more clutter turned up on the disk, particularly the hard disk from which the system was booted, but fundamentally the Finder, like the GEM Desktop and Amiga Workbench, was designed to quickly, simply show the user what was on the disk. If you wanted to run an application, you navigated to it on your disk and double-clicked it; there was no false hierarchy like that of Program Manager or the Start Menu.

While Mac OS X inherited much from Mac OS 9 and earlier, a lot of its underpinnings came instead from NeXT, whose operating system was based on BSD UNIX. Now, a UNIX system is inherently multi-user in nature; this is quite a departure, actually, from previous consumer desktop operating systems, and it has some implications. For one thing, UNIX has a notion that there might be a systems administrator of some sort, and that it is a requirement that users can’t tamper with the system or even with each other’s files. For another, UNIX has a long tradition of hard-coded paths (e.g. you can rely on a Bourne shell existing at “/bin/sh”), and coupled with the UNIX idea of a single unified filesystem namespace, this implies again that the user cannot be in control of the disk. Well, the root disk, at any rate.

The mistake Mac OS X makes here is the same one that the various attempts at Linux on the desktop make — they expose the root of the filesystem namespace to the user, and then in the case of Mac OS X go to great lengths to hide all kinds of “special” (and fragile) folders from end users who can’t be relied upon to understand their contents or the fact that they shouldn’t tamper with them. More recently, Mac OS X removed disk icons from the desktop, leaving it empty by default — there isn’t even an icon for the user’s home folder. Small wonder new users don’t understand the filesystem if you don’t show it to them!

Finally, Mac OS X and Windows, as well as numerous third-party software packages, have made matters worse by placing all kinds of extra files and folders in users’ home folders. I understand the argument for them being there, but every extra file or folder of this type is contributing to the confusion users experience when (if) they are shown their disk. Hiding it, as Mac OS X does with “~/Library”, is a half-assed solution; what if I wanted a folder called “Library”? I can’t have it, that’s what. Hiding things also creates problems if users want to back up their files; e.g. should I back up “~/Library/Preferences”? Probably, but I most likely do not want to back up “~/Library/Caches”.

But the filesystem is hard

No, no it isn’t. It’s like a lady’s handbag or a gentleman’s tool box. You can imagine putting ever smaller bags within a handbag, or even smaller boxes in a tool box, and so can your users. You’d have to be really quite sub-normal to have difficulty with this notion, actually.

The “hard” part is knowing that it actually exists, or that it’s a bit like a bag full of bags in the first place, and that’s our fault as developers and designers.

So what should we do?

Well, first, stop reinventing the filesystem in the name of ridding us of the filesystem. As part of that, Apple needs to ship a filesystem chooser in iOS, and, sandboxing or no, the user needs to be able to pick any file they like from any application that knows how to open it. That shouldn’t mean applications automatically get access to any old file — I’m quite happy for the user to pick it.

Second, treat the user with some respect. Stop putting system files and application files in users’ home areas without asking. There’s an argument for storing preference files there, and maybe even for allowing users to install plug-ins and drivers and things, but in that case you need to add exactly one folder (which should probably be called “System”, not “Library”, as the chances of a user wanting a folder called “System” are quite small) and everything should go inside it. You might even care to stick a “Read Me” file in it to explain to users what it is. You might convince me of the need for a “Temp” folder or similar as well. But that’s that, and neither of these should end up deeply nested or with lots of data in them.

Third, show the user their home folder. Put it on the desktop. And show them any disks or storage devices they attach too. Don’t hide them away, and don’t go creating mysterious files and folders on them without being told to.

Finally, stop spouting rubbish about the filesystem. It isn’t hard, it isn’t complicated, and users can understand and use it.

Why Not Use a Spreadsheet?

This BBC news article reminded me that I wanted to write a short piece about spreadsheets, and in particular about an entirely non-obvious danger that spreadsheets pose to their users.

What do I mean? Well, in particular, calculators and spreadsheets may give different answers for the same calculations! That fact is surprising to many, even to people who really ought to know better.

Why does this happen, and why should I care? OK, so the first thing you need to know is that the calculator on your desk probably represents numbers the way you think about them — i.e. in decimal. So, on your calculator, when you see 2.1 displayed on the screen, the number the calculator holds in its memory really is 2.1.

Your computer, on the other hand, prefers to use binary rather than decimal to store numbers, the reason being that manipulating binary numbers is hugely faster for a computer. Now, in binary, 2.1 is 10.0001100110011… a recurring fraction. As a result, when your spreadsheet shows you the number 2.1, it is lying. The number it has in its memory is not 2.1; it is very close to 2.1, but it is actually 2.0999999…

I don’t care, you say. Well, maybe you do, maybe you don’t. For instance, if you take your calculator and enter 0.1+0.1+0.1-0.3, you get the expected answer 0. If you do the same in a spreadsheet, it may show you 0, but it will actually have calculated something like 5.55×10⁻¹⁷. Similarly, on your calculator, 0.1×0.1-0.01 is 0, whereas on your computer, it is very probably around 1.73×10⁻¹⁸.
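If you want to see this for yourself, it’s easy to reproduce in any language that uses IEEE 754 double arithmetic; here’s a minimal C sketch (the exact digits may vary slightly with your C library’s formatting):

#include <stdio.h>

int main(void)
{
  /* print with enough digits to expose the underlying binary values */
  printf("%.17g\n", 0.1 + 0.1 + 0.1 - 0.3);  /* about 5.55e-17, not 0 */
  printf("%.17g\n", 0.1 * 0.1 - 0.01);       /* about 1.73e-18, not 0 */
  return 0;
}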

Worse, the chances are that the people who wrote your spreadsheet software knew that this problem existed, and so they try to hide it from you. Well-written binary floating point libraries will always attempt to find the shortest decimal that matches the binary representation they have, so you will often find that the answer looks the same on the screen.

At this point, unless you’re a pedant, you probably still believe that you don’t care — after all, the inaccuracy is very small. But let me convince you otherwise; imagine you are an examiner, marking an exam script. Further imagine that students have been told to present their answers correctly rounded to two decimal places. Set the cells in your spreadsheet to round to two decimal places (this is usually an option under the Format menu) and enter the following into a cell:

=3.013 * 5

You should see the correctly rounded answer, 15.07 (the actual answer is 15.065). Now let’s imagine we are also told to subtract 15 from it; enter

=3.013 * 5 - 15

The chances are quite good that your spreadsheet is now showing 0.06, and not 0.07. Entering the same thing on your calculator should verify that this is wrong (you’ll get 15.065, minus 15 = 0.065, which rounded to 2 d.p. is 0.07).

If the exam board had provided a spreadsheet to its examiners to help with the marking, and it causes this kind of error, students are going to lose marks for writing the correct answer. That might make the difference between someone going to university and not; you might have messed up their entire life simply because you didn’t understand that spreadsheets do arithmetic in binary and not decimal.

How can we fix this problem? Well, computers can accurately represent integers, so you could just multiply everything by 1,000 and then divide at the end; i.e. enter

=(3013 * 5 - 15000) / 1000

which will correctly round to 0.07. Yes, that’s right, you get different answers in your spreadsheet from

=(3.013 * 5 - 15)

and

=(3013 * 5 - 15000) / 1000

and not only that, but the latter is more accurate in spite of having an extra calculation in it (welcome to floating point, by the way).
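Again, you can check this outside the spreadsheet; here’s a brief C sketch of the two formulations (the printed digits may vary slightly, but the difference is real):

#include <stdio.h>

int main(void)
{
  printf("%.17g\n", 3.013 * 5 - 15);               /* slightly below 0.065, so it rounds to 0.06 */
  printf("%.17g\n", (3013 * 5 - 15000) / 1000.0);  /* the double nearest 0.065, which rounds to 0.07 */
  return 0;
}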

The best solution, of course, is to use something that does decimal arithmetic when you actually care about having an accurate decimal result.

Open Source Entitlement

Some days I wonder why I bother. I’m sure others who have open sourced their code have had (and continue to have) the same experience. In fact, I’ve read about it, so I know it affects all of us, but here’s a summary of events that have consumed a substantial amount of my time today:

  1. At around 7:35pm last night, Andreas Jung of zopyx.com sent me an e-mail to ask what had happened to pwtools 0.3 as it seemed to be MIA.

  2. I took a look at PyPI and found that, indeed, the record for version 0.3 had inexplicably vanished. No problem — I added it back, and also posted version 0.4 (with an updated word list).

  3. This afternoon, Andreas sent me the following terse e-mail:

    Why can’t you just upload a release file to PyPI? If you host is down then everybodys buildouts are broken.

Now, leaving aside the fact that I’d fixed his problem already, and that it had nothing to do with my website whatsoever, the fact is that this e-mail is rather rude. Andreas is using a piece of software I’ve released for free in source code form, and is now demanding that I do something for his personal convenience. Since I was busy, I pointed out that the problem wasn’t that I hadn’t uploaded a file to PyPI, and that he couldn’t rely on PyPI in any event if he wanted his buildout to work no matter what. (PyPI has probably had more downtime than my site in recent times anyway, even if there are mirrors of it available these days.)

A couple of e-mails later, after I complained about his rude e-mails and sense of entitlement, Andreas informed me that

This is the typical egotistic Python package maintainer mentality. …I call this clearly asshole attitude…Once again: this is egocentric asshole mentality.

Hardly surprising that he finds it typical, I think. Andreas then proceeded to fork the package (fair enough, it’s MIT Licensed), but has left the author field on his fork set to my name, the website field set to my website, and changed the package description to

pwtools provides a robust password generator and a password security checker based on the design of libpasswdqc. pwtools does not use code from libpasswdqc, but is implemented in pure Python. This is a fork since the primary maintainer refuses to upload release files on PyPI.

This is completely untrue. In actual fact, the only reason I hadn’t uploaded the source distribution is that when I started out with PyPI, it was called the cheese shop and didn’t support that. Sometimes I forget that these days I should do python setup.py sdist upload rather than just registering the new version. Not a big deal, but equally not something I’m going to rush to fix just to satisfy Andreas Jung (who, I expect, is making money from whatever he’s using Python for).

As a result, I’m now faced with having to waste the PyPI maintainers’ time asking them to fix his forked package’s record.

Why Core Data Is a Bad Idea

OK, so I’m being a little mischievous here; it’s possible that, for your application, Core Data is a good fit. But carry on reading, because I want to highlight something that you may still wish to think about when deciding to use Core Data as a persistence mechanism in your app.

One of the first things you might want to consider when thinking about using Core Data is what type of persistent backing store you wish to use. Apple’s framework provides four built-in implementations (three on iOS), namely:

  • XML (OS X only)
  • Atomic
  • SQLite
  • In-memory (and so, not, in fact, persistent)

Of these, the XML store is clearly intended for debugging, so let’s discard that, and the in-memory store, while useful, is not really persistent, so we’ll ignore that for now too.

So the options are the Atomic store and the SQLite store. Both are documented as “fast”, but the Atomic store only supports reading or writing the entire object graph, which seems like quite a bit of overhead.

A lot of people therefore plump for the SQLite store.

Now, SQLite was designed to provide DBMS-style ACID consistency guarantees, and in applications where that matters, it’s important that the database does not become corrupted. As a result, it uses either a rollback journal or write-ahead logging, and it also at various points needs to guarantee that changes have genuinely been flushed to non-volatile storage, which it does using the fsync() or fcntl(fd, F_FULLFSYNC) calls.

If you need this level of consistency guarantee, this is a Good Thing, but it is not without overhead. As the SQLite manual notes, “some operations are as much as 50 or more times faster” when SQLite does not need to call fsync()!

Sadly, a lot of applications that use Core Data have chosen the SQLite store but do not need this kind of consistency guarantee, and the highly synchronous behaviour simply creates a performance problem. The situation is much worse on a networked set-up where users’ home folders are on a server, because the fsync() causes the server to flush data to disk… imagine how much additional (and unnecessary) disk traffic that creates if you have 20 users all logged-in using your application.

I don’t want to name and shame here, but if your application’s Core Data store is really just an index to some other data (e.g. for an e-mail client), or if it is something you could easily reconstruct (e.g. in an RSS reader), you really don’t want to burden your users with the overhead of unnecessary synchronous I/O.

Now, on OS X 10.4, the only thing you could really do about this problem was to use a different persistent store — there is a defaults setting the user can set to disable synchronous behaviour, but it’s system-wide rather than per application, so telling people to set it is just a bad idea. For many applications, the Atomic store would be just fine, but for some applications you would have had to write your own store type.

Thankfully, on OS X 10.5, Apple added the NSSQLitePragmasOption store option, which allows you to tell SQLite exactly what kind of behaviour you expect from it. For applications like e-mail clients and RSS readers, you probably just want to turn synchronous behaviour off completely, e.g.

NSDictionary *pragmaOptions = @{ @"synchronous": @"OFF" };
NSDictionary *storeOptions = @{ NSSQLitePragmasOption: pragmaOptions };
NSPersistentStore *store;
NSError *error = nil;

store = [myCoordinator addPersistentStoreWithType:NSSQLiteStoreType
                                    configuration:nil
                                              URL:myStoreURL
                                          options:storeOptions
                                            error:&error];

The worst that will happen is that your database will be corrupted, but you can easily rebuild it from other data. Your users will thank you, because your application will run significantly faster and even if they have to rebuild, it will be fairly quick.

According to the SQLite docs, the default setting for PRAGMA synchronous is FULL, which is probably not necessary in 90% of cases — even if you really don’t want your data file to get corrupted, NORMAL may well give sufficient guarantees for your application. Also, Apple’s documentation indicates that the related PRAGMA fullfsync option is disabled by default as of OS X 10.5, so we needn’t worry too much about that.

An additional option you may wish to contemplate is the PRAGMA journal_mode setting. In particular:

  • The SQLite default setting, DELETE, is slower than the TRUNCATE setting, and so you might care to specify TRUNCATE on systems prior to 10.7.

  • As of OS X 10.7/iOS 5, you could set it to WAL to enable write-ahead logging rather than using a rollback journal, which will improve performance, especially if you set PRAGMA synchronous to NORMAL rather than FULL.

You can see the documentation for PRAGMA journal_mode on the SQLite website.
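Putting the two settings together, a variant of the earlier snippet might look like the following; this is a sketch, so do check how these pragmas behave on the OS versions you actually target:

NSDictionary *pragmaOptions = @{ @"synchronous":  @"NORMAL",
                                 @"journal_mode": @"WAL" };
NSDictionary *storeOptions = @{ NSSQLitePragmasOption: pragmaOptions };
NSPersistentStore *store;
NSError *error = nil;

store = [myCoordinator addPersistentStoreWithType:NSSQLiteStoreType
                                    configuration:nil
                                              URL:myStoreURL
                                          options:storeOptions
                                            error:&error];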

To summarise:

  1. If you’re using Core Data, your application may not need the SQLite data store. You might actually be better off with the Atomic store.

  2. If you are using the SQLite data store, there is a very good chance that the default behaviour is overkill for your application. In many cases, you could reasonably disable synchronous disk behaviour by setting PRAGMA synchronous to OFF, and in the vast majority you could make your application run faster by setting it to NORMAL without a substantial increase in the risk of data loss.

  3. You may also wish to consider altering the PRAGMA journal_mode setting.

Computers Don’t Natively Handle Negative Numbers?!

Having just read Graham Lee’s latest post, “What happens when you add one to an integer?”, I had a couple of comments.

First, Graham asserts that “computers don’t natively handle negative numbers”. This is sort-of true, but only sort-of. It is true from the perspective that there is no way to store a ‘+’ or ‘-’ sign in a flip-flop; it would have to be encoded somehow. That’s a pretty silly argument, though, because technically you aren’t storing ‘0’s and ‘1’s in your flip-flops either; it’s really an electrical charge that you’re dealing with. In any case:

  1. Most computers use 2’s-complement arithmetic, under which addition and subtraction are identical whether you’re dealing with signed or unsigned numbers.

  2. Most computers that use 2’s-complement arithmetic have separate instructions for signed multiplication and division. Some also have “arithmetic shift right”, which shifts all the bits one to the right, leaving the top bit alone (as opposed to “logical shift right”, which tends to zero it); there’s a short sketch of the difference just after this list.

  3. Most modern computers support IEEE floating point, which explicitly does support negative numbers (it uses a sign/magnitude representation).
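To make the shift distinction concrete, here’s a small C sketch; note that it assumes 32-bit ints, and that strictly speaking the result of >> on a negative signed value is implementation-defined in C (though nearly every current compiler produces an arithmetic shift):

#include <stdio.h>

int main(void)
{
  int      s = -8;
  unsigned u = 0xfffffff8u;  /* the same bit pattern as -8 in a 32-bit int */

  printf("%d\n", s >> 1);    /* typically -4: the sign bit is copied in */
  printf("%x\n", u >> 1);    /* 7ffffffc: a zero bit is shifted in */
  return 0;
}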

Some older machines use 1’s-complement arithmetic or sign/magnitude even for integers. That’s pretty unusual, though, and you’re unlikely to run into such a device.

One other observation I’d make is that in some problem domains, another kind of arithmetic is interesting, namely saturating arithmetic. With saturating arithmetic, INT_MAX + 1 = INT_MAX. Why is this used and what supports it? It’s good for audiovisual processing; you don’t want sample values overflowing and wrapping. Saturating arithmetic is usually available on DSPs and also in the vector units of modern microprocessors.
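C has no built-in saturating type, but the idea is easy to sketch with an explicit clamp; this is just an illustration, not any particular DSP’s instruction:

#include <limits.h>
#include <stdio.h>

static int sat_add(int a, int b)
{
  if (a > 0 && b > INT_MAX - a)
    return INT_MAX;  /* would overflow upwards; clamp instead of wrapping */
  if (a < 0 && b < INT_MIN - a)
    return INT_MIN;  /* would overflow downwards; clamp instead of wrapping */
  return a + b;
}

int main(void)
{
  printf("%d\n", sat_add(INT_MAX, 1));  /* prints INT_MAX, not a wrapped value */
  return 0;
}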

If we’re talking about the C programming language, C99 §6.5 ¶5 says:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behaviour is undefined.

while §6.2.5 ¶9 states that:

… A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

The combined effect of which is indeed to render the result of adding one to the maximum value of a signed integer undefined as far as C is concerned. If you’re using some other programming language, on the other hand, it may be very well-defined what happens in such a case (maybe you get an exception, perhaps the type of the result will be different, or maybe it’s defined to wrap or even saturate).

In fact, a number of microprocessor architectures maintain an overflow flag (typically labelled the “V” flag) to allow assembly programmers to detect just this case and handle it as they wish. C compilers sometimes have a flag that causes them to emit code to test the V flag and abort execution if overflow is detected.
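If you want to detect overflow from C itself, newer versions of GCC and Clang provide builtins for exactly this (they post-date much of the code discussed here, so check your compiler); a brief sketch:

#include <limits.h>
#include <stdio.h>

int main(void)
{
  int result;

  /* performs the addition with wrapping semantics and returns true
     if the mathematical result did not fit in an int, all without
     invoking undefined behaviour */
  if (__builtin_add_overflow(INT_MAX, 1, &result))
    printf("overflow detected\n");
  else
    printf("%d\n", result);
  return 0;
}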

As regards choosing types for variables in your program, I disagree with Graham’s implication that it’s best to use signed types even for notionally unsigned quantities. Neither approach protects you from unexpected overflow, and using a signed type for a signed quantity just means you’ve added an extra failure mode (before, when it was unsigned, it could be too large, but at least it was always positive; as a signed value, it could still be too large, but it might also now be—unexpectedly—negative).

There is a loop-related argument about signed versus unsigned variables that people sometimes trot out, namely:

// This is wrong
for (unsigned n = 9; n >= 0; --n) {
  ...
}

// Compare with this, which works
for (int m = 9; m >= 0; --m) {
  ...
}

See the problem? That’s right, n >= 0 is always true, because n is unsigned; decrementing it from 0 results in a large positive number. Some people claim that, as a result, it’s “safer” to use a signed integer, though personally I think that’s a red herring.

The correct way to write this loop with unsigned is

for (unsigned n = 10; n-- > 0;) {
  ...
}

The final part of Graham’s post contains an odd example about using a uint8_t to hold a small count, but then failing to properly check it when using it. I must admit I fail to see the point he’s trying to make here; if you don’t check it, you don’t check it; it seems no safer using a uint16_t or even an int, because the problem is not overflow but that it isn’t being checked properly. Put another way, given

#include <stdint.h>

#define MAX 200
uint8_t ptr = 0;

void put(unsigned count) {
  ptr += count;
}

void get(unsigned count) {
  ptr -= count;
}

int main(void) {
  put(50);
  get(80);
  return 0;
}

changing the uint8_t to uint16_t is just not a fix for the actual problem. The fix is something more like

#include <assert.h>

void put(unsigned count) {
  assert (count <= MAX - ptr);
  ptr += count;
}

void get(unsigned count) {
  assert (ptr >= count);
  ptr -= count;
}

though in a shipping application you might not actually want to assert() on failure.

Interesting OS X Crash Report Tidbits

Yesterday I spent some time looking at OS X crash logs; as anyone who has been working on the OS X platform as long as I have will have noticed, the precise format of the crash logs your applications generate is now somewhat different to the reports that were generated way back on OS X 10.0.

Indeed, Apple documents no fewer than six versions of the crash log format, and that doesn’t include the variants in use on iOS, the effects of running applications under Rosetta, or a couple of newer formats that have appeared more recently. Aside from version 1 crash logs, all of the formats include a field named “Report Version”; a more complete table of crash log versions would look like this:

Version Platform
1 Mac OS X prior to 10.3.2
2 Mac OS X 10.3.2 through 10.3.9
3 Mac OS X 10.4.x on PowerPC
4 Mac OS X 10.4.x on Intel
5 Apparently never shipped
6 Mac OS X 10.5 through 10.7
7 Output from sample command line tool
10 Mac OS X 10.8 and later
11 Mac OS X 10.8 spin/hang report
101 iOS 1 (reported as OS X 1.x)
102 Unknown
103 iOS 2
104 iOS 3 and later

Interestingly, the symbolicatecrash script that is used with iOS crash reports doesn’t appear to know about report version 101, but does know about report version 102. In case you don’t already know, symbolicatecrash can be found at /Developer/Platforms/iPhoneOS.platform/Developer/Library/PrivateFrameworks/DTDeviceKit.framework/Resources/symbolicatecrash. (With Xcode 4, this is inside the application bundle, so tack /Applications/Xcode.app/Contents onto the front too).
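For what it’s worth, a typical invocation looks something like the following (the file names are invented for this example, and you may need DEVELOPER_DIR set so the script can find its tools):

export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
symbolicatecrash MyApp.crash MyApp.app.dSYM > MyApp-symbolicated.crash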

There’s little point going through all the changes between the different versions on Mac OS X, because, up to version 6 at least, they’re more than adequately documented in TN2123. Its iOS counterpart, TN2151, seems rather less useful, though it does at least include a list of exception codes that you might see.

iOS does differ a bit, though; it doesn’t include the “Command” field from Mac OS X, but it does include fields named “Incident Identifier” and “CrashReporter Key”, the former of which is a UUID and the latter of which is a 40-character hexadecimal number.

Additionally, if a thread has been assigned a name using pthread_setname_np(), on Mac OS X the backtrace for that thread will start with a line resembling the following:

Thread <number>[ Crashed]:: <name of thread>

while on iOS you’ll see two lines:

Thread <number> name:  <name of thread>
Thread <number>[ Crashed]:
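In case you haven’t come across it, on Apple platforms pthread_setname_np() takes just the name and applies it to the calling thread; here’s a minimal sketch that should produce a named thread in the resulting crash report:

#include <pthread.h>

static void *worker(void *arg)
{
  /* on OS X and iOS this names the calling thread */
  pthread_setname_np("com.example.worker");
  *(int *)4 = 8;  /* deliberate crash so the report shows the name */
  return arg;
}

int main(void)
{
  pthread_t thread;
  pthread_create(&thread, NULL, worker, NULL);
  pthread_join(thread, NULL);
  return 0;
}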

Obviously the thread state dump (with the registers) will differ from processor to processor, but there are additional differences in the format of the “Binary Images” section.

One other interesting feature (and it was this that got me interested yesterday) is that some crash reports now contain additional information labelled “Application Specific Information” or similar. This could be quite useful, but the mechanism by which it appears is completely undocumented…

Anyway, as a result of my investigations yesterday, it seems there are two different mechanisms for adding this to a crash report. On Mac OS X and iOS, there is a special symbol __crashreporter_info__ that you can define that lets you add to the “Application Specific Information” field; e.g.

static const char *__crashreporter_info__ = 0;
/* note the extra leading underscore: C symbols gain a '_' prefix at the assembly level in Mach-O */
asm(".desc ___crashreporter_info__, 0x10");

void crash(void)
{
  __crashreporter_info__ = "This crash is expected!";
  *(int *)4 = 8;
}

The asm statement is used to mark the __crashreporter_info__ symbol as “referenced dynamically”; this means it won’t get stripped and will be included in the resulting binary so the crash reporter can see it.

I don’t know exactly which version of Mac OS X this was added in, but it is the older of the two mechanisms and exists on iOS as well.

On newer versions of Mac OS X, you can also give extra information to the crash reporter via a special data structure:

/* crash_info_t is always 64-bit, even if you build 32-bit code,
   so we set the alignment of its members to 8 bytes to achieve
   the appropriate layout in both cases */
#define CRASH_ALIGN __attribute__((aligned(8)))

typedef struct {
  unsigned    version   CRASH_ALIGN;
  const char *message   CRASH_ALIGN;
  const char *signature CRASH_ALIGN;
  const char *backtrace CRASH_ALIGN;
  const char *message2  CRASH_ALIGN;
  void       *reserved  CRASH_ALIGN;
  void       *reserved2 CRASH_ALIGN;
} crash_info_t;

#define CRASH_ANNOTATION __attribute__((section("__DATA,__crash_info")))
#define CRASH_VERSION    4

crash_info_t gCRAnnotations CRASH_ANNOTATION = { CRASH_VERSION,
                                                 0, 0, 0, 0,
                                                 0, 0 };

void crash(void)
{
  gCRAnnotations.message = "Message #1";
  gCRAnnotations.signature = "My test crash";
  gCRAnnotations.backtrace =
  "0   MyTest     0x12345678 myTest(3, 4, 5)\n"
  "1   MyTest     0x23456789 myMain";
  gCRAnnotations.message2 = "Message #2";
  *(int *)4 = 8;
}

Both of these mechanisms are per image. That is, the crash reporter will collect up information from every image loaded into the address space of the crashed process. In the case of the “Application Specific Information” field, the messages are output one after the other; the same is true for the “Application Specific Signature” field. If you are generating your own backtrace (as the NSException code does), each backtrace is output separately, and they are numbered… for instance, if we change the line

  *(int *)4 = 8;

to read

  [NSException raise:@"TestException" format:@"Nothing to see here"];

then the resulting crash log will contain two backtraces labelled “Application Specific Backtrace 1” and “Application Specific Backtrace 2”.

Important Note

None of this is documented. If you are going to use it, be sure that you initialise the variables to zero/NULL, and DO NOT USE THE reserved or reserved2 fields.

Hglist

To anyone who has ever wanted a better hg manifest, or just a quick way to inspect a set of files controlled by Mercurial to see e.g. the last change revision: take a look at hglist.

Why Not Unicode in Identifiers?

Earlier today, Graham Lee linked on Twitter to a piece by Poul-Henning Kamp about the “tyranny of ASCII” in programming language syntax.

Kamp’s contention is that we should be free to use (for instance) “Dentistry symbol light down and horizontal with wave” (U+23C7, ‘⏇’ if your browser has it) as an identifier in a program if we so choose. Or, perhaps more reasonably, Ω₀.

It’s certainly an appealing idea, especially to anyone who has ever attempted to implement a mathematical algorithm, or even an otherwise non-mathematical algorithm that has come from an academic paper (which tend to use mathematical notation).

It is, perhaps, less than obvious what dangers await the unwary in this area; I think by now many people are familiar with the confusability of various glyphs, but perhaps it is not so obvious that e.g. ‘a’ and ‘а’ are in fact entirely different characters (the second is Cyrillic). Nor is it obvious to the uninitiated that ‘é’ differs in any way from ‘é’, or that care must be taken when comparing strings as a result, let alone more problematic character equivalences like ‘ß’ and “sz”/“ss”.
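The ‘é’ case is easy to demonstrate; in UTF-8, precomposed U+00E9 and the sequence ‘e’ plus combining acute U+0301 render identically but are different byte strings, as this minimal C sketch shows:

#include <stdio.h>
#include <string.h>

int main(void)
{
  const char *precomposed = "\xc3\xa9";   /* U+00E9 in UTF-8 */
  const char *decomposed  = "e\xcc\x81";  /* 'e' followed by U+0301 in UTF-8 */

  /* a byte-wise comparison says these differ, even though they are
     canonically equivalent and look identical on screen */
  printf("%d\n", strcmp(precomposed, decomposed) == 0);  /* prints 0 */
  return 0;
}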

If we are going to propose allowing Unicode in identifiers, then, we need to specify:

  • Which code points are and are not going to be allowed? e.g. Do we allow combining marks? Spacing modified letters? Private use characters?

  • Do we allow bi-directional text? What about embedded bidi? If so, we can support Hebrew and Arabic, but matching identifiers is going to get complicated really quickly (the characters might be in either order, depending on the Unicode bidi rules)

  • Do we allow identifiers to consist of characters from different scripts? For instance, is “аnd” a valid identifier? It isn’t the same as “and”…

  • How are the compiler and linker going to determine a symbol table match? Does “æ” match “ae”? Does “ï” match “i”? What about “ß” and “ss”?

  • What about the system linker (probably most important on systems with dynamic linking support)? Name mangling might be a solution, but we get exactly one chance to get that right before creating ABI compatibility problems. C++ didn’t do so well at that, as I recall.

Additionally, because different people have different fonts on their systems and not all code points necessarily have glyphs in all of (or even any of) those fonts, we perhaps need to think what will happen if a developer opens a source file on a machine that is lacking some glyphs that are needed to render an identifier. How is that displayed to them? Is it just a matter for their text-editor? Maybe so, but until text editors have a good way to deal with this situation, it’s potentially a bigger problem for the developer themselves.

Another concern is that, particularly in the case of Arabic, Indic and Far-Eastern languages, just typing “characters” on your keyboard can be quite an involved process. Certainly speakers of those languages may be familiar with the input methods they need to use, but that is not true generally, and if we permit identifiers containing such characters we risk fragmenting the developer community along natural language boundaries — people will write code that can only easily be used by a native user of their particular script. ASCII may be a poor subset of Unicode, but it’s very unlikely that any computer user anywhere on the planet is unable to type the majority of its characters. Contrast that with (say) allowing CJK characters in code, which would relegate anyone unfamiliar with the script to using Cut and Paste (it isn’t even feasible in that particular case to attempt to locate the character by looking through a Unicode character palette — there are many thousands of different CJK characters).

Even if we restrict ourselves to (say) Latin characters plus accents, there’s still plenty of potential for confusion; just look at ‘ë’ and ‘e̎’ or ‘ç’, ‘c̡’, ‘c̦’.

For my money, then, allowing arbitrary Unicode identifiers is a mistake. I have nothing against the use of arbitrary Unicode in comments and string constants, but in the core syntax of a programming language more care is necessary in order to avoid creating problems.

Of course, some people retort that the issues I raise could be addressed by means of policies of individual projects. That’s true to a degree, though it still invites fragmentation of the developer community (which, I contend, is highly undesirable), and it doesn’t really address problems such as that of deciding on the equivalence of different characters/identifiers.

Personally, while I’m in favour, in principle, of expanding the set of characters we allow in programming languages, I think it needs to be done carefully and with considerable thought.

AFP, NFS and Mountain Lion

If you’re unlucky enough to be attempting to use Mac OS X Server to host users’ home directories, you’ll know that in recent versions the native AFP implementation has been getting worse and worse.

We’ve been seeing problems for a long time now, including files that you can’t delete, failures to save files, and even (occasionally) actual data corruption. Last Monday, everything came to a head; I tried to log in, and after about a minute and a half, my machine was beachball city. Checking the log files, the culprit was quite clearly AFP (which had disconnected the AFP share in which my home folder exists, and was attempting to reconnect).

This ruined my day; I did nothing other than attempt to troubleshoot AFP. I tried reinstalling Mac OS X. No joy. Then I tried reinstalling it from scratch, having wiped the disk. Same problem. Next, I tried moving my preferences folder aside. That seemed to work, but today the beachballing was back. In fact, today, it was worse — it was affecting one of the other guys in the office too.

We’ve filed bug reports about most of these issues, though I wasn’t actually able to file the most recent bug because (ironically) Bug Reporter failed to respond every time I hit Submit.

Anyway, today, in order to let Ed, the other person seeing this problem, get on with his work, I had his home folder mounted over SMB for most of the day. Which was sort-of OK aside from the fact he couldn’t see files in his Downloads folder. At all. And a few other folders too. Turns out this is a known problem with SMB on Mac OS X.

So AFP doesn’t work. SMB doesn’t work. What other network filesystem can I use on Mac OS X (without messing about, and without going for non-POSIX-ish options like Coda or AFS)? Answer: that old UNIX staple, NFS.

On the server side, NFS was actually pretty easy to set up; I just edited /etc/exports to add our users folder, then did sudo nfsd update.
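For reference, the line I added was of roughly the following shape (the path and subnet here are invented for this example; see exports(5) for the details):

/Volumes/MyDisk/Users -alldirs -network 192.168.1.0 -mask 255.255.255.0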

The client side was a little more annoying.

We use Open Directory (basically LDAP, but with some sugar over the top to make it look more like Netinfo). OD has an attribute HomeDirectory, which looks like this:

<home_dir>
    <url>afp://server.example.com/Volumes/MyDisk/Users</url>
    <path>fred</path>
</home_dir>

Now, it turns out that Apple installs a neat tool called od_user_homes into /usr/libexec that can be used together with the default auto_master and auto_home configuration to automatically pick up the correct NFS share from Open Directory. All you need to do is ensure that the <path> part of the HomeDirectory is empty—i.e. this should work:

<home_dir>
    <url>nfs://server.example.com/Users/fred</url>
    <path></path>
</home_dir>

You also need to update the NFSHomeDirectory path to /home/fred, since we’re now using the auto_home system, but that’s straightforward.
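For a single user, that change can be made with dscl against your directory node; something like this, where the node path and user name are illustrative:

sudo dscl /LDAPv3/127.0.0.1 -create /Users/fred NFSHomeDirectory /home/fred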

However, and it’s a big however, if you do this, you’ll find that you can log in just fine via ssh. What you can’t do, though, is log in to a GUI session, and as usual with Mac OS X you get a cryptic message about not being able to log in at the moment. Looking in the log files, we find

10/09/2012 16:41:07.039 authorizationhost[436]: ERROR | -[HomeDirMounter mountNetworkHomeWithURL:attributes:dirPath:username:] | Unknown URL attribute in user's home_loc (nfs://server.example.com/Users/fred)

GAH!

The problem is that the HomeDirMechanism bundle, which contains code to mount AFP and SMB-based home directories, doesn’t know what to do when it sees a URL starting nfs://, and, rather than doing nothing, it barfs, preventing users from logging in. HomeDirMechanism is the thing that was recently the subject of a security problem because it was logging passwords in plaintext. It is also the usual cause of the cryptic can’t-log-in-now problems. In fact, it’s just plain pants.

Since we’re using NFS now, we don’t need it!

We can just go to /etc/authorization and comment it out:

<!-- <string>HomeDirMechanism:login,privileged</string>
 <string>HomeDirMechanism:status</string> -->

Much better. Now we can log in.

An unexpected and happy side-effect of switching to NFS has been improved performance. Vastly improved. When I did ls in my Source folder over AFP, the machine used to take several seconds to return with a directory listing. Now it’s instant.

There are, of course, some downsides to using NFS over AFP, but at least it’s reliable.

First Photos of Our Wedding

For those who don’t already know, on Saturday Jo and I got married at The Goodwood Hotel near Chichester. I’ll write more about this on another occasion, but for now I thought I’d put up a couple of photos that my father-in-law, Bob, posted on Facebook:

Update

Amanda Every, our photographer, has put a selection of shots up on her blog.