Alastair’s Place

Software development, Cocoa, Objective-C, life. Stuff like that.

Optimization Without Measurement - a Seductive Trap

Readers of Apple’s cocoa-dev mailing list will have seen a number of messages recently attacking the Objective-C feature of making messages to nil do nothing.
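For anyone who hasn’t come across the behaviour in question: sending any message to nil simply does nothing, and for most return types the result is zero (or nil, or NO). A minimal illustration, using names of my own choosing:

#import <Foundation/Foundation.h>
#include <stdio.h>

int
main (void)
{
  NSString *name = nil;

  /* Neither send runs any code, throws, or crashes; the results are
     simply zero and nil. */
  NSUInteger length = [name length];
  NSString *upper = [name uppercaseString];

  printf ("length = %lu, upper is %s\n",
          (unsigned long)length, upper == nil ? "nil" : "not nil");

  return 0;
}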

One of these posts (whose author shall remain nameless) made the claim that

…it’s disappointing that people still think that it’s quicker to just send the message to nil, than to do “if (target) [target message];” since it doesn’t matter how fast you make that message dispatcher, it can’t possibly be faster than the two instructions that the conditional takes…

These kinds of arguments are a very seductive trap for the unwary programmer; they seem “obvious”, right? Nevertheless, as I’ll show, they aren’t always as clear-cut as they first seem.

If you are considering optimizations to your program (and you should), you absolutely must measure to see where you should spend your time. Indeed, if you write programs for Mac OS X, you are spoilt for choice in terms of the tools available to help you do this. (I’m a particular fan of Apple’s Shark tool… it’s a large part of the reason that iDefrag’s main display is so fast.)

With that in mind I decided to measure the function call overhead both for Objective-C messages and for a few other common types of function call. Apple don’t have anything specific that can help out with this type of measurement, so I fell back on a simple C test rig. I’m not going to reproduce the code for my complete test here, but I will give you a flavour:

#include <sys/time.h>
#include <stdio.h>

/* Return the current time of day as a double, in seconds, with
   microsecond resolution. */
static double
hires_time(void)
{
  struct timeval tv;

  gettimeofday (&tv, NULL);

  return tv.tv_sec + tv.tv_usec * 1e-6;
}

#define ITERATION_COUNT 10000000

int
main (void)
{
  double elapsed;
  unsigned n = 0;
  volatile unsigned a = 0;

  /* Start from the negated start time; adding the end time afterwards
     leaves the elapsed interval in seconds. */
  elapsed = -hires_time();
  for (n = 0; n < ITERATION_COUNT; ++n) {
    /* The thing you're timing goes in here */
    ++a;
  }
  elapsed += hires_time();

  printf ("Incrementing a variable in memory takes %g ns.\n",
          (elapsed / ITERATION_COUNT) * 1e9);

  return 0;
}

(Note that you may have to be careful when compiling test rigs like the above to make sure that the compiler’s optimizer doesn’t optimize away the thing you want to test. Be very wary about any unexpectedly fast results.)
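The rig above uses a volatile counter for exactly that reason. When the thing you’re timing is a call, one simple trick is to make the call through a volatile function pointer and store the result into a volatile variable; here’s a sketch of the idea, with made-up names (empty_function, timed_call, sink), that slots into the rig above:

#define ITERATION_COUNT 10000000   /* as in the rig above */

/* Stand-in for whatever is being timed; in the real test this is where
   the message send, IMP call and so on would go. */
static unsigned
empty_function (void)
{
  return 0;
}

/* Calling through a volatile function pointer stops the compiler
   inlining the call or hoisting it out of the loop... */
static unsigned (* volatile timed_call) (void) = empty_function;

/* ...and storing the result in a volatile stops it discarding the
   work altogether. */
static volatile unsigned sink;

void
run_timed_loop (void)
{
  unsigned n;

  for (n = 0; n < ITERATION_COUNT; ++n)
    sink = timed_call ();
}

Compile with the optimizer on (e.g. cc -O2), since that’s what you’ll actually ship, and treat any implausibly fast per-iteration figure as a sign that the compiler has outsmarted the test.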

Anyway, back to the nil issue. For my test program, my Mac Pro (3GHz Dual-Core Xeon) reports the following results:

Message sends take 5.34852 ns.
Nil sends take 6.01709 ns.
IMP sends take 2.67442 ns.
Function calls take 2.00583 ns.
Dylib calls take 2.67425 ns.
Virtual function calls take 2.33997 ns.

A nearby G5 machine (2.3GHz) reports:

Message sends take 15.6726 ns.
Nil sends take 9.14313 ns.
IMP sends take 4.35283 ns.
Function calls take 3.91874 ns.
Dylib calls take 7.83509 ns.
Virtual function calls take 5.87761 ns.

These numbers are timings for a call and return to an empty function/method, averaged over 10 billion calls (yes, billion).

Obviously this is an artificial benchmark, but the long and short of it is that message sends are fast. They are not quite as fast as a plain C function call, but by using an IMP you can get them to go as fast as a call to a function in a dylib, or, on PowerPC machines, faster still.
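To make that concrete, here (in sketch form, with class, method and variable names of my own invention rather than the actual test code) is roughly what the message send, nil send and IMP variants look like; in the real test each call sits inside the timing loop from the rig above:

#import <Foundation/Foundation.h>

/* A deliberately empty method to call, mirroring the empty functions
   used for the other timings. */
@interface Target : NSObject
- (void) doNothing;
@end

@implementation Target
- (void) doNothing
{
}
@end

int
main (void)
{
  Target *target = [[Target alloc] init];
  Target *nilTarget = nil;

  /* Ordinary message send, through the Objective-C dispatcher. */
  [target doNothing];

  /* Message send to nil; the runtime spots the nil receiver and returns. */
  [nilTarget doNothing];

  /* IMP call: look the implementation up once, then call it like a
     plain C function pointer, bypassing the dispatcher each time. */
  SEL sel = @selector(doNothing);
  void (*doNothingIMP) (id, SEL) = (void (*)(id, SEL)) [target methodForSelector:sel];
  doNothingIMP (target, sel);

  return 0;
}

(The plain function, dylib and virtual function rows are just the obvious C and C++ equivalents. Something like cc -O2 calls.m -framework Foundation will build the Objective-C version.)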

The difference between

if (foo)
  [foo bar];

and

[foo bar];

is very small; according to my measurements, on the G5 the extra check adds about 1.3ns to every non-nil send and saves around 7.8ns on each nil send. Since 7.8 / 1.3 is about 6, each nil send only pays for the checks on roughly six non-nil sends; for this to be a net gain on the G5, therefore, something like one in six of your message sends would have to be to nil.

On the x86 machine, the time taken for a non-nil send increases by 0.33ns, and we reduce the cost of a nil send by 5.7ns. For this to be a net gain, more than one in seventeen of your message sends need to be to nil.
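If you want those break-even figures spelt out, the arithmetic fits in a few lines of C (the constants are just the per-send costs and savings, in nanoseconds, from the two paragraphs above; the names are my own):

#include <stdio.h>

int
main (void)
{
  /* Extra cost the nil check adds to every non-nil send, and the
     saving it makes on every nil send (nanoseconds, measured above). */
  const double g5_check_cost = 1.3, g5_nil_saving = 7.8;
  const double x86_check_cost = 0.33, x86_nil_saving = 5.7;

  /* Each nil send has to pay for this many checked non-nil sends
     before the explicit check becomes a net win. */
  printf ("G5:  one nil send per %.1f non-nil sends\n",
          g5_nil_saving / g5_check_cost);
  printf ("x86: one nil send per %.1f non-nil sends\n",
          x86_nil_saving / x86_check_cost);

  return 0;
}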

Are one in six message sends in a typical application to a nil object? I doubt it. How about more than one in seventeen? It’s possible, but in high-performance code (the only place this kind of saving would ever matter) it seems unlikely.

Even if you removed the nil checks from the Objective-C runtime, you’re only going to make the non-nil message sends faster by a tiny amount, and for what? You would have to add tests all over the place for nil, and performance critical code won’t ever care anyway because it will be using IMPs or C functions. So all you’d have done is made it more likely that the end user will lose their data when a nil does turn up unexpectedly. Before the change, your program might have behaved oddly (though there’s a good chance it would have let them save their work). Now it will crash instead.

Perhaps you could have some sort of error handling routine that triggered on a call to nil, but then the runtime needs to check for nil again, so you’ve erased the saving. On top of that, you’ve made all of your code more complicated in the process, and as a result you’ll now be paying a penalty because branch prediction won’t work as well with thousands of “if” statements as opposed to the one branch in the runtime.

The point? There is no credible performance argument for sends to nil being anything other than a no-op. Even a lot of supposedly high-performance code doesn’t need to shave off the few nanoseconds that you might save, and that’s only if messages to nil are more common than the figures above, or if you’ve modified the runtime and accept that messages to nil will cause an outright crash.

In fact, although the no-op behaviour has some performance justification of its own (ordinary code ends up with fewer conditional branches), there is no real performance argument in that direction either; you simply don’t need those few nanoseconds. In the few cases where you might care, you won’t be writing your code that way anyway.