Alastair’s Place

Software development, Cocoa, Objective-C, life. Stuff like that.

Why You Should Learn About Algorithms

Last month, Janie Clayton wrote a blog post about a particularly odd interview she had. A lot of what she writes is spot on – it is ridiculous interviewing for an iOS developer and expecting them to answer questions in Java, and it’s even more ridiculous offering to allow someone to use a language you aren’t comfortable with as an interviewer and then telling them that they can’t after all because you don’t know it yourself! Yes, all of that happened. Read the post here.

During this interview, Janie was asked to write a linked list; this is probably the second simplest data structure after an array, and her response to being asked about it was to tell the interviewer that she was

a hacker who learned programming by writing applications rather than learning algorithms and data structures you only use to pass code interviews at corporate entities

and was slightly incensed when the interviewer responded

“Oh, so you’re not a programmer. You’re more of a management type.”

I think one of the reasons Janie got a bit of push back here (which she talks about in her most recent blog post) is that while she’s right that it’s quite unlikely in run-of-the-mill programming jobs that you’ll find yourself needing to implement a linked list, the implication of her response is that this stuff is hard, that it needs a great deal of learning, and that it will be a waste of her time.

None of that is true.

Put another way: there is a reason they teach this stuff in Computer Science degrees. (I do have a CS degree – well, Information Systems Engineering, which included CS and Electronic Engineering – but I learned a lot of this stuff on my own before starting my degree.)

On the Linked List

Let’s deal with the linked list thing first. Even if you know what one is, the chances are very good that it’s the wrong data structure to use. On modern microprocessors, in 99% of cases cache locality is more important than being able to manipulate lists using pointers, so you should use an array instead. Or a CFArray. Or a Python list. Or a C++ std::vector.

If I ever interview you and ask you about a linked list, it’s because you said you had a CS degree and quite probably you failed to answer a question about a more sophisticated data structure I asked you about. Either that, or I’m going to get you to reason about it somehow and the list itself isn’t really what the question is about, and in that case, if you said you didn’t know what one was, provided you didn’t study CS, I’d show you because the point wasn’t the list, right? (If you did study CS and don’t know what a linked list is, you just failed the interview; regardless of whether you’ve ever used one or not in a real program, you were taught about it and you really should know.)

For the benefit of those who don’t know what a linked list is, imagine you want to store the integers 2, 4, 6, 8, 10. You could use an array

[Diagram: an array of five cells holding 2 | 4 | 6 | 8 | 10]

but if you want to insert, say, 7 into the array, you’ll have to resize it and copy data around. On modern architectures, in most cases, that’s actually the right way to implement this, but on older systems, on the less powerful hardware used in embedded systems, or in certain special cases you might instead choose to store the numbers like this:

[Diagram: head -> 2 -> 4 -> 6 -> 8 -> 10 -> nil; each node holds a value and a “next” pointer]

Each number is now stored in a structure with two elements (traditionally called a node); the first is the number, while the second is a pointer to the next structure in the list. This is called a singly-linked list, and it should be apparent that inserting 7 into it is just a matter of allocating a new list node, putting 7 into it, setting its pointer to point at the node containing 8, and then updating the pointer in the node containing 6 to point at it.
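As a minimal sketch of that insertion (in Python rather than C, purely for brevity; the names Node, build, insert_after and to_list are mine, and None stands in for the nil pointer), it might look something like this:

```python
class Node:
    """One list node: a value plus a reference to the next node (or None)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build(values):
    """Build a singly-linked list from an iterable and return its head."""
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)
    return head


def insert_after(node, value):
    """Insert a new node holding `value` immediately after `node`."""
    # The new node points at node's old successor; node then points at it.
    node.next = Node(value, node.next)
    return node.next


def to_list(head):
    """Walk the list from `head` and collect the values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = build([2, 4, 6, 8, 10])

# Find the node containing 6, then splice 7 in after it.
node = head
while node.value != 6:
    node = node.next
insert_after(node, 7)

print(to_list(head))  # [2, 4, 6, 7, 8, 10]
```

Note that the insertion itself touches only two pointers; the cost is in finding the node containing 6 in the first place.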

Obviously with a singly-linked list, if you have a pointer to a node, you can easily obtain a pointer to the next node, but you have no way to go backwards through the list; this also makes it hard to remove a node given just a pointer. The desire to go either way through the list, and also to make node removal as easy as node insertion leads to the idea of the doubly-linked list:

[Diagram: head -> 2 <-> 4 <-> 6 <-> 8 <-> 10 <- tail; each node holds a value, a “prev” pointer and a “next” pointer, and the last node’s “next” is nil]
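To show why the extra pointer makes removal easy, here is a rough sketch, again in Python with None standing in for nil. The class and method names (DNode, DoublyLinkedList, append, remove) are my own choices for illustration, not anything standard:

```python
class DNode:
    """A node with references to both neighbours."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """Add a node at the tail and return it."""
        node = DNode(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove(self, node):
        """Unlink `node` given only a reference to it; no traversal needed."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def forwards(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

    def backwards(self):
        out, node = [], self.tail
        while node is not None:
            out.append(node.value)
            node = node.prev
        return out


dll = DoublyLinkedList()
nodes = {v: dll.append(v) for v in [2, 4, 6, 8, 10]}
dll.remove(nodes[6])
print(dll.forwards())   # [2, 4, 8, 10]
print(dll.backwards())  # [10, 8, 4, 2]
```

All the checks against None in remove are the price of having a distinct head and tail with nil at each end.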

There’s also a smart-ass variant of the above where there’s only one “pointer” per node, which consists of the exclusive-or of the pointers to the previous and next nodes, which is neat but unless you’re on a memory restricted microcontroller you really shouldn’t.

Circular lists

By the way, there is a nice variant that I haven’t seen in any textbooks, namely the circular list, which lets you quickly add elements at either end of the list and also simplifies bookkeeping because there are never any null pointers.

Here’s a singly-linked version:

[Diagram: 2 -> 4 -> 6 -> 8 -> 10 -> back to 2, with a “tail” pointer to the node containing 10]

Note that we keep a pointer to the last element; to insert at the head of the list, we update the last element’s pointer but not the tail pointer, whereas to insert at the end of the list, we also update the tail pointer.
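Here is one way that might be sketched in Python. The names are mine, and I have assumed that an empty list is represented by a tail of None; within a non-empty list, no node’s next pointer is ever null:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = self  # a lone node simply points at itself


class CircularList:
    def __init__(self):
        self.tail = None  # assumption: None marks the empty list

    def push_front(self, value):
        """Insert at the head: only the last node's pointer changes."""
        node = Node(value)
        if self.tail is None:
            self.tail = node
        else:
            node.next = self.tail.next  # new node points at the old head
            self.tail.next = node       # last node now points at the new head
        return node

    def push_back(self, value):
        """Insert at the end: as push_front, but also move the tail pointer."""
        self.tail = self.push_front(value)

    def values(self):
        out = []
        if self.tail is None:
            return out
        node = self.tail.next  # the head is whatever the tail points at
        while True:
            out.append(node.value)
            if node is self.tail:
                break
            node = node.next
        return out


clist = CircularList()
for v in [4, 6, 8]:
    clist.push_back(v)
clist.push_front(2)
clist.push_back(10)
print(clist.values())  # [2, 4, 6, 8, 10]
```

The two insertion routines differ by a single assignment, which is the bookkeeping simplification I mean.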

If you ever have cause to implement a linked list algorithm, I strongly recommend using the circular variant. And if you are unlucky enough to turn up for an interview where someone really does want you to show them a linked list, draw that kind and explain to them what the benefits are (no null pointers, simplified manipulation, fast insertion/removal at either end with only a single tail pointer to manage). Well, if you want the job, anyway.

Why you should learn about algorithms

Note that I said “learn about”, not “learn”. You do not need to be able to write a Quicksort or Shell sort routine from scratch and I would never ask someone to in an interview; if you need to do that, you’ll be able to look it up.

The main thing to understand here is the idea of algorithmic complexity. Usually we’re talking time complexity but occasionally someone might care about space complexity too. Complexity is a measure of how expensive the algorithm is, and we typically express it using “big O notation”. Some examples:

Notation    Meaning
O(1)        The algorithm takes constant time (best possible)
O(log n)    The algorithm takes time proportional to the logarithm of the size of the input (good)
O(n)        The algorithm takes time proportional to the size of the input (OK)
O(n^2)      The algorithm takes time proportional to the square of the size of the input (not great)
O(2^n)      The algorithm takes exponential time (bad)
O(n!)       The algorithm takes time proportional to the factorial of the size of input (really bad)
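To get a feel for the difference between O(n) and O(log n), here is a small illustrative sketch in Python that counts how many elements each approach has to look at. The function names are mine, and the step counts are a stand-in for time rather than a real measurement:

```python
def linear_search_steps(items, target):
    """Return how many elements are examined before `target` is found."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Return how many probes a binary search makes on a sorted list."""
    lo, hi = 0, len(sorted_items)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


for n in (1_000, 1_000_000):
    data = list(range(n))
    target = n - 1  # the last element: the worst case for a linear search
    print(n, linear_search_steps(data, target), binary_search_steps(data, target))
```

The linear search examines every one of the n elements, whereas the binary search needs at most 10 probes for a thousand elements and at most 20 for a million.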

You may also see people talk about worst case, amortised worst case and average case. Worst case and average case are fairly easy; amortised worst case is where you consider the overall cost of an algorithm over a whole sequence of operations rather than a single one – the idea being that the amortised cost will be low if the expensive worst case is hit only rarely.
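The usual example of amortisation is a growable array. The sketch below (Python; the function and its grow parameter are invented for illustration and do not model any particular library’s implementation) counts how many element copies are needed to append n items under two growth policies:

```python
def copies_when_appending(n, grow):
    """Count element copies made while appending n items to an array
    that calls grow(capacity) whenever it runs out of room."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            capacity = grow(capacity)
            copies += size  # every existing element moves to the new block
        size += 1
    return copies


n = 1024
doubling = copies_when_appending(n, lambda c: c * 2)
plus_one = copies_when_appending(n, lambda c: c + 1)
print(doubling, plus_one)  # 1023 523776
```

With doubling, an individual append is occasionally O(n), but the total work for n appends stays below 2n copies, so the amortised cost per append is O(1). Growing by one element at a time makes the total quadratic.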

It’s also important to understand that, in addition to their complexity, many algorithms have a fixed cost, and that there is a general trend towards higher fixed costs for algorithms and data structures with lower time complexity.

How is this useful? Well, many languages and runtime libraries make you choose what kind of container to use to hold your data, and this choice can have a noticeable (and sometimes extreme) impact on your program’s run time and memory usage. To help you make an informed choice, the documentation will hopefully tell you the algorithmic complexity (or cost) of the operations on the container. For instance, looking at std::vector::operator[], we can see that its complexity is listed as “constant” (i.e. O(1)), whereas std::map::operator[] lists its complexity as “logarithmic in the size of the container” (i.e. O(log n)).

The C++ STL also has a few other types you could use instead of std::vector, for instance std::deque or std::list. It makes you, the developer, choose, and to make that choice you need some idea of which will be better for your particular application.

That’s a bit painful, and on iOS and macOS, we’re very lucky: Core Foundation’s containers are smart and automatically use an appropriate implementation for the number of items they contain. So, for instance, a small CFArray is basically just a C array, but as it grows it changes into a somewhat more sophisticated data structure that allows fast insertion and deletion regardless of the number of elements it holds. That said, there will still sometimes be occasions where you need to choose between a CFArray and a CFDictionary, and there may be occasions when you need a tree rather than a hash, in which case you might end up rolling your own.


Learning this stuff will take months?
You can learn the basics very quickly (hopefully reading the above was quite useful).

I could more profitably spend my time learning Core Data?
Yes, maybe, though this stuff will have applications there too.

Those algorithms textbooks are huge and hard to read :–(
Well, some of them are, yes. I’d recommend you pick up a copy of Sedgewick’s Algorithms in <language>. It’s available in a variety of different language flavours (I have a C++ copy, but I’ve seen C, Pascal, and Java, and there are probably others too), it’s short and accessible (lots of pictures and short example programs). Even skimming it will give you at least some idea of where to look when you need to.


If you go for an interview for a job as a programmer, it isn’t unreasonable to expect that someone will ask some questions relating to fundamental algorithms or data structures. If someone does ask, they aren’t trying to discriminate against the underprivileged; they’re trying to discriminate between job applicants on grounds of competence. Even if the question seems irrelevant to what you’re going to do, it’s a good bet that someone who gives a good answer is going to be better at doing the simpler work where you don’t need to know this, and that is something that will factor in to the decision about who to hire. (Of course, that somebody may also be more expensive to hire, so bear that in mind too.)

Now, as I said, I wouldn’t ask in an interview about linked lists per se, unless you say you have a CS degree and you’ve just failed to answer a question I think you should know the answer to, in which case I’m probably trying to decide whether you lied about your degree.

I might ask you to show me how you would search a string (but I don’t expect you to know the best answer off the top of your head; the point is to work through it and see how you react). I might ask about the merits of hash tables (e.g. std::unordered_map or CFDictionary) versus trees (e.g. std::map). I would, however, take into account your background when thinking about your answer, and if you didn’t know about something I might explain a bit and see what you had to say. The point, often, is about testing your reasoning skills, not about whether you know the answer and can rattle it off.

One final word of advice: if you respond to a question in an interview, however silly you feel it is, with snark, you probably aren’t going to get the job. Part of the reason for interviewing people is for both parties to decide whether they’d like to work together, and snark is going to put people off.

Don’t Bash Iframe Payment Forms


OK, some background first. Owing to the increasing level of card-not-present fraud committed via the Internet, and the generally lax security standards of some of the websites involved, the Payment Card Industry Security Standards Council (PCI SSC) was formed and tasked with creating and maintaining a set of security standards called the Payment Card Industry Data Security Standard (PCI DSS).

The idea is a good one, as are many of the rules themselves, though I think it’s legitimate to criticise PCI-DSS for demanding things of smaller businesses that are simply unrealistic. The upshot of this is that smaller companies, and the payment processors who serve their market, wish to avoid the burden of being PCI compliant, but because they know that conversion rates are strongly impacted by customers being sent to a third-party site for payment, they would also like to design payment flows where a small business is able to take card payments on its own website.

The first attempt at this was to use client-side JavaScript to securely encrypt the user’s payment data, and then the payment form itself would be submitted to the merchant’s system, but with only the encrypted blob rather than the original payment details. The downside of this approach is that if something goes wrong with the JavaScript code and the HTML form isn’t carefully written, payment details go to the merchant’s server anyway and they are dragged into the scope of PCI compliance.

This method of avoiding having to be fully PCI compliant was “dealt with” in PCI DSS 3.0, which specifically imposes a compliance burden on sites doing the above.

However, PCI DSS 3.0 does allow payment processors to host parts of the payment form on their own servers instead, such that the merchant can embed those parts into the merchant’s own form using HTML iframe tags. This provides the same visual effect, but at reduced risk because it no longer relies on client-side JavaScript to keep the payment data away from the merchant’s servers.

So, that’s the background.

Why am I writing this?

Now, on Troy Hunt’s blog, in the comments, I happened across some remarks from Craig Francis:

This Stripe implementation is insecure as well.

They use an iframe, which is trivial for a malicious hacker to replace if the original website is hacked (often possible as they use old software, FTP, bad passwords, etc – which all gets missed at the basic level of PCI checking, that Regpack also seem to suggest is acceptable).

Troy is right to suggest that you should go to the payment gateway directly to enter your details, at least customers will know who has them.

I’m currently working with Christine at Google to pressure the PCI council into doing something about this.

Craig then linked to this piece on his blog which advocates extending full PCI compliance (technically SAQ-A-EP) to those businesses who are using iframe-based payment systems.

This would, in my opinion, be a huge mistake.

The claim, basically, is that an iframe-based system is insecure because a third party could edit the page in which the iframe is embedded and make it point somewhere else. This is true, and it is a genuine vulnerability.

But what are the alternatives for smaller businesses? Well, the alternative being suggested is that they should send their customers off to a third-party payment processor’s website, have the details filled in there, and then come back again. Those of us who run small businesses that take card details will tell you for nothing that this causes two problems:

  1. Our conversion rate drops. Instantly. Customers don’t like being bumped to another website, which they probably don’t recognise anyway, to make a card payment.

  2. We actually get people e-mailing us to tell us they think they might be being defrauded. Wait, what? Yes, that’s right. Customers don’t expect to be suddenly redirected elsewhere; when it happens, they think something dodgy is going on.

Now, if your goal is to destroy small business and make the huge advantages experienced by big businesses even bigger, that’s a great idea. What it won’t do is improve security. Why? Because passing customers off to a third-party payment website has the exact same vulnerability we were just talking about. The web page that does it could be edited by a malicious third party, and pointed at a different page.

OK, you might say, but in that case you’ll see it in your browser’s address bar. Sure. Do you know the names of every payment gateway on the Internet? No, me neither. So how do you know that the page you’re looking at is a genuine payment processor? If you’re about to utter the words “they have an EV SSL certificate” or “because my address bar is green”, I have news for you: it’s easy to get an EV certificate. Even if we assume that certificate authorities can’t be convinced to issue EV certificates in error, all the certificate really says is that it belongs to the party listed in the certificate details. It doesn’t tell you they’re trustworthy.

What should happen?

So Craig’s assertion that merchants using the iframe approach should be forced to use SAQ A-EP, the more onerous compliance route, is clearly a non-starter. It doesn’t improve security in practice, and has a significant impact on lots of small businesses, most of whom will be forced to use third-party payment gateways, which is not only bad for business but is annoying for their customers too.

It’s also worth pointing out that, assuming we did tighten up this aspect of PCI DSS, there is still nothing stopping someone from setting up a website with a similar name, copying its appearance from a given merchant’s site, and defrauding customers that way. This is exactly the same kind of fraud we’re worrying about here (customers are being sent to a site other than the one they should be being sent to), only now it would be happening via Google, instead of from the merchant’s own (hacked) page. Should Google search suddenly be dragged into scope for PCI DSS somehow? I don’t think anyone sensibly argues that.

This is a hard problem, and the iframe solution is not perfect, but it is an improvement over the client-side JavaScript approach and it isn’t significantly less secure than redirecting to a third-party website to perform the payment.

The way forward is probably services like Apple Pay, which is now available in Safari 10, where the browser is responsible for capturing the payment information and sending it securely to the payment processor. Even that is not perfect — hackers could still change the merchant’s site to point at a different payment processor and try to collect money that way.

But aren’t servers insecure if they aren’t completely PCI compliant?


No. Nor are completely PCI compliant systems necessarily secure.

PCI DSS compliance means that the system in question ticks all the relevant checkboxes in the latest PCI DSS standard, meets any audit requirements and has the appropriate paperwork in place. There’s a good chance that systems that are PCI DSS compliant are secure, but it isn’t guaranteed.

Why, if your system is secure, would you not want the burden of PCI DSS compliance? Well, unless you think that all small businesses’ websites (and we’re talking about sites here that explicitly avoid touching payment data) need automated audit logs, two factor authentication, sophisticated penetration testing, incident response plans, written security policies, written change control procedures, separate logging servers, and so on, I think you already know the answer to that question.

On Security Monoculture

A pet hate of mine for some time has been the blanket assertion from those who like to identify themselves as “security professionals” that nobody should write their own cryptographic code. I’ve heard a number of individuals voicing this view and implying that all that is wrong in the world of computer security would be fixed if people would simply stop it.

This is, and has been, for some time, the conventional wisdom. It is wrong.

Why do I say this? Simple. The conventional wisdom implies that we should all be using the exact same code behind the scenes (this is often accompanied by claims of the superiority of Open Source implementations as they will be reviewed by many more people). For many people, and for many applications, this thinking leads to using OpenSSL, as it is “tried and tested”, and is Open Source so lots of people must have looked over the code and decided it was good, right? Well, let’s take a look at the huge list of vulnerabilities that have been found in that library, or the comments that the founder of OpenBSD, Theo de Raadt, made about it after deciding to fork it and create LibreSSL instead.

(Fine, you might say, use LibreSSL, or Botan, or Secure Transport, or CryptoAPI, or…; well, yes, that’s kind of my point. But I wouldn’t want to recommend that everyone should use LibreSSL, or Botan, or Secure Transport either. It’s much safer if there’s a mix of software performing this task.)

Heartbleed was only such a big problem because everyone was using the single implementation that contained that bug. Well, almost everyone; some software was using Apple’s Secure Transport, or Microsoft’s implementation (via CryptoAPI), or one of the various other implementations that are floating about. But the overwhelming majority uses OpenSSL, and as a result, a single vulnerability affected everyone, everywhere, simultaneously.

Another implication of this “thou shalt not implement crypto” view is that the set of implementations we presently have should be fixed. Maybe even some of them should go away. After all, nobody should be implementing crypto software (the only exception seems to be if the person quoting this rule knows your name, in which case you’re probably D.J. Bernstein or Bruce Schneier or some such). But that will make matters worse, not better. It will increase the reliance on OpenSSL and make the monoculture worse; and everyone switching wholesale to LibreSSL won’t help in that regard (it might be better in other respects, but that’s another matter). Indeed, it even implies that you shouldn’t be submitting any fixes to OpenSSL, because you can’t possibly be a suitable person to be tampering with cryptographic software.

Now, do I think you, dear reader, should immediately go out and roll your own RSA implementation? No, absolutely not. I am categorically not in favour of everyone implementing their own crypto (or, worse, rolling their own cryptographic algorithm). It isn’t something you can throw together in an afternoon, without carefully researching the subject first, and it certainly isn’t something you should be doing without adequate testing to make sure you haven’t slipped up. There are lots of gotchas in this area that you won’t appreciate unless you go and learn about it first. But what I don’t like about the conventional wisdom on the subject is that it has tended to discourage people who are competent to do so from writing additional implementations, and has created an atmosphere where you’re likely to be yelled at for merely suggesting that it might be a good idea for that to happen.

Code Coverage From the Command Line With Clang

Having searched the Internet several times to find out how to get coverage information out of clang, I ended up feeling rather confused. I’m sure I’m not the only one. The reason for the confusion is fairly simple; clang supports two different coverage tools, one of which uses a tool with a name that used to be used by the other one!

About half of the posts seem to indicate that the right way to get coverage information is to use the --coverage argument to clang:

$ clang --coverage -g -Wall testcov.c -o testcov
$ ls
testcov      testcov.c    testcov.dSYM testcov.gcno
$ ./testcov
$ ls
testcov      testcov.c    testcov.dSYM testcov.gcno testcov.gcda

This appears to produce (approximately) GCOV format data which can then be used with the gcov command, noting that this is really LLVM’s gcov, not GNU gcov, though it appears to be designed to be broadly compatible with the latter. Older versions of LLVM apparently used to call this tool llvm-cov rather than replacing gcov with it, but that name is now used for a newer, separate tool.

The rest of the posts, including some on the LLVM site, instead recommend using the -fprofile-instr-generate and -fcoverage-mapping options:

$ clang -fprofile-instr-generate -fcoverage-mapping -g -Wall testcov.c -o testcov
$ ls
testcov      testcov.c    testcov.dSYM
$ ./testcov
$ ls
default.profraw testcov         testcov.c       testcov.dSYM

Instead of outputting GCOV data, this generates a file default.profraw, which can be used with llvm-profdata and llvm-cov.

The way to use this file is to do something like:

$ llvm-profdata merge -o testcov.profdata default.profraw
$ llvm-cov show ./testcov -instr-profile=testcov.profdata testcov.c

In case you were wondering: you must pass the raw profile data through llvm-profdata. It isn’t in the format llvm-cov wants, and apparently the “merge” operation does more than just merging.

Also, you can change the name of the output file, either by setting the LLVM_PROFILE_FILE environment variable, or by compiling your code with -fprofile-instr-generate=<filename>. This is mentioned in the help output from the clang command, but doesn’t seem to be anywhere in the clang documentation itself.

In both cases, you need to pass the coverage options to the clang or clang++ driver when you are linking as well as when you are compiling. This will cause clang to link with any libraries required by the profiling system. You do not need to explicitly link with a profiling library when using clang.

One final remark: on Mac OS X, gcov will likely be in your path, but llvm-profdata and llvm-cov will not—instead, you can access them via Xcode’s xcrun tool.

Why NOT Have a Code of Conduct?

Having just seen a demand that WWDC adopt a formal Code of Conduct for its attendees this year (rdar://25791520 if you want to dupe it, though please give this post a read first), I thought I’d write a little to express my thoughts about the Code of Conduct phenomenon (in more than 140 characters, since that seems somehow inadequate).

Let me start by saying that it has always been the case that most conferences reserved the right to eject you if you were in some way disruptive. As private events, they’re within their rights to do so (at least in Common Law countries), and if they have the appropriate wording in their Terms & Conditions they may not even have to refund your money.

Let me also say that I am not in favour of allowing harassment or other bad behaviour by conference attendees, and I realise that there will be situations (e.g. where there are children present) where the organisers might want to draw attention to the fact that attendees should keep to their best behaviour.

So what is this Code of Conduct thing about? Well, a fair overview is this FAQ by Ashe Dryden, and there’s an example of the kind of thing we’re talking about on this website. To save time, I’d recommend that you go and read those now, then come back if you’re still interested in what I have to say.

OK, you’re back. So why would anyone object to these things?

1. Necessity.

We’re grown-ups, right? We should all, by now, know how to behave around other people, and for those who don’t, we already have a set of rules that we’ve collectively agreed upon that cover the worst kinds of harassment and bad behaviour, namely the law, plus — as I already mentioned — most conferences already reserve the right to remove you if you’re being disruptive.

I accept, for what it’s worth, that some people might find an explicit set of rules reassuring. Others, me included, do not. Quite the opposite, in fact, for reasons I’ll elucidate below.

2. Natural Justice

It’s commonly asserted that a problem with leaving this up to the law is that the police “don’t have a great history of responding positively”, that complainants may not wish to involve the police and that as a result it might be better for conference organisers to deal with things themselves.

Except… conference organisers are not trained to deal with these types of situations. A lot of this is going to boil down to one person’s word against another, and it’s very easy to allow your own personal biases to determine your response. Police officers are trained not to do that (not always successfully, for sure, but they are at least trained); of course, that does sometimes make people unhappy when they complain to the police, because the police don’t seem to believe them — but that’s a misunderstanding. The function of the police is not to believe or to disbelieve, but to investigate, and where there is evidence, to bring it before a court for prosecution.

That courts of law require high standards of evidence — at least in Common Law countries — is undeniable, and that’s because we’ve collectively agreed that the principle should be that people are innocent until proven guilty.

This is particularly important in some of the areas we’re concerned with here, because of the reputational impact on people subject to allegations of sexism, racism or (worse) sexual assault, and the notion that the response to allegations of that nature might be decided by conference organisers on the basis of a low standard of evidence, without any right of appeal, really worries me.

I know it’s also asserted that “false accusations are… incredibly rare”. I’m happy to believe that. But there is a whole grey area of allegations that might seem true from a certain point of view that isn’t necessarily shared by all parties, and there are even situations where the accused and accusing parties simply don’t know themselves what happened.

3. Out-of-Venue Activities

Ashe Dryden asserts that a “code of conduct should apply to any event where your attendees may congregate”. This seems generally problematic.

Certainly there are situations where conference organisers might need to get involved; I accept that. But it seems hard to justify extending the Code of Conduct to all activities outside of those organised by the conference organisers.

So, for instance, if someone misbehaves in a bar right outside the conference venue, where there are a lot of conference attendees present, it is totally appropriate for conference organisers to have words with that person. Or, actually, for anyone present to have words with that person. But unless they have broken the law, or upset the bar owner, you won’t be able to ban them from hanging around in that bar, even if you kick them out of your conference. And, furthermore, to the extent that you feel the Code of Conduct may constrain their behaviour, it certainly won’t if you have invoked it to bar them from the rest of your conference.

Equally, it seems preposterous to argue that the Code of Conduct should extend to a shopping trip to a supermarket half way across town. Or to e.g. a group of attendees who decide to visit a strip club (not my cup of tea, but some people clearly enjoy that kind of thing, and it’s very likely effectively banned in the code of conduct you were thinking of using).

And then there are all kinds of questions about whether the Code of Conduct protects people who are not conference attendees at all, or indeed how it protects people who are conference attendees against those who are not (hint: it doesn’t).

4. Scope

What should be banned? suggests that “harassment includes offensive verbal comments related to… technology choices”! So you could, in theory, be evicted from a conference for making rude remarks about PHP (or, I suppose, for calling someone an idiot for using it). That seems a step too far, for sure.

In fact, while we’re about it, what constitutes an “offensive verbal comment”? Does it have to meet a reasonable person test? Would it be inappropriate to reproduce the cartoons of the Prophet Mohammed? In all circumstances? Are you sure? Does the whole of the community agree? Or if not, does everyone agree to compromise somehow?

And what exactly is “harassing photography”? Some people are very sensitive about having their photograph taken (even accidentally), and others much less so. Who decides? Is there a right of appeal? How many photographs does one have to take before it becomes harassment?

It’s also worth reflecting that quite a bit of that code of conduct would ban many well-respected and enjoyable comedy acts outright.

Again, please don’t misunderstand — I am all for conference staff taking someone aside and explaining that they’re upsetting someone, asking them to please be sensitive to that person’s concerns, and even if necessary warning them that they will be ejected if they continue with their behaviour. What I’m trying to tease out here is that there is a lot of subjective judgement involved, and attempting to codify this in a Code of Conduct is fraught with danger.

5. Legal Certainty

You might think that having a Code of Conduct would create some legal certainty for organisers when they do decide to act, but if they use the one at they could be in for a nasty shock. For instance, as it’s currently worded, it bans “offensive verbal comments related to… sexual images in public spaces”, rather than banning sexual images in public spaces as I’m sure its author intended. Granted, it says “harassment includes…”, so we can be certain that the definition is not exhaustive, but in cases where contracts are unclear, Common Law takes the view that they should not be interpreted in a way that favours the party that drafted them. My guess is that in court you’d find that they chose to use the legal definition of “harassment” (whatever that may be) and then added in anything in the “includes” list, in which case if you evicted someone for “following” and that person sued to recover their conference fees (and potentially travel and legal expenses), you might well find yourself out of luck and out of pocket.

Maybe that’s an argument for getting a lawyer to look over them, but IMO, it would have been much better to just put into the Terms and Conditions that the organisers reserve the right to eject attendees for behaviour that the organisers determine to be detrimental to other attendees or to the conference as a whole. I think you’d also want to make it clear what the procedure for doing that should be — who had the right to make the decision(s), whether there was an appeal process, under what circumstances attendees’ money might be refunded and so on. And on that subject, trying to keep hold of the entire conference fee whatever is probably a bad idea; the attendee’s credit card issuer is very likely to side with them, so if you’re going to try to keep hold of it you’ll only want to do so in cases where you have solid evidence of their misbehaviour.

6. Protecting “Unpopular” People

Some people may be unpopular, or may hold views that are unpopular. You can certainly discuss this with them in advance if you think it will be a problem, and ask them not to raise their unpopular views at your conference. If they aren’t relevant to the conference itself, they might even agree to that.

Anyway, there are two problems here; the first is that some people appear to claim that the mere expression of a view with which they strongly disagree is some form of harassment, in and of itself. Indeed, there have even been demands to ban certain people from certain conferences on the grounds that people are aware that (or think that) they hold certain views, even if they have promised not to express them at the conference.

The second problem is that unpopular people (or those with unpopular views) are far more likely to be the targets of false — or at least questionable — allegations. I don’t want to pick individual people as examples, so I’ll stick to generalising here: if a well-known feminist makes a joke about men, it’s quite unlikely that anyone will complain, and even if they do, quite unlikely that anyone will do anything about it. If, however, a similar joke about women was made by a man, I would expect there to be complaints, and I would expect that Something Would Be Done. (I’m not trying to be anti-feminist here; I’m just observing that, right now, at least in tech. circles, a fairly muscular form of feminism is popular, and making any remark that conflicts with or disagrees with that is not.)

It’s also worth reflecting that the first problem includes things like the views Roman Catholics or Muslims hold about homosexuality, which certainly for some people meet the definition of “offensive verbal comments related to sexual orientation”. While one might argue that people who hold those views should keep them to themselves for politeness’ sake (and indeed most do), someone who knows that an attendee holds those kinds of views might be tempted to goad that attendee into expressing them, in order to trigger the Code of Conduct and have them removed from the conference.

The irony here is that the intention of advocates of Codes of Conduct is generally to protect minorities, but that in practice they may in some cases achieve the opposite.

7. Protecting the Expression of Unpopular Views (in some cases)

Sometimes it might actually be appropriate to prioritise freedom of speech over someone else’s right to not be offended. Sometimes it’s better to let people debate points of view that they may find challenging or even downright offensive.

I grant you that at most technology related conferences, this won’t be relevant, but I find Ashe Dryden’s assertion that this point can be addressed by stating that “free speech laws do not apply to harassment” overly simplistic, even leaving aside the obvious point that the United States Constitution, wonderful as it is, doesn’t actually apply over most of the surface of the Earth. It occasionally does all of us good to hear views we don’t like or agree with, even views we find offensive, if only because it makes us think.

(FWIW, I can imagine that this might become a problem if you wanted to have a conference talk or panel about gender politics in technology, which is something of a live issue at the moment; it’s very likely going to involve, one way or another, things that someone or other feels are “offensive verbal comments related to gender”. If you think not, imagine inviting e.g. Milo Yiannopoulos to debate with Brianna Wu, assuming you could get them to sit on the same stage.)


All of this is only my opinion, and hopefully I’ve explained above why I think this way.

  • Organisers certainly should have procedures to deal with poor behaviour by attendees, or with situations where one attendee is upsetting another somehow.

  • It would be wise to put these procedures in the Terms and Conditions.

  • It would be wise to train conference staff to follow these procedures (e.g. insisting that they report complaints up the chain to organisers until they reach someone you trust to deal with them sensibly).

  • Trying to codify what constitutes good or bad behaviour creates problems, and it’s probably better to use very general language in your Ts & Cs, instead of trying to write an explicit Code of Conduct.

  • If someone breaks the law, or is alleged to have done so, you really should consider letting the police deal with it, whatever your opinion of their effectiveness might be.

  • Attendees outside of your venue will be exposed to people who are not at your conference and are not subject to your Code of Conduct at all anyway (this potentially includes anyone you kick out for violating your Code of Conduct). As such, Codes of Conduct do not “protect” attendees. At best, if carefully drafted, they may protect conference organisers from future lawsuits.

  • Whether you have a Code of Conduct or not, you should consult a lawyer to avoid creating problems for yourself down the road.

  • There is nothing wrong with telling your attendees you expect them to behave themselves, drawing to their attention the fact that there are children present, telling them that you expect them not to stream pornography over the conference WiFi and so on. This is not the same as having a formal Code of Conduct.

Still TL;DR

Codes of Conduct mainly protect the conference organiser (and only if they are carefully worded); they don’t protect attendees. Defining what is and is not acceptable is hard, and boils down to subjective judgement anyway. Better to put procedures in place, stick them in your Ts & Cs, and train conference staff appropriately.

Symbolicating OS X Crash Logs

iOS developers have it easy; to symbolicate an iOS crash log, they can drop the log onto the Organiser window in Xcode, and — in theory at least — it will be symbolicated for them.

But on OS X, this doesn’t work. Moreover, the symbolicatecrash Perl script that iOS developers could use as an alternative doesn’t understand OS X crash logs and so will refuse to process them.

You could try using Peter Hosey’s Symbolicator package, but it’s a bit buggy — looking at the code, Peter has misunderstood the “slide”, and it also can’t cope with Xcode archives containing multiple dSYMs. I did contemplate fixing it and submitting a patch, but while I don’t want to be unkind to Peter, I think I’d end up rewriting rather too much of it in the process.
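For anyone unsure what the “slide” is: as I understand it, it is the difference between the address at which an image was actually loaded and the address it was linked at, and a symbolicator has to subtract it from the addresses in the crash log before looking anything up in the dSYM. Here is a minimal sketch of that arithmetic. The function names and the example addresses are made up for illustration; they are not taken from Symbolicator or any other tool mentioned here.

```python
def compute_slide(load_address: int, linked_address: int) -> int:
    """Return how far the image moved from the address it was linked at."""
    return load_address - linked_address


def to_file_address(runtime_address: int, slide: int) -> int:
    """Map an address seen in the crash log back to the address used in the dSYM."""
    return runtime_address - slide


# Hypothetical values: __TEXT linked at 0x100000000 but loaded at 0x10a3f2000.
slide = compute_slide(0x10A3F2000, 0x100000000)
file_address = to_file_address(0x10A3F3F20, slide)
print(hex(slide), hex(file_address))
```

Getting the sign or the base address wrong here gives you plausible-looking but incorrect symbols, which is why this sort of mistake is easy to miss.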

You could also try LLDB’s symbolicator, which you use like this:

$ lldb
(lldb) command script import lldb.macosx.crashlog
"crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
"malloc_info", "ptr_refs", "cstr_refs", and "objc_refs" commands have been installed, use the "--help" options on these commands for detailed help.
(lldb) crashlog /path/to/crash.log

This is actually really rather neat, or it would be if it worked. Unlike other symbolicators, it annotates the backtrace with your actual source code (and/or in some cases disassembly) so that you can see where the crash took place. Additionally, if you run it as above, within lldb itself, it will set up the memory map as if your program was loaded. Very cool.

You will note that I said if it worked. Because, out of the box, it does not. The first problem is that the version shipped by Apple relies on a script, dsymForUUID, that is not provided and whose behaviour is not documented anywhere. I wrote something that should be suitable and put it up on PyPI so you can install it with e.g.

$ sudo -H pip install dsymForUUID

(But wait… you might not need to.)

The second problem is that it’s also a bit broken. It chokes on some crash logs because they contain tab characters rather than spaces, and it also only loads the __TEXT segment in the correct place, which makes for a bit of fun if you need to poke around in one of the other segments.
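To illustrate the tab problem, here is a rough sketch of a backtrace-line parser that accepts any run of whitespace between fields. It assumes frame lines consist of a frame index, an image name, a hexadecimal address and a symbol description, in that order; the sample lines and the function name are invented, and this is not the code from my patch.

```python
import re
from typing import Optional, Tuple

# \s+ matches tabs as well as spaces, so either separator is accepted.
FRAME_RE = re.compile(r"^(\d+)\s+(.+?)\s+(0x[0-9a-fA-F]+)\s+(.*)$")


def parse_frame(line: str) -> Optional[Tuple[int, str, int, str]]:
    """Split one backtrace line into (index, image, address, symbol)."""
    match = FRAME_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    index, image, address, symbol = match.groups()
    return int(index), image, int(address, 16), symbol


# Hypothetical frame lines, one separated with spaces and one with a tab.
with_spaces = "0   com.example.MyApp    0x000000010a3f3f20 main + 48"
with_tab = "0   com.example.MyApp\t0x000000010a3f3f20 main + 48"
print(parse_frame(with_spaces) == parse_frame(with_tab))
```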

Anyway, I filed a bug report today about all of this, with a patch attached to it that fixes these problems. I’ve also put a copy of the fixed file here so you can download and use it.

In addition to the usage shown on the lldb website, you can, in fact, invoke it directly from the Terminal prompt, e.g.

$ /path/to/crash.log

which is a very convenient way to use it in many cases. Likewise, if you want to use this version rather than the built-in one, you just need to make sure it’s in your PYTHONPATH, then you can do

$ lldb
(lldb) command script import crashlog

to use it in lldb.

The fixed version does not require dsymForUUID, and indeed it’s rather faster without it, but it can use a dsymForUUID script if you happen to have one (e.g. because you work at Apple). To use it with your custom dsymForUUID, you need to set the DSYMFORUUID environment variable to the full path of your script.

Update 2016-05-04

I found an interesting bug in the symbolicator; I’ve uploaded a new script that fixes it.

Update 2016-05-31

I’ve moved to Bitbucket, and added support for symbolicating the output of the sample command.

Apple Help in 2015

The last time I had to build a brand new help file was some time ago — maybe even ten years ago — and in the world of software, that’s an age.

For the past few months I’ve been working hard on a new release of iDefrag, version 5, and as part of this I’m rewriting the documentation. Rather than using hand-written HTML like I did before, I’ve chosen this time around to use a documentation generator, Sphinx. The advantages of this approach include:

  • Built-in support for indexing and cross-referencing.

  • The ability to write the documentation in plain text.

  • Keeps the presentation details separate from the content (via theming and templates).

  • Supports multiple output formats, not just HTML.

The current version of Sphinx doesn’t directly support building Apple Help Books, but I’ve submitted a pull request to fix that so hopefully by the time you read this you’ll be able to do

$ sphinx-quickstart

fill in some fields and then do

$ make applehelp

to generate a help book.

(If you do do that, you’ll want to edit your file quite a bit, and you probably don’t want to use the default theme either.)

Anyway, all of the Sphinx related stuff was fine, and worked as documented. Unlike Apple Help, which doesn’t. I spent an entire day struggling to make a help book that actually worked, and most of that is because of problems with the documentation.

Let’s start with the Info.plist. Apple gives this not particularly helpful table:

Key                            Exact or sample value
CFBundleDevelopmentRegion      en_us
CFBundleInfoDictionaryVersion  6.0
CFBundleName                   SurfWriter
CFBundlePackageType            BNDL
CFBundleShortVersionString     1
CFBundleSignature              hbwr
CFBundleVersion                1
HPDBookAccessPath              SurfWriter.html
HPDBookIconPath                shrd/SurfIcn.png
HPDBookIndexPath               SurfWriter.helpindex
HPDBookKBProduct               surfwriter1
HPDBookTitle                   SurfWriter Help
HPDBookType                    3
HPDBookTopicListCSSPath        sty/topiclist.css
HPDBookTopicListTemplatePath   sty/topiclist.xquery
There are two serious problems with the table above. The first is that some of it is wrong(!), and the second is that it doesn’t indicate which values are sample values and which are required.

Here’s what you actually need:

Key                            Value
CFBundleDevelopmentRegion      en-us
CFBundleIdentifier             your help bundle identifier
CFBundleInfoDictionaryVersion  6.0
CFBundlePackageType            BNDL
CFBundleShortVersionString     your short version string - e.g. 1.2.3 (108)
CFBundleSignature              hbwr
CFBundleVersion                your version - e.g. 108
HPDBookAccessPath              _access.html (see below)
HPDBookIndexPath               the name of your help index file
HPDBookTitle                   the title of your help file
HPDBookType                    3
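If you would rather generate this file than write it by hand, the table above maps fairly directly onto Python’s plistlib. The following is only a sketch: the function name, the bundle identifier and the other example values are placeholders of mine, and storing HPDBookType as the string "3" is an assumption rather than something I have verified is required.

```python
import plistlib


def help_book_info(bundle_id, title, index_file, version, short_version):
    """Build the Info.plist dictionary for a help book from the table above."""
    return {
        "CFBundleDevelopmentRegion": "en-us",  # hyphen, not underscore
        "CFBundleIdentifier": bundle_id,
        "CFBundleInfoDictionaryVersion": "6.0",
        "CFBundlePackageType": "BNDL",
        "CFBundleShortVersionString": short_version,
        "CFBundleSignature": "hbwr",
        "CFBundleVersion": version,
        "HPDBookAccessPath": "_access.html",
        "HPDBookIndexPath": index_file,
        "HPDBookTitle": title,
        "HPDBookType": "3",  # assumption: stored as a string
    }


# Placeholder values for illustration only.
info = help_book_info(
    bundle_id="com.example.SurfWriter.help",
    title="SurfWriter Help",
    index_file="SurfWriter.helpindex",
    version="108",
    short_version="1.2.3 (108)",
)
plist_bytes = plistlib.dumps(info)  # XML plist, ready to write to Info.plist
```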

The first thing to note is that CFBundleDevelopmentRegion should have a hyphen, not an underscore. Apple’s utilities generate this properly, but the documentation is wrong.

The second thing to note is that in spite of the documentation implying that you can use your help bundle identifier to refer to your help bundle (which would, admittedly, make sense), you can’t. You need to use the HPDBookTitle value. Oh, and ignore any references to AppleTitle meta tags. You don’t need those.

The third thing relates to HPDBookAccessPath. The file referred to there must be a valid XHTML file. In particular, it cannot be an HTML5 document — that will simply not work, and the error messages you get on the system console are completely uninformative.

The best solution I’ve come up with for this particular problem, as I want to generate modern HTML output, is to make a file called _access.html and put the following in it:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Title Goes Here</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="robots" content="noindex" />
    <meta http-equiv="refresh" content="0;url=index.html" />
  </head>
  <body>
  </body>
</html>

This means that both helpd and the help indexer (hiutil) are happy, and I can write my index page using modern HTML. Incidentally, Apple appears to be using a similar trick in the help for the current version of Mail. Obviously you can change the index.html in the above to whatever you need.
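Since the console messages are so unhelpful, it can be worth checking that the access page at least parses as XML before building the help book. The sketch below generates a redirect stub like the one described above and runs it through Python’s XML parser. The helper names are mine, and being well-formed XML is only a necessary condition; I can’t promise it is everything helpd checks for.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"


def access_page(title: str, target: str) -> str:
    """Return an XHTML stub that immediately redirects to target."""
    refresh = quoteattr("0;url=" + target)
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
        f'<html xmlns="{XHTML_NS}">\n'
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        '    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        '    <meta name="robots" content="noindex" />\n'
        f'    <meta http-equiv="refresh" content={refresh} />\n'
        "  </head>\n"
        "  <body>\n"
        "  </body>\n"
        "</html>\n"
    )


def is_well_formed(text: str) -> bool:
    """True if text parses as XML; typical HTML5 markup will not."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


page = access_page("Title Goes Here", "index.html")
html5 = "<!DOCTYPE html><html><head><meta charset=utf-8><title>x</title></head></html>"
print(is_well_formed(page), is_well_formed(html5))
```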

In your application bundle, you need to fill in the following keys

Key Value
CFBundleHelpBookFolder The path of your help book relative to Resources - e.g.
CFBundleHelpBookName The value from HPDBookTitle, above

Note that while the HPDBookTitle is displayed to the user, it can be localised using InfoPlist.strings. Note also that you absolutely cannot, contrary to what the documentation implies, give a bundle ID here. It just doesn’t work. You could however, if you wanted, write an InfoPlist.strings file like this:

HPDBookTitle = "SurfWriter Help";

then put the bundle ID in as the HPDBookTitle in the Info.plist.

Oh, and if you think you’re going to be able to double-click a help book to preview it, think again. That won’t work. Instead, you need either to use it from within your application, or you can put it in ~/Library/Documentation/Help (you might have to make that folder) and double-click it in there. Why? Because help files are indexed and you can only open them if they’re registered in the index.
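If you find yourself doing that repeatedly while testing, the copy is easy to script. This is a small sketch, assuming all that is needed is for the help book to be copied into that folder; the function name is made up, and I have not checked how quickly the help system picks up a replaced copy.

```python
import shutil
from pathlib import Path

# The per-user help folder mentioned above; it may not exist yet.
DEFAULT_HELP_DIR = Path.home() / "Library" / "Documentation" / "Help"


def install_for_preview(book: Path, help_dir: Path = DEFAULT_HELP_DIR) -> Path:
    """Copy a help book into the help folder, replacing any earlier copy."""
    help_dir.mkdir(parents=True, exist_ok=True)
    destination = help_dir / book.name
    if destination.exists():
        shutil.rmtree(destination)
    shutil.copytree(book, destination)
    return destination
```

You would call it with the path to your built help book, then double-click the copy it returns.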

One other thing that isn’t really documented at all is what exactly the HPDBookRemoteURL will do for you. There’s some handwaving about being able to offer remote content updates, but how the URL is used is skirted over. Well, if you do set HPDBookRemoteURL, Help Viewer will essentially expect it to point at a copy of the Resources folder of your bundle; so if you have HPDBookRemoteURL set to, then you’re going to get requests like (and so on).

Useful update (Feb 29th 2016)

You may have noticed that Help Viewer has a button to toggle the table of contents in your help file. Matt Shepherd did a bit of work looking into this and it turns out that it’s controlled by a Javascript API — see Matt’s gist for more information.

January VAT Changes and the VAT Threshold

I’ve just spotted this petition, via a retweet from Dan Counsell, and as a member of HMRC’s Joint SME MOSS Working Group as well as the owner of a microbusiness I thought I’d make a couple of comments.

It isn’t particularly clear from the petition, but the problem being raised is that in order to register for the Mini One Stop Shop in the UK, you currently need to be registered for UK VAT. This is something that we have been talking to HMRC about, and I have the impression that HMRC is amenable, in principle, to allowing non-VAT-registered entities to use the Mini One Stop Shop system, though the details of that have not been worked out.

Note also that your sales here in the UK will continue to be subject to ordinary UK VAT, and will not be reported through MOSS, and even if your UK-only sales are below the UK VAT threshold, it’s likely that you have expenditure in the UK that involves an element of VAT, so you might want to consider a voluntary registration in any event, in order to reclaim your input tax.

(There is a related issue within the Mini One Stop Shop itself, in that there are no thresholds for amounts reported via MOSS. HMRC did try to negotiate a threshold, but other member states didn’t support the idea and it was dropped.)

It is also worth pointing out that the Mini One Stop Shop is optional. You don’t have to use it. The alternatives are:

  • Use a digital “marketplace” (e.g. Apple’s App Store, Google Play, Paddle). Marketplace operators, as of the 1st of January 2015, are required by law to deal with EU VAT for you. You will only need to deal with B2B transactions between you and the store operator.

  • Register for VAT in EU member states into which you are selling. This will mean filing multiple VAT returns and complying fully with (up to) 28 different sets of VAT legislation.

  • Use a distributor in EU member states you wish to sell into. The distributor is a business, so you only need worry about a B2B sale; B2C sales will be made by the distributor within the member state(s) in which it operates.

  • Stop selling to other EU member states.

For a lot of digital micro-businesses, the best approach is likely to be to use a digital marketplace. MOSS gets you a single return and a single payment; unlike using a marketplace or a distributor, it does not free you from the need to comply with up to 28 different sets of VAT rules, though it makes doing so considerably simpler in a number of ways.

As regards determining whether your sale is in the EU or not, with very few exceptions (mostly having to do with e.g. mobile network operators, where there is an obvious way to tell where the customer is) you need to keep two non-contradictory pieces of information that identify your customer’s location. These might include, for instance

  • Your customer’s billing address
  • The result of IP geolocation
  • Your customer’s telephone number

If those two pieces of information say your customer is outside the EU, then it doesn’t matter (from your perspective) if the customer was really stood in the middle of Brussels at the time; the rules say that you have done what is expected of you.
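In code, the “two non-contradictory pieces of information” rule comes down to finding a country that at least two of your pieces of evidence agree on. The following is just an illustration of that idea; the function name and the evidence keys are invented, and what counts as acceptable evidence is a question for the rules themselves, not for this sketch.

```python
from collections import Counter
from typing import Mapping, Optional


def agreed_country(evidence: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the country code that at least two pieces of evidence agree on."""
    counts = Counter(code.upper() for code in evidence.values() if code)
    if not counts:
        return None
    country, votes = counts.most_common(1)[0]
    return country if votes >= 2 else None


# Hypothetical evidence gathered at checkout.
sale = {"billing_address": "US", "ip_geolocation": "us", "phone_number": "BE"}
print(agreed_country(sale))
```

If no two pieces agree, the function returns None, which is the case where you would need to collect more evidence before completing the sale.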

The Bash Bug

There are lots of scary headlines on the Internet today about a bug in the GNU Project’s Bourne Again Shell (aka Bash).

Apparently, Bash allows subshells to inherit exported function definitions, which it implements by passing environment variables with those functions’ names through to subshells, with the value of the variable containing the function definition. For instance

outer$ function hello {
> echo "Hello World"
> }
outer$ export -f hello
outer$ PS1="inner$ " /bin/bash
inner$ hello
Hello World
inner$ exit
outer$ export -nf hello

In this case, the outer shell has exported the function hello to the inner shell, by setting an environment variable hello to the string () { echo "Hello World"; }. We can test this:

outer$ export hello='() { echo "Hello World"; }'
outer$ PS1="inner$ " /bin/bash
inner$ hello
Hello World
inner$ exit
outer$ export -n hello

On its own, this feature is only harmful if a user can specify the name and content of an environment variable, and only then if some program is foolishly trying to run commands without specifying their full path. For example:

outer$ ls='() { echo "No way, Jose"; }' PS1="inner$ " /bin/bash
inner$ ls
No way, Jose
inner$ /bin/ls
foo.txt    bar.txt
inner$ exit

However, current versions of Bash contain a bug that causes Bash to execute any statements that follow the function definition in an environment variable of this form, so for example

outer$ naughty='() { :;}; echo "Oh dear, oh dear"' PS1="inner$ " /bin/bash
Oh dear, oh dear
inner$ exit

In the above example, the inner shell runs the echo command. It shouldn’t.
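The same check can be scripted, which is handy if you have a number of machines to look at. This sketch starts Bash with the same kind of crafted variable and looks for the tell-tale output. The function names are mine, and the /bin/bash default is an assumption about where Bash lives on your system.

```python
import os
import subprocess

# Same shape as the example above: a function body followed by a trailing command.
PROBE_VALUE = '() { :;}; echo "Oh dear, oh dear"'


def probe_environment(base=None):
    """Copy an environment and add the crafted variable to it."""
    env = dict(os.environ if base is None else base)
    env["naughty"] = PROBE_VALUE
    return env


def output_shows_bug(output: str) -> bool:
    """The trailing echo should never have run; if it did, Bash is affected."""
    return "Oh dear, oh dear" in output


def bash_is_vulnerable(bash: str = "/bin/bash") -> bool:
    """Run a harmless command under Bash and see whether the trailing echo fired."""
    result = subprocess.run(
        [bash, "-c", "echo probe finished"],
        env=probe_environment(),
        capture_output=True,
        text=True,
    )
    return output_shows_bug(result.stdout)
```

Calling bash_is_vulnerable() returns True only if the inner shell printed the message it should have ignored.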

Now, this is potentially a major security hole, but only in certain circumstances, namely:

  1. If a user can set the value of an environment variable, and

  2. Where a program passes control to a Bash shell and passes that value through.

The two most common situations that allow remote exploitation of this bug are CGI scripts (the old fashioned kind, not FastCGI, and not anything run via Apache’s mod_php, mod_perl or mod_python) and OpenSSH, if you were relying on the ForceCommand feature to provide restricted SSH access. sudo, fortunately, already strips out Bash exported functions (and has done since 2004), so is not affected.
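If you have a program of your own that hands user-influenced environment variables to a shell, you can filter them in the same spirit before doing so. The sketch below drops anything whose value looks like an exported function definition. The function name is invented and this is not a description of how sudo itself does it; it complements patching Bash rather than replacing it.

```python
from typing import Dict, Mapping


def strip_exported_functions(env: Mapping[str, str]) -> Dict[str, str]:
    """Drop variables whose value looks like a Bash exported function."""
    return {
        name: value
        for name, value in env.items()
        if not value.lstrip().startswith("()")
    }


# Hypothetical environment as it might arrive from an untrusted request.
incoming = {
    "PATH": "/usr/bin:/bin",
    "HTTP_USER_AGENT": '() { :;}; echo "Oh dear, oh dear"',
}
print(strip_exported_functions(incoming))
```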

Put another way, unless you have very old code running on your web servers, and unless you are doing something like running a public SSH server that allows restricted log-ins (e.g. to run Git or Subversion via SSH, but nothing else), the chances are that you aren’t vulnerable to remote exploits based on this. You should check, but you should not panic.

Twitter Is Not Private Chat

Let me say that again: Twitter is not private chat.

Why do I say this? Well, because it seems there are people out there who confuse Twitter with services like Glassboard, and think that people they don’t know shouldn’t respond to their tweets. Or maybe it’s just people who disagree with them; it’s unclear.

There are a few important facts that such people need to be made aware of:

  1. People who follow them may retweet their tweets. As a result they may very well be seen by people who do not follow them, who they do not know and who might disagree with whatever opinion they’ve expressed.

  2. By default, your tweets are public. That being the case, tweeting is like standing on a soap box at Hyde Park Corner, talking loudly to all who will listen. You don’t get to pick your audience.

  3. If you say something on Twitter (or indeed from a soap box at Hyde Park Corner), and someone who sees your tweet (or is listening to you) finds it interesting or controversial, they have every right to reply. Your “conversation” is not private in any way, shape or form; indeed, it is not actually a conversation.

If you don’t like the above facts, Twitter has a mode for you; set your account to “protected” tweet mode. At that point, you do get to screen your followers, who can’t retweet you.

Yes, there are downsides to protected tweet mode. If you don’t like the way Twitter works, and you don’t want to protect your tweets, post to a blog instead and turn comments off. Or use a private group chat system like Glassboard. Alternatively, you will simply have to live with it.

Finally, if you ask on Twitter why people are replying to you when you don’t want them to, and someone points out all of the above, there is absolutely no excuse for threatening or abusing them.