Alastair’s Place

Software development, Cocoa, Objective-C, life. Stuff like that.

Apple Help in 2015

The last time I had to build a brand new help file was some time ago — maybe even ten years ago — and in the world of software, that’s an age.

For the past few months I’ve been working hard on a new release of iDefrag, version 5, and as part of this I’m rewriting the documentation. Rather than using hand-written HTML like I did before, I’ve chosen this time around to use a documentation generator, Sphinx. The advantages of this approach include:

  • Built-in support for indexing and cross-referencing.

  • The ability to write the documentation in plain text.

  • Separation of presentation details from content (via theming and templates).

  • Support for multiple output formats, not just HTML.

The current version of Sphinx doesn’t directly support building Apple Help Books, but I’ve submitted a pull request to fix that so hopefully by the time you read this you’ll be able to do

$ sphinx-quickstart

fill in some fields and then do

$ make applehelp

to generate a help book.

(If you do do that, you’ll want to edit your file quite a bit, and you probably don’t want to use the default theme either.)
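For reference, the builder's settings live in conf.py like everything else in Sphinx. A minimal sketch might look like this (the applehelp_* option names come from the pull request's documentation and could change before it's merged; the values are placeholders):

```python
# conf.py -- hypothetical settings for the proposed applehelp builder
# (option names per the pull request; values are placeholders)
applehelp_bundle_name = "SurfWriter"             # produces SurfWriter.help
applehelp_bundle_id = "com.example.SurfWriter.help"
applehelp_dev_region = "en-us"                   # hyphen, not underscore
applehelp_title = "SurfWriter Help"
```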

Anyway, all of the Sphinx related stuff was fine, and worked as documented. Unlike Apple Help, which doesn’t. I spent an entire day struggling to make a help book that actually worked, and most of that is because of problems with the documentation.

Let’s start with the Info.plist. Apple gives this not particularly helpful table:

Key Exact or sample value
CFBundleDevelopmentRegion en_us
CFBundleInfoDictionaryVersion 6.0
CFBundleName SurfWriter
CFBundlePackageType BNDL
CFBundleShortVersionString 1
CFBundleSignature hbwr
CFBundleVersion 1
HPDBookAccessPath SurfWriter.html
HPDBookIconPath shrd/SurfIcn.png
HPDBookIndexPath SurfWriter.helpindex
HPDBookKBProduct surfwriter1
HPDBookTitle SurfWriter Help
HPDBookType 3
HPDBookTopicListCSSPath sty/topiclist.css
HPDBookTopicListTemplatePath sty/topiclist.xquery

There are two serious problems with the table above. The first is that some of it is wrong(!), and the second is that it doesn’t indicate which values are sample values and which are required.

Here’s what you actually need:

Key Value
CFBundleDevelopmentRegion en-us
CFBundleIdentifier your help bundle identifier
CFBundleInfoDictionaryVersion 6.0
CFBundlePackageType BNDL
CFBundleShortVersionString your short version string - e.g. 1.2.3 (108)
CFBundleSignature hbwr
CFBundleVersion your version - e.g. 108
HPDBookAccessPath _access.html (see below)
HPDBookIndexPath the name of your help index file
HPDBookTitle the title of your help file
HPDBookType 3

The first thing to note is that CFBundleDevelopmentRegion should have a hyphen, not an underscore. Apple’s utilities generate this properly, but the documentation is wrong.

The second thing to note is that in spite of the documentation implying that you can use your help bundle identifier to refer to your help bundle (which would, admittedly, make sense), you can’t. You need to use the HPDBookTitle value. Oh, and ignore any references to AppleTitle meta tags. You don’t need those.

The third thing relates to HPDBookAccessPath. The file referred to there must be a valid XHTML file. In particular, it cannot be an HTML5 document — that will simply not work, and the error messages you get on the system console are completely uninformative.

The best solution I’ve come up with for this particular problem, as I want to generate modern HTML output, is to make a file called _access.html and put the following in it:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Title Goes Here</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="robots" content="noindex" />
    <meta http-equiv="refresh" content="0;url=index.html" />
  </head>
  <body></body>
</html>

This means that both helpd and the help indexer (hiutil) are happy, and I can write my index page using modern HTML. Incidentally, Apple appears to be using a similar trick in the help for the current version of Mail. Obviously you can change the index.html in the above to whatever you need.
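Given how opaque the failure mode is, it's worth mechanically checking that your access page is well-formed before shipping; any XML parser will do. For example, in Python (the page below is a minimal stand-in for the file described above):

```python
import xml.etree.ElementTree as ET

# A minimal access page of the sort described above
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Title Goes Here</title>
    <meta http-equiv="refresh" content="0;url=index.html" />
  </head>
  <body></body>
</html>"""

# fromstring raises ParseError for anything that isn't well-formed XML,
# which catches the "accidentally HTML5" mistake early
root = ET.fromstring(page)
print(root.tag)  # → {http://www.w3.org/1999/xhtml}html

try:
    ET.fromstring("<html><meta charset=utf-8></html>")  # HTML5-isms
except ET.ParseError:
    print("not well-formed XML")
```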

In your application bundle’s Info.plist, you need to fill in the following keys:

Key Value
CFBundleHelpBookFolder The path of your help book relative to Resources - e.g. SurfWriter.help
CFBundleHelpBookName The value from HPDBookTitle, above

Note that while the HPDBookTitle is displayed to the user, it can be localised using InfoPlist.strings. Note also that you absolutely cannot, contrary to what the documentation implies, give a bundle ID here. It just doesn’t work. You could however, if you wanted, write an InfoPlist.strings file like this:

HPDBookTitle = "SurfWriter Help";

then put the bundle ID in as the HPDBookTitle in the Info.plist.
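To reduce the scope for typos, you could generate the help book's Info.plist rather than writing it by hand; here's a sketch using Python's plistlib (the bundle identifier and version numbers are placeholders):

```python
import plistlib

# Values follow the corrected table above; identifiers are placeholders
info = {
    "CFBundleDevelopmentRegion": "en-us",   # hyphen, not underscore
    "CFBundleIdentifier": "com.example.SurfWriter.help",
    "CFBundleInfoDictionaryVersion": "6.0",
    "CFBundlePackageType": "BNDL",
    "CFBundleShortVersionString": "1.2.3",
    "CFBundleSignature": "hbwr",
    "CFBundleVersion": "108",
    "HPDBookAccessPath": "_access.html",
    "HPDBookIndexPath": "SurfWriter.helpindex",
    "HPDBookTitle": "SurfWriter Help",
    "HPDBookType": "3",
}

with open("Info.plist", "wb") as fp:
    plistlib.dump(info, fp)
```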

Oh, and if you think you’re going to be able to double-click a help book to preview it, think again. That won’t work. Instead, you need either to use it from within your application, or you can put it in ~/Library/Documentation/Help (you might have to make that folder) and double-click it in there. Why? Because help files are indexed and you can only open them if they’re registered in the index.

One other thing that isn’t really documented at all is what exactly the HPDBookRemoteURL will do for you. There’s some handwaving about being able to offer remote content updates, but how the URL is used is skirted over. Well, if you do set HPDBookRemoteURL, Help Viewer will essentially expect it to point at a copy of the Resources folder of your bundle; whatever base URL you give, you’re going to get requests for the corresponding paths beneath it (your access page, your help index, and so on).

January VAT Changes and the VAT Threshold

I’ve just spotted this petition, via a retweet from Dan Counsell, and as a member of HMRC’s Joint SME MOSS Working Group as well as the owner of a microbusiness I thought I’d make a couple of comments.

It isn’t particularly clear from the petition, but the problem being raised is that in order to register for the Mini One Stop Shop in the UK, you currently need to be registered for UK VAT. This is something that we have been talking to HMRC about, and I have the impression that HMRC is amenable, in principle, to allowing non-VAT-registered entities to use the Mini One Stop Shop system, though the details of that have not been worked out.

Note also that your sales here in the UK will continue to be subject to ordinary UK VAT, and will not be reported through MOSS. Even if your UK-only sales are below the UK VAT threshold, it’s likely that you have expenditure in the UK that involves an element of VAT, so you might want to consider a voluntary registration in any event, in order to reclaim your input tax.

(There is a related issue within the Mini One Stop Shop itself, in that there are no thresholds for amounts reported via MOSS. HMRC did try to negotiate a threshold, but other member states didn’t support the idea and it was dropped.)

It is also worth pointing out that the Mini One Stop Shop is optional. You don’t have to use it. The alternatives are:

  • Use a digital “marketplace” (e.g. Apple’s App Store, Google Play, Paddle). Marketplace operators, as of the 1st of January 2015, are required by law to deal with EU VAT for you. You will only need to deal with B2B transactions between you and the store operator.

  • Register for VAT in EU member states into which you are selling. This will mean filing multiple VAT returns and complying fully with (up to) 28 different sets of VAT legislation.

  • Use a distributor in EU member states you wish to sell into. The distributor is a business, so you only need worry about a B2B sale; B2C sales will be made by the distributor within the member state(s) in which it operates.

  • Stop selling to other EU member states.

For a lot of digital micro-businesses, the best approach is likely to be to use a digital marketplace. MOSS gets you a single return and a single payment; unlike using a marketplace or a distributor, it does not free you from the need to comply with up to 28 different sets of VAT rules, though it makes doing so considerably simpler in a number of ways.

As regards determining whether your sale is in the EU or not, with very few exceptions (mostly having to do with e.g. mobile network operators, where there is an obvious way to tell where the customer is) you need to keep two non-contradictory pieces of information that identify your customer’s location. These might include, for instance:

  • Your customer’s billing address
  • The result of IP geolocation
  • Your customer’s telephone number

If those two pieces of information say your customer is outside the EU, then it doesn’t matter (from your perspective) if the customer was really stood in the middle of Brussels at the time; the rules say that you have done what is expected of you.
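As a loose sketch of what that check might look like in an order-processing system (this simplifies the rule to "two pieces that agree", and the evidence_country helper is hypothetical, not anything HMRC specifies):

```python
from collections import Counter

def evidence_country(pieces):
    """pieces: country codes derived from the billing address, IP
    geolocation, phone number prefix, and so on.  Returns a country once
    two pieces agree, else None, meaning: gather more evidence."""
    counts = Counter(p for p in pieces if p)
    if not counts:
        return None
    country, n = counts.most_common(1)[0]
    return country if n >= 2 else None

print(evidence_country(["GB", "GB", "US"]))  # → GB
print(evidence_country(["GB", "US"]))        # → None
```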

The Bash Bug

There are lots of scary headlines on the Internet today about a bug in the GNU Project’s Bourne Again Shell (aka Bash).

Apparently, Bash allows subshells to inherit exported function definitions, which it implements by passing environment variables with those functions’ names through to subshells, with the value of the variable containing the function definition. For instance

outer$ function hello {
> echo "Hello World"
> }
outer$ export -f hello
outer$ PS1="inner$ " /bin/bash
inner$ hello
Hello World
inner$ exit
outer$ export -nf hello

In this case, the outer shell has exported the function hello to the inner shell, by setting an environment variable hello to the string () { echo "Hello World"; }. We can test this:

outer$ export hello='() { echo "Hello World"; }'
outer$ PS1="inner$ " /bin/bash
inner$ hello
Hello World
inner$ exit
outer$ export -n hello

On its own, this feature is only harmful if a user can specify the name and content of an environment variable, and only then if some program is foolishly trying to run commands without specifying their full path. For example:

outer$ ls='() { echo "No way, Jose"; }' PS1="inner$ " /bin/bash
inner$ ls
No way, Jose
inner$ /bin/ls
foo.txt    bar.txt
inner$ exit

However, current versions of Bash contain a bug that causes Bash to execute trailing statements on environment variables of this form, so for example

outer$ naughty='() { :;}; echo "Oh dear, oh dear"' PS1="inner$ " /bin/bash
Oh dear, oh dear
inner$ exit

In the above example, the inner shell runs the echo command. It shouldn’t.

Now, this is potentially a major security hole, but only in certain circumstances, namely:

  1. If a user can set the value of an environment variable, and

  2. Where a program passes control to a Bash shell and passes that value through.

The two most common cases that you might find that allow remote exploitation of this bug are CGI scripts (the old fashioned kind, not FastCGI, and not anything run via Apache’s mod_php, mod_perl or mod_python) and OpenSSH if you were relying on the ForceCommand feature to provide restricted SSH access. sudo, fortunately, already strips out Bash exported functions (and has done since 2004), so is not affected.

Put another way, unless you have very old code running on your web servers, and unless you are doing something like running a public SSH server that allows restricted log-ins (e.g. to run Git or Subversion via SSH, but nothing else), the chances are that you aren’t vulnerable to remote exploits based on this. You should check, but you should not panic.

Twitter Is Not Private Chat

Let me say that again: Twitter is not private chat.

Why do I say this? Well, because it seems there are people out there who confuse Twitter with services like Glassboard, and think that people they don’t know shouldn’t respond to their tweets. Or maybe it’s just people who disagree with them; it’s unclear.

There are a few important facts that such people need to be made aware of:

  1. People who follow them may retweet their tweets. As a result they may very well be seen by people who do not follow them, who they do not know and who might disagree with whatever opinion they’ve expressed.

  2. By default, your tweets are public. That being the case, tweeting is like standing on a soap box at Hyde Park Corner, talking loudly to all who will listen. You don’t get to pick your audience.

  3. If you say something on Twitter (or indeed from a soap box at Hyde Park Corner), and someone who sees your tweet (or is listening to you) finds it interesting or controversial, they have every right to reply. Your “conversation” is not private in any way, shape or form; indeed, it is not actually a conversation.

If you don’t like the above facts, Twitter has a mode for you; set your account to “protected” tweet mode. At that point, you do get to screen your followers, who can’t retweet you.

Yes, there are downsides to protected tweet mode. If you don’t like the way Twitter works, and you don’t want to protect your tweets, post to a blog instead and turn comments off. Or use a private group chat system like Glassboard. Alternatively, you will simply have to live with it.

Finally, if you ask on Twitter why people are replying to you when you don’t want them to, and someone points out all of the above, there is absolutely no excuse for threatening or abusing them.

Code-points Are a Red Herring

Having just read Matt Galloway’s article about Swift from an Objective-C developer’s perspective, I have a few things to say, but the most important of them is really nothing to do with Swift, but rather has to do with a common misunderstanding.

Let me summarise my conclusion first, and then explain why I came to it a long time ago, and why it’s relevant to Swift.

If you are using Unicode strings, they should be (or at least look like they are) encoded in UTF-16.

“But code-points!”, I hear you cry.

Sure. If you use UTF-16, you can’t straightforwardly index into the string on a code-point basis. But why would you want to do that? The only justification I’ve ever heard is based around the notion that code-points somehow correspond to characters in a useful way. Which they don’t.

Now, someone is going to object that UTF-16 means that all their English language strings are twice as large as they need to be. But if you do what Apple did in Core Foundation and allow strings to be represented in ASCII (or more particularly in ISO Latin-1 or any subset thereof), converting to UTF-16 on the fly at the API level is trivial.

What about UTF-8? Why not use that? Well, if you stick to ASCII, UTF-8 is compact. If you include ISO Latin-1, UTF-8 is never larger than UTF-16. The problem comes with code-points that are inside the BMP, but have code-point values of 0x800 and above. Those code-points take three bytes to encode in UTF-8, but only two in UTF-16. For the most part this affects Oriental and Indic languages, though Eastern European languages and Greek are affected to some degree, as is mathematics and various shape and dingbat characters.

So, first off, UTF-8 is not necessarily any smaller than UTF-16.

Second, and this is an important one too, UTF-8 permits a variety of invalid encodings that can create security holes or cause other problems if not dealt with. For instance, you can encode NUL (code-point 0) in any of the following ways:

c0 80
e0 80 80
f0 80 80 80

Some older decoders may also accept

f8 80 80 80 80
fc 80 80 80 80 80

Officially, only the one-byte encoding (a single 00 byte) is valid; you as a developer need to check for and reject the overlong encodings above. Additionally, any encoding of the code-points d800 through dfff is invalid and should be rejected — a lot of software fails to spot these and lets them through.
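Python's UTF-8 codec is an example of a decoder doing the right thing here: it rejects both overlong sequences and encoded surrogates:

```python
def decodes(b):
    """True if b is accepted as valid UTF-8."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(decodes(b"\x00"))          # → True: the one-byte encoding of NUL
print(decodes(b"\xc0\x80"))      # → False: overlong two-byte NUL
print(decodes(b"\xe0\x80\x80"))  # → False: overlong three-byte NUL
print(decodes(b"\xed\xa0\x80"))  # → False: encoded surrogate U+D800
```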

Finally, if you start in the middle of a UTF-8 string, you may need to move a variable number of bytes to find the character you’re in, and you can’t tell in advance how many that will be.

For UTF-16, the story is much simpler. Once you’ve settled on the byte order, you really only need to watch out for broken surrogate pairs (i.e. use of d800 through dfff that doesn’t comply with the rules). Otherwise, you’re in pretty much the same boat as you would be if you’d picked UCS-4, except that in the majority of cases you’re using 2 bytes per code-point, and at most you’re using 4, so you never use more than UCS-4 would to encode the same string.

If you have a pointer into a UTF-16 string, you may at most need to move one code unit back, and that only happens if the code unit you’re looking at is between dc00 and dfff. That’s a much simpler rule than the one for UTF-8.
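In code, the UTF-16 rule really is almost a one-liner; here's a sketch over a list of code units (the helper name is my own):

```python
def utf16_codepoint_start(units, i):
    """Index of the first code unit of the code-point containing units[i].
    At most one step back, and only off a trailing surrogate."""
    if 0xDC00 <= units[i] <= 0xDFFF:   # trailing surrogate
        return i - 1
    return i

# "A" U+0041, then U+1F600 as the surrogate pair D83D DE00, then "B"
units = [0x0041, 0xD83D, 0xDE00, 0x0042]
print(utf16_codepoint_start(units, 2))  # → 1: backs off the trailing surrogate
print(utf16_codepoint_start(units, 3))  # → 3: already at a boundary
```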

I can hear someone at the back still going “but code-points…”. So let’s compare code-points with what the end user thinks of as characters and see how we get on, shall we?

Let’s start with some easy cases:

0 - U+0030
A - U+0041
e - U+0065

OK, they’re straightforward. How about

é - U+00E9

Seems OK, doesn’t it? But it could also be encoded

é - U+0065 U+0301

Someone is now muttering about how “you could deal with that with normalisation”. And they’re right. But you can’t deal with this with normalisation:

ē̦ - U+0065 U+0304 U+0326

because there isn’t a precomposed variant of that character.
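You can check this with any Unicode library; in Python, for instance, no normalisation form collapses the sequence to a single code-point:

```python
import unicodedata

s = "\u0065\u0304\u0326"   # e + combining macron + combining comma below
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # composition may merge the e with one of the marks, but no form
    # yields a single precomposed code-point for the whole cluster
    print(form, len(unicodedata.normalize(form, s)))
```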

“Yeah”, you say, “but nobody would ever need that”. Really? It’s a valid encoding, and someone somewhere probably would like to be able to use it. Nevertheless, to deal with that objection, consider this:

בְּ - U+05D1 U+05B0 U+05BC

That character is in use in Hebrew. And there are other examples, too:

कू - U+0915 U+0942
क्ष - U+0915 U+094D U+0937

The latter case is especially interesting, because whether you see a single glyph or two depends on the font and on the text renderer that your browser is using(!)

The fact is that code-points don’t buy you much. The end user is going to expect all of these examples to count as a single “character” (except, possibly for the last one, depending on how it’s displayed to them on screen). They are not interested in the underlying representation you have to deal with, and they will not accept that you have any right to define the meaning of the word “character” to mean “Unicode code-point”. The latter simply does not mean anything to a normal person.

Now, sadly, the word “character” has been misused so widely that the Unicode consortium came up with a new name for the-thing-that-end-users-might-regard-as-a-unit-of-text. They call these things grapheme clusters, and in general they consist of a sequence of code-points of essentially arbitrary length.

Note that the reason people think using code-points will help them is that they are under the impression that a code-point maps one-to-one with some kind of “character”. It does not. As a result, you already have to deal with the fact that one “character” does not take up one code unit, even if you chose to use the Unicode code-point itself as your code unit. So you might as well use UTF-16: it’s no more complicated for you to implement, and it’s never larger than UCS-4.

It’s worth pointing out at this point that this is the exact choice that the developers of ICU (the Unicode reference implementation) and Java (whose string implementation derives from the same place) made. It’s also the choice that was made in Objective-C and Core Foundation. And it’s the right choice. UTF-8 is more complicated to process and is not, actually, smaller for many languages. If you want compatibility with ASCII, you can always allow some strings to be Latin-1 underneath and expand them to UTF-16 on the fly. UCS-4 is always larger and actually no easier to process because of combining character sequences and other non-spacing code-points.

Why is this relevant to Swift? Because in Matt Galloway’s article, it says:

Another nugget of good news is there is now a builtin way to calculate the true length of a string.

Only what Matt Galloway means by this is that it can calculate the number of code-points, which is a figure that is almost completely useless for any practical purpose I can think of. The only time you might care about that is if you were converting to UCS-4 and wanted to allocate a buffer of the correct size.
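A quick Python comparison makes the point, since Python's len counts code-points; none of these numbers is a "true length" in any sense an end user would recognise (they see two characters):

```python
s = "e\u0304\u0326" + "\U0001F600"       # the accented e above, plus an emoji

print(len(s))                            # → 4 code-points
print(len(s.encode("utf-16-le")) // 2)   # → 5 UTF-16 code units
print(len(s.encode("utf-8")))            # → 9 UTF-8 bytes
```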

Async in Swift

You may have seen this piece I wrote about implementing something like C#’s async/await in Swift. While that code did work, it suffers from a couple of problems relative to what’s available in C#. The first problem is that it only supports a single return type, Int, because of a problem with the current version of the Swift compiler.

The second problem is that you can’t use it from the main thread in a Cocoa or Cocoa Touch program, because await blocks.

As I mentioned previously on Twitter, to make it work really well involves some shenanigans with the stack. Anyway, I’m pleased to announce that I’ve been merrily hacking away and as a result you can download a small framework project that implements async/await from BitBucket.

I’m quite pleased with the syntax I’ve managed to construct for this as well; it looks almost as if it’s a native language feature:

let task = async { () -> () in
  let fetch = async { (t: Task<NSData>) -> NSData in
    let req = NSURLRequest(URL: NSURL.URLWithString(""))
    let queue = NSOperationQueue.mainQueue()
    var data: NSData!
    NSURLConnection.sendAsynchronousRequest(req, queue: queue,
      completionHandler: { (r: NSURLResponse!, d: NSData!, error: NSError!) -> Void in
        data = d
      })
    return data!
  }

  let data = await(fetch)
  let str = NSString(bytes: data.bytes, length: data.length,
                     encoding: NSUTF8StringEncoding)
}


Now, to date I haven’t actually tried it on iOS; I think it should work, but it’s possible that it will crash horribly. It is certainly working on OS X, though.

How does it work? Well, behind the scenes, when you use the async function, a new (very small) stack is created for your code to run in. The C code then uses _setjmp() and _longjmp() to switch between different contexts when necessary. If you want to cringe slightly now, be my guest :–)

Possible improvements when I get the time:

  • Reduce the cost of async invocation by caching async context stacks
  • Once Swift is fixed, remove the T[] hack that we’re using instead of declaring the result type in the Task<T> object as T?. The latter presently doesn’t work because of a compiler limitation.

C#-like Async in Swift

Justin Williams was wishing for C#-like async support in Swift. I think it’s possible to come up with a fairly straightforward implementation in Swift, without any changes to the compiler, and actually without any hacking either. (If it weren’t for compiler bugs, the code below would be more than just a toy implementation too…)

Anyway, here goes:

import Dispatch

var async_q : dispatch_queue_t = dispatch_queue_create("Async queue", DISPATCH_QUEUE_CONCURRENT)

/* If generics worked, we'd use Task<T> here and result would be of type T? */
class Task {
  var result : Int?
  var sem : dispatch_semaphore_t = dispatch_semaphore_create(0)
  func await() -> Int {
    dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER)
    return result!
  }
}

func await(task: Task) -> Int {
  return task.await()
}

func async(b: () -> Int) -> Task {
  var r = Task()
  dispatch_async(async_q, {
    r.result = b()
    dispatch_semaphore_signal(r.sem)
  })
  return r
}

/* Now use it */
func Test2(var a : Int) -> Task { return async {
  return a * 7
}}

func Test(var a : Int) -> Task { return async {
  var t2 = Test2(a)
  var b = await(t2)
  return a + b
}}

var t = Test(100)

println("Waiting for result")

for n in 1..10 {
  println("I can do work here while the function works.")
}

var result = await(t)

println("Result is available")

Now, obviously if Swift supported continuations, this might be done more efficiently (i.e. without any background threads or semaphores), but that’s an implementation detail.

There are also some syntax changes that would make it cleaner, notably if it was permissible to remove the { return and } from the async function declarations. I did briefly try to see whether I was allowed to assign to a function, ala

func Test(var a : Int) -> Task = async { }

but that syntax isn’t allowed (if it was, async would obviously need to return a block).

1st January 2015 VAT Changes

On the 1st of January 2015, some changes to European Union law come into force that significantly affect the way that VAT works for “electronic services” delivered to consumers. The laws in question were actually changed back in 2008, but because of obstruction from some member states that benefit from the status quo, the date at which they came into effect was pushed back by six years.

If you are a software developer selling software in the European Union, these changes matter to you. There has been very little publicity thus far about these changes (that will change as we get closer to the end of the year), but given that you may need to make changes to your website, it seems like a good idea to tell you about them now.

So, what’s changing? Currently, if you are established in the European Union and you sell downloadable software to a customer who is also in the European Union, you always charge VAT in your country, following the rules in your country, and you pay it to the tax authority in your country. This is simple, because there is only one set of rules to follow, and it’s the one for your country.

As of the 1st of January, the VAT will instead be due in the customer’s country. If there were no other changes to the rules, you would therefore be obliged to register for VAT in other member states, according to their rules, and submit multiple returns every quarter (or at whatever period they specify). That means you might have to register with up to 28 member states, apply 28 different rates, 28 different sets of rules, make 28 times as many VAT returns and 28 separate payments in different currencies (with currency conversions and rounding following different rules in different jurisdictions). For a small software company or an independent developer, this is clearly not going to work.

There are two other changes that are also coming in at the same time that mitigate this problem. The first is that app stores will be responsible for charging and remitting consumer VAT. Apple already does this, but some other app stores may not. Under the new rules, they will have to, so you will only have to deal with VAT as it applies to transactions between you and the app store provider.

If you sell direct to consumers, that doesn’t really help, though. What will help is that EU member states are going to operate a system known as the Mini One Stop Shop (or MOSS for short). This is similar to the scheme that has been operating for businesses outside of the EU selling to EU customers, whereby you can register with a single tax authority, submit a single return to that tax authority, and pay all of the tax due to that one place. You are still required to charge VAT at the rate applicable in the customer’s country, and in various respects the rules in that country will still apply — with some simplifications. Registration for this new scheme starts in October, and, unless you plan on only selling via an app store, you will probably want to register for it.

The other slight complication is that after 1st of January, you will need to keep two non-conflicting pieces of evidence to identify the location of your customer. HMRC has indicated, at least in the case of the U.K., that they will be fairly relaxed about this evidence — so, for instance, they realise that IP geolocation may not be 100% accurate, and that some customers may lie and give you false details. It also does not matter if you have more data that conflicts with your two non-conflicting pieces of evidence; all you need is those two. However, this affects all of your sales, not just those to customers in the EU, since it applies equally to your decision not to charge VAT to customers because they are not in any EU member state.

Why am I telling you about this? Because I’m a member of H.M. Revenue and Customs’ MOSS Joint SME Business/HMRC Working Group. Those of you who are in the UK, if you have queries about the scheme, or issues you would like to raise with HMRC, please do get in touch and I’ll try to help out. (If you are a member of TIGA, they have a couple of representatives on the working group also, so you can talk to them too.)

Finally, I will add that the law changes are already made — back in 2008 — so the scope for changing the rules at this stage is very limited. What we can influence to some extent is how they’re enforced and whether HMRC is aware of problems the new rules may cause us.

I’ll be posting some more on this topic over the coming weeks and months.

Dmgbuild - Build ‘.dmg’ Files From the Command Line

I’ve just released a new command line tool, dmgbuild, that automates the creation of (nice looking) disk images from the command line. There are no GUI tools necessary; there is no AppleScript, and it doesn’t rely on Finder, or on any deprecated APIs.

Why use this approach? Well, because everything about your disk image is defined in a plain text file, you’ll get the same results every time; not only that, but the resulting image will be the same no matter what version of Mac OS X you build it on.

If you’re interested, the Python package is up on PyPI, so you can just do

pip install dmgbuild

to get the program (if you don’t have pip, do easy_install pip first; or download it from PyPI, extract it, then run python setup.py install). You can also read the documentation, or see the code.

It’s really easy to use; all you need do is make a settings file (see the documentation for an example) then from the command line enter something like

dmgbuild -s settings.py "My Disk Image" output.dmg
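A minimal settings file might look something like the following sketch (the keys come from the dmgbuild documentation; the paths and positions are placeholders):

```python
# settings.py -- a minimal dmgbuild settings file (paths are placeholders)
format = "UDBZ"                                # bzip2-compressed, read-only
files = ["MyApp.app"]                          # contents of the image
symlinks = {"Applications": "/Applications"}   # the usual drag-install target
icon_locations = {
    "MyApp.app": (140, 120),
    "Applications": (500, 120),
}
```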

The code for editing .DS_Store files and for generating Mac aliases has been split out into two other modules, ds_store and mac_alias, for those who are interested in such things. The ds_store module should be fully portable to other platforms; the mac_alias module relies on some OS X specific functions to fill out a proper alias record, and on other systems those would need to be replaced somehow. The dmgbuild tool itself relies on hdiutil and SetFile, so will only work on Mac OS X.

Bit-rot and RAID

There’s an interesting article on Ars Technica about next-generation filesystems, which mentions something it calls “bit rot” — allegedly the “silent corruption of data on disk or tape”.

Is this a thing? Really? Well, no, not really.

Very early on, disks and tapes were relatively unreliable and so there have basically always been checksums of some description to let you know if data you read is corrupted. Historically, we’re talking about some kind of per-block cyclic redundancy check, which is why one of the error codes you can receive at a disk hardware interface is “CRC error”.
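The principle is easy to demonstrate with a plain CRC, here CRC-32 from Python's zlib (real drives use stronger codes, as described below):

```python
import zlib

sector = bytearray(b"the data stored in one disk sector")
stored_crc = zlib.crc32(sector)     # written alongside the data

sector[7] ^= 0x01                   # a single bit flips "at rest"

# On read-back the checksum no longer matches, so the corruption
# cannot silently masquerade as good data:
print(zlib.crc32(sector) != stored_crc)  # → True
```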

Modern disks actually use error correcting codes such as Reed-Solomon Encoding or Low-Density Parity Check codes. A single random bit error under such schemes can be corrected, end of story. They may be able to correct multiple bit errors too, and these codes can detect more errors than they are able to correct.

The upshot is that a single bit flip on a disk surface won’t cause a read error; in fact, the software in your computer won’t even notice it because the hard disk will correct it and rewrite the data on its own.

It takes multiple flipped bits to cause a problem, and in most cases this will result in the drive reporting a failure to the operating system when trying to read the block in question. The probability of a multi-bit failure that can get past Reed-Solomon or LDPC codes is tiny.

The author then goes on to make a ludicrous claim that RAID won’t be able to deal with this kind of event, and “demonstrates” by flipping “a single bit” on one of his disks to make his point. Unfortunately, this is a completely bogus test. He has, in fact, flipped many more bits than just the one, and he’s done so by writing to the disk, which will encode his data using its error correcting code, resulting in a block that reads correctly because he’s actually stored the wrong data there deliberately.

The fact is that, in practice, when an unrecoverable data corruption occurs on a disk surface, the disk returns an error when something tries to read that block. If a RAID controller gets such an error, it will attempt to rebuild the data using parity (or whatever other redundancy mechanism it’s using).
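The rebuild step itself is simple enough to sketch; with single-parity RAID, the parity block is just the XOR of the data blocks, so any one missing block can be recomputed from the survivors (a toy version, ignoring striping and the controller's bookkeeping):

```python
def xor_blocks(blocks):
    """XOR a list of equal-sized blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"hello world!", b"twelve bytes", b"more data..."
parity = xor_blocks([d0, d1, d2])       # written to the parity disk

# The disk holding d1 reports a read error; rebuild d1 from the rest:
rebuilt = xor_blocks([d0, d2, parity])
print(rebuilt == d1)  # → True
```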

So RAID really does protect you from changes that occur on the disk itself.

Where RAID does not protect you is on the computer side of the equation. It doesn’t prevent random bit flips in RAM, or in the logic inside your machine. Some components in some computers have their own built-in protection against these events — for instance, ECC memory uses error correcting codes to prevent random bit errors from corrupting data, while some data busses themselves use error correction. If you are seeing random bit flips in files that otherwise read OK, it’s much more likely they were introduced in the electronics or even via software bugs and written in their corrupted form to your storage device.

An aside: programmers generally use the term “bit rot” to refer to the fact that unmaintained code will often at some point stop working because of apparently unrelated changes in other parts of a large program. Such modules are said to be suffering from “bit rot”. I’ve never heard it used in the context of data storage before.