I/O Errors and Bad Blocks - Alastair’s Place

It was quite interesting reading Jonathan Rentzsch’s post about I/O errors and the impact that they might have on ordinary users.

As my company makes disk utility software, we see quite a lot of these kinds of errors. Users often report them to us as “bugs”, telling us that they’ve checked the disk with DiskWarrior and it’s fine. Their mistake is that DiskWarrior doesn’t actually check for bad blocks, so we end up having to explain that our product stopped because it found something it couldn’t read, and that this was caused by a problem with their hard disk.

Unfortunately for us, though perhaps fortunately for most users, the actual frequency of bad blocks is somewhat hidden. Modern disks “fix” a bad block if you overwrite the affected area, and disks hold very large amounts of data that is infrequently accessed. As a result, these errors tend only to show up if you write applications like backup programs or defragmentation tools, both of which need to read a large proportion of the files on the disk.
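To make that concrete, here is a minimal Python sketch of the kind of full read pass that a backup tool performs as a side effect. The function name `find_unreadable_files` and its parameters are made up for illustration, and it makes no attempt to bypass the operating system's cache, so a file that was read recently may be served from memory without touching the disk at all.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20):
    """Read every regular file under root and collect those that fail.

    Returns a list of (path, exception) pairs. The name and parameters
    are hypothetical, chosen only for this illustration.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, devices and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    # Read and discard the data; we only care whether
                    # every read succeeds.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                # A bad block typically surfaces here as an I/O error,
                # but permission problems end up here as well.
                failures.append((path, exc))
    return failures
```

Calling `find_unreadable_files("/some/folder")` returns an empty list when everything could be read. A non-empty result only says that something went wrong with those files; you still have to look at each exception to tell a failing disk from a permissions problem.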

Anyway, Jonathan, in his post, takes the hacker’s approach of trying to copy as much of the file as possible and, for the bits that can’t be copied, ignoring the errors and pretending the data was zero. This is fine for people like Jonathan, or like me, who understand about disks and computer programs and know the kinds of problems this might create. However, very few people in the general user population should really be trying to use tricks like overwriting blocks with zeroes or copying the file with options set to ignore the I/O error, since the consequences of those actions are not always wholly predictable.
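For the curious, the “copy what you can and zero the rest” idea can be sketched in a few lines of Python. This is not Jonathan’s code, just an illustration of the general technique; the name `copy_ignoring_errors`, the 4096-byte block size and the `read_block` hook are all assumptions of mine, and it relies on `os.pread`, which is only available on Unix-like systems. All the caveats in this post still apply to whatever comes out the other end.

```python
import os


def copy_ignoring_errors(src, dst, block_size=4096, read_block=os.pread):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    read_block is a hypothetical hook (default: os.pread) that makes it
    possible to simulate a failing disk when trying the function out.
    """
    bad_offsets = []
    fd = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst, "wb") as out:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Pretend the unreadable block was all zeros, so that
                    # the data after it stays at the same offset.
                    data = b"\0" * want
                    bad_offsets.append(offset)
                if not data:
                    break  # the file shrank while we were copying it
                out.write(data)
                offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

The returned list matters more than the copy itself: it tells you which parts of the new file are invented zeros rather than your data. A smaller block size loses less data around each bad spot, at the cost of more read attempts on a disk that is already struggling.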

To give an example, imagine that the document that has developed a bad block contains important financial information, and let’s also suppose that replacing the bad region with zeroes “fixes” it to the extent that the current version of whatever program wrote it is able to read it. Unfortunately, however, that does not mean that future versions will be able to read it, even if you use the “Save As…” option to save an entirely new copy of the data. So in five years’ time, when the IRS or (if, like me, you’re in the U.K.) HMRC inquire about the information contained therein, you might not be able to read it!

And in case you don’t think that problem is real, I’ve seen similar problems in the past, caused by bugs in earlier versions of a program that sometimes produced subtly corrupted documents, which then crashed later versions of the same program when you tried to load them.

What should you do?

Keep adequate backups.

That way, when this happens (and it is, I am afraid, a question of when rather than if), you can simply restore the affected file from a recent backup.

If you don’t have a backup of the file, leave this kind of futzing with files to expert users. If you don’t know any expert users, then either:

Get the file to the point where it opens, and export it in plain text form (e.g. CSV, or a simple text file). Keep the plain text copy, just in case.
Contact a local computer club, or (if you really must) a data recovery firm.

I've found this great program that can recover data from bad blocks…

Such programs are at best marginally useful, and at worst are just snake-oil, particularly if they claim to be able to use statistics about individual bits to recover their original values. That just won’t work with modern disks, because they use PRML and a Viterbi decoder. It might have worked, once upon a time, with MFM or GCR disks, and maybe even to an extent with non-PRML RLL disks, but the theory doesn’t hold for disks using PRML.

Backups are a much better way to keep your data safe.