
August 30, 2006

Make the Internet an independent state

Apparently, we're going to have legislation to ban violent pornography. I think that's probably a good thing, but I have my reservations about enacting laws that, according to the BBC News article “would apply to websites wherever they were based in the world”, because I happen to think that an unenforceable law is worse than no law at all. (That is, it matters little to some of the people operating illegal or questionable websites whether there is or isn't a law prohibiting it in the United Kingdom, and having such a law just provides them with a kind of perverse amusement in demonstrating how easy it is for them to break it without consequence.)

I think what we really need is to declare cyberspace independent territory and give it its own legal code. Then we can have an Internet Police Force that will actually be able to enforce the law. At present there is little or no effective law enforcement, because the Internet spans international boundaries and is covered (or sometimes, not covered) by a variety of conflicting national legislation. Of course, that would require the existing nation states to cede some of their current authority to the Internet's new government, including things like giving IPF officers the power of arrest anywhere on the globe. Still, I think it's the best way forward.

August 29, 2006

N.Y. Times irresponsible? Who'd have thought?

I'm upset with the New York Times. Why? Because the information they published today might prejudice the trials of the terror suspects involved with the recent plot to blow up airliners heading to the United States.

I couldn't care less about the fact that they've used a pseudo-geographic test to stop people in the U.K. from straightforwardly accessing their article. The fact is that such tests are not 100% reliable; for instance, go into any Starbucks coffee shop and you can get connected via T-Mobile, via Germany. Thus anyone in the U.K. who visits Starbucks with a WiFi-equipped device cannot now be a juror in any of the trials; or worse, if they are picked and don't rule themselves out, some of the suspects could walk free.

Nor is that the only way to end up using an IP address allocated to a location outside of the U.K. There are plenty of others: for instance, if you surf the 'Net using an anonymous browsing service, or if the database that the New York Times are using is out of date or incorrect. There are also a few idiots who have posted copies of it in places that aren't geographically censored.

So congratulations, New York Times.

August 25, 2006

Process Priorities

A pet hate of mine is end users (journalists especially) or—even worse—developers who don't understand how process priorities are supposed to work. Often, both in the press and elsewhere, people write that increasing the process priority makes the program in question run faster. This is a fallacy.

What process priority controls is which task gets the processor when more than one wants it: a higher priority task can interrupt a lower priority one. If you give a process a low priority, and then do nothing else with the machine, it will still take the same amount of time to complete. Process priority only has an effect when you are doing something else with the machine at the same time.

The right way to use process priorities is shown in the following table:

Type of task               Example                                 Appropriate Priority
Safety-critical real-time  Controlling chemical plant              Highest
                           Fly-by-wire/drive-by-wire control
                           Monitoring Nuclear reactor temperature
Real-time                  Playing audio or video files            High
                           User interaction
                           Video game rendering
Normal                     Text editing                            Normal
                           Web browser page rendering
                           Checking for email
CPU intensive              Raytracing/rendering                    Low
                           Scientific simulation
                           Manipulating large data sets

Whilst the table above is very general, the point is that it makes it clear that user interaction (for instance) should take precedence over processing. If I, as an end-user, wish to wordprocess on my machine whilst I wait for a 3D rendering to complete, I should be able to do so. If, on the other hand, I want the rendering to finish as quickly as possible, I can leave my machine alone. Inappropriate priority settings (such as 3D rendering at high priority) make the machine unusable because the interactive response time becomes unacceptable. They don't make the rendering run any faster!
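
As a rough sketch of the bottom rows of that table in code (Python on a Unix-like system; the category names and the niceness values are illustrative assumptions of mine, not standard values), a CPU-intensive job can lower its own priority before it starts work:

```python
import os

# Hypothetical mapping from task type to a Unix "nice" increment.
# A larger niceness means a LOWER priority. The numbers are illustrative only.
NICE_INCREMENT = {
    "normal": 0,
    "cpu-intensive": 10,
}


def lower_priority_for(task_type):
    """Lower this process's priority to suit task_type; return the increment."""
    increment = NICE_INCREMENT[task_type]
    if increment > 0 and hasattr(os, "nice"):
        # An unprivileged process may raise its niceness but not reduce it.
        os.nice(increment)
    return increment


def render_scene(n):
    """Stand-in for a long CPU-bound job (here, just a sum of squares)."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    lower_priority_for("cpu-intensive")
    render_scene(1_000_000)
```

On an otherwise idle machine the job takes the same time with or without the call; the lowered priority only matters once something else (the word processor, say) wants the processor.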

August 24, 2006

Dirk returns

Did you know that Google is related to evil (admittedly via an anagram of Elvis)? Dirk does.

August 21, 2006

“Security Researchers”

John Gruber's latest blog entry made me laugh. In the midst of it, he writes: “enormously irresponsible for ostensibly professional security researchers”.

It seems to me that many “security researchers” are neither (a) professional nor (b) responsible. Last time I looked they were much more interested in publicity than in protecting the computer-using public… there are even recent stories in the media about them making thinly veiled threats to software vendors unless they are involved in the bug fixing process.

And precisely what is their business model anyway? I mean, who exactly pays them to search for flaws? I'll tell you who isn't paying them… Apple. They have plenty of very clever engineers who quite clearly already understand how these things work. So why would they want to pay these people, especially as they seem more concerned with publicising the flaws they find than they are with the security of the user base? Let's not forget that many security vulnerabilities only become vulnerabilities once published. Quite a few of the recent PC viruses only appeared after the flaws they exploit were published on CERT or Bugtraq. It isn't exactly rocket science: if you don't know how to break into a system, you have to find out first, which takes time and usually requires access to a similar system (since repeated crashes of the target system, which commonly happen during the “research” stage of a hack, are a bit obvious to most people). If, on the other hand, someone publishes all of the details, some section of the user base won't be up-to-date (even if the vendor has been given time to release a fix) and it'll be much easier to break in.

As far as the whole Mac wireless vulnerability thing goes, there certainly have been vulnerabilities there in the past (e.g. via LDAP, which AFAIK used to be enabled by default), but only time will tell whether this one is real or not. I don't think that Brian Krebs and George Ou really helped matters, what with Krebs' original post on the subject, the title of which was ill-chosen, and Ou's remarks about a so-called “vicious orchestrated assault” on the originators of the claims.

The fact is that the researchers clearly intended to upset the Mac-using community; remarks such as Maynor's “it eventually makes you want to stab one of those users in the eye with a lit cigarette or something” are pretty convincing evidence of that. If I were Krebs, I wouldn't have published that statement, and as far as George Ou's article goes, I have to say that someone who says things like Maynor did should expect a certain amount of flak as a result. Not that I'm condoning any of the things that apparently have been said to him… but let's get this into perspective: Maynor clearly intended to upset some elements of the Mac community. He's done so. If he's surprised how upset they got, perhaps he (and Ou) should reflect on the fact that journalists and security researchers have been telling Mac users for some time now how naïve and vulnerable they are, usually with undertones of “You're too smug. Just you wait, we'll prove you all wrong”. Is it really a surprise that people who openly espouse or support this agenda get jumped on?

Who is Alastair?

Well indeed. Who is Alastair?

I must say that I'm quite pleased to be sharing a name with some of these people, particularly Alastair Reynolds (I'm a big fan of his work; hard sci-fi is very much my kind of thing :-)).

August 13, 2006

Airport Security

Am I the only one who thinks that the current airport security measures here in the U.K. are just plain stupid?

Sure, terrorists won't be able to get on aeroplanes to detonate their bombs, but by clogging everything up, we've created huge crowds of people in the airport buildings themselves, outside the security checks. I can't think of a better target for a terrorist; they'll do many times more damage if they detonate a bomb in those crowds than they would be able to do if they blew up a plane. Well, with a big enough bomb, anyway.

I'm not saying that we shouldn't be careful, but I think we should be wary of doing things that simply move the risk from the air to the ground, which is—I think—all we've actually achieved here.

August 12, 2006

Character encoding and MovableType

I'm beginning to suspect that MovableType has a bug in it relating to character encoding, though the “smart” behaviour of some web browsers makes it very difficult to tell where the problem actually lies.

What I'm seeing is that the characters are encoded incorrectly on the index pages, but not on the articles' individual pages. At least, I think that's what I'm seeing.

August 11, 2006

Permalinks with dashes

Hopefully, permalinks should now use dashes rather than underscores too.

New look (take 2)

Once I'd worked out that a backup script had managed to overwrite some of the new MovableType files with copies of the previous version, things seemed to sort themselves out :-)

The new site design has some interesting features, in particular:

  • Live comment preview.
  • Live search.

Both of these are implemented using Bob Ippolito’s MochiKit, which I'm now a big fan of; it makes writing JavaScript a lot less about tediously hitting the differences between the various implementations and a lot more about getting on with what it was you wanted to do.

New look!

I've finally updated the site templates for my blog, and also updated to the latest MovableType at the same time.

There seems to be a character encoding issue somewhere or other, and I'm having trouble with comments too :-(

August 10, 2006

Named groups and conditionals in ICU regexps

Since my original post on this topic, I've done a bit more work on the ICU regexp engine. It now supports named groups, and conditional expressions, including support for group numbers and names as well as lookahead and lookbehind expressions as the conditions on which to branch.

The syntax supported by this patch now includes:

(?P<name>)
    Named capture group. Can be accessed via overloaded group(), start() and end() methods from C++, or using the uregex_groupIndexFromName() function from C.

(?P=name)
    Named backreference.

(?(name-or-id)then-part|else-part)
    Conditional expression; if the group identified by the numeric ID or name has been matched, then attempt a match against then-part, otherwise against else-part. The else-part is optional, in which case the '|' should also be omitted.

(?(?expr)then-part|else-part)
    Conditional expression using a lookahead or lookbehind expression in place of ?expr. In this case, a match is attempted with the lookahead or lookbehind expression, and the result (true or false) used to choose whether to execute the then-part or the else-part as appropriate.
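
Python's own re module happens to accept the same group-based conditional syntax, so the behaviour can be sketched there. This uses Python's engine rather than the patched ICU one, the pattern and the function name are just illustrative choices of mine, and as far as I know Python's re has no equivalent of the lookahead/lookbehind form of the condition:

```python
import re

# A word optionally wrapped in angle brackets. If the named group "open"
# matched a "<", the conditional demands a closing ">" (the then-part);
# otherwise it demands end of string (the else-part).
BRACKETED = re.compile(r"(?P<open><)?(?P<word>\w+)(?(open)>|$)")


def bracket_word(text):
    """Return the word if text is exactly 'word' or '<word>', else None."""
    m = BRACKETED.fullmatch(text)
    return m.group("word") if m else None


if __name__ == "__main__":
    for sample in ("<abc>", "abc", "<abc", "abc>"):
        print(sample, "->", bracket_word(sample))
```

The unbalanced inputs are rejected because the condition is decided by whether the "open" group took part in the match, not by what the text happens to look like further along.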

This version of the patch also fixes a couple of bugs in the previous version, simplifies the implementation of named capture groups (and at the same time disallows multiple capture groups with the same name), and adds quite a few new tests to the intltest and cintltest test suites to verify the operation of the new code.

It also adds the C functions uregex_namedGroupName() and uregex_namedGroupCount(), and the C++ method getGroupNames(), which provide a means for C and C++ code to obtain the names of the named capture groups for a particular RegexPattern object.

August 7, 2006

Mac Pro

Mmmmm… dual-core 3GHz Xeons, 1.6-2.1 times as fast as a PowerMac G5 Quad.

(MacNN are running a live feed from WWDC.)

August 6, 2006

Named groups in ICU regular expressions

Python is a pretty neat language, and one of the best things about it is its runtime library. The only real downside is that Python itself isn't really very fast, but of course you can work around this by making good use of the runtime library (much of which is implemented in C), or, if all else fails, by writing your own extension module. (This is hardly a new situation; in the 1980s many of us used to write high-performance code in assembly language so we could use it from our BASIC or even C programs; some versions of BASIC even had integrated support for assembly language… this was true, for instance, of BBC BASIC on the Acorn machines, and GFA BASIC on the Atari platform.)

Anyway, one of the ways that Python programmers, like Perl and Ruby programmers, optimise their code is to make heavy use of regular expressions. The regular expression engines in these languages are fast, and it's often much quicker to match a single pre-compiled regexp than it is to write Python code to scan a string. That in mind, Python has a feature called “named capture groups”; the Python library documentation says:

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named 'id' in the example above can also be referenced as the numbered group 1.

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in pattern text (for example, (?P=id)) and replacement text (such as \g<id>).

(?P=name)

Matches whatever text was matched by the earlier group named name.

This is a great feature, and makes it much easier to write complicated regular expressions, since you can explicitly name the groups you want to extract and then you don't have to worry about whether or not the indices are all going to change when you add that one extra capture group that you need for whatever-it-is that you're doing.
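
A small sketch in Python of why the names help (the patterns, the "key=value" input format and the function names are made up for illustration): adding a capture group at the front changes what group 1 refers to, but code that looks groups up by name carries on working.

```python
import re

# Version 1 of a hypothetical "key=value" pattern.
V1 = re.compile(r"(?P<key>[a-zA-Z_]\w*)=(?P<value>\d+)")
# Version 2 adds an extra capture group in front, shifting every index.
V2 = re.compile(r"(?P<indent>\s*)(?P<key>[a-zA-Z_]\w*)=(?P<value>\d+)")


def parse(pattern, line):
    """Return (key, value) using group names, or None if there is no match."""
    m = pattern.match(line)
    return (m.group("key"), m.group("value")) if m else None


def collapse_doubled(text):
    """Replace an accidentally doubled word ('the the') with a single copy."""
    # (?P=word) is the named backreference; \g<word> is the named replacement.
    return re.sub(r"\b(?P<word>\w+) (?P=word)\b", r"\g<word>", text)


if __name__ == "__main__":
    print(parse(V1, "x=1"), parse(V2, "  x=1"))
    print(V1.match("x=1").group(1), repr(V2.match("x=1").group(1)))
    print(collapse_doubled("the the cat"))
```

Code that asked for group 1 gets the key from V1 but the (possibly empty) indent from V2, which is exactly the kind of breakage the names avoid.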

However, and here's the problem, it's non-standard. Python supports it, but Ruby and Perl don't, and neither does the ICU library, whose regular expression syntax was derived from Perl's.

Anyway, to cut a long story short, I want to use ICU with Python, and part of that means that I'd like regular expressions to work consistently (what I don't want is to find that my regexp matches using one or other API, but then because e.g. the Unicode character databases differ, things then break). So I've written a patch for ICU 3.4 that adds support for Python-syntax named capture groups.

Interestingly, the regexp library used by Ruby, Oniguruma, does support named groups, though with a different syntax to Python. I could have implemented that also (indeed, I think, with my patch, it can be done by just changing regexcst.txt and regenerating the associated header file), however the Oniguruma syntax looks like the kind of thing that might clash with features in future versions of Perl (it uses (?<name>...) and \k<name>, whereas Python put all of its additions inside (?P...) to avoid clashes).

Update: a new version of this patch is available.