String parsing and strtol()


I have recently started using ii as my IRC client, having previously used this fork of sirc. I found ii lacking in some ways, and given that it is relatively straightforward C source code, I decided to patch it to add some useful features and fix some bugs (the full commit log is here).

One particular feature I added was user mode tracking, so that if a user has +v or higher then they have the appropriate prefix character prepended to their name, e.g.:

<foo> regular user
<+bar> user with voice
<%baz> user with halfops
<@quux> user with ops

This requires parsing a number of things. Firstly, one must parse 005 lines from the server at the beginning of the connection, where the server states its capabilities, which include supported channel modes, modes that users can have within channels and the prefix characters they correspond to, as different servers support different modes. (Hint: this document is really useful.) One must also parse user lists the server sends when one enters a channel, where users who have modes in the channel have the appropriate prefix characters prepended to their nicks. Finally, one must parse MODE lines whenever a channel mode changes, so that if someone's prefix changes then we can update our bookkeeping state. This last task turned out trickier than I expected.

The mode parsing code appeared to work fine... until I tried using ii on a different system. The perplexing thing was that the systems were both running the same operating system (Debian 9 in this case), yet when I had both clients connected to the same channel, the one on the new box would not process mode changes of users properly. This usually manifested as one of the operators on the channel pinging out, rejoining and having ChanServ give them ops again, after which their prefix character was not printed in the second client. I tweaked and recompiled ii and ran it under a very nice gdb script which prints a complete call graph in real time (found here) several times, and eventually the evidence pointed towards the problem being dependent on the C library I was compiling against. The first instance of the client (the one that functioned correctly) was compiled against glibc, and the second was compiled against musl libc.

At this point I realised that this was probably going to take a while and involve some amount of pain.

So, I compiled ii against both glibc and musl, and I ran each of them under gdb, set a breakpoint on the function which parses MODE lines and then stepped through it line by line. I found that the ii compiled against musl would return early after a strtok() invocation that returned NULL, while the glibc-compiled ii continued executing the function. This was strange to say the least, and I went off on a wild goose chase downloading the source code for musl and glibc's strtok functions and compiling each against the other library, but the fact remained that it was the C library that I was using that was the issue.

I was stumped by this. However, when running the gdb call graph script I mentioned above on ii while being stumped about this, I noticed that the arguments to the mode parsing function weren't what I expected. I rigged up ii to print more debugging information, and sure enough, the MODE lines from the server were being split differently by glibc and by musl. This involves ii's tokenize() function, which takes a line from the server which has been preprocessed slightly, and inserts NULL characters between tokens in the line, i.e. the command, the channel name, the arguments passed with the command, and any trailing text. I checked before and after this function was called, and it was definitely processing its arguments differently depending on the libc in use. The catch is that within that function there is only one function in libc that it calls: strtol().

tokenize() makes use of a function called isnumeric(), which is short enough that I'll reproduce it here:

static int
isnumeric(const char *s)
{
    errno = 0;
    strtol(s, NULL, 10);
    return errno == 0;
}

It attempts to convert its argument to a long, and returns success if errno is not set. Now go and read what POSIX has to say about strtol(). Note that the standard says that "These functions [strtol() and friends] may fail if: EINVAL - No conversion could be performed". Setting errno when the argument does not contain a numeric is an implementation-level decision - glibc and OpenBSD's libc don't do it; musl does. Knowing suckless, I suspect that ii was originally written with musl libc in mind, and thus ii relies on musl's behaviour. As Debian is my main workstation platform, I've only really tested my patches against glibc; the MODE parser was actually (and unknowingly) working around this bug.

I looked back through the git history of ii, and it used to call strtol() directly inside tokenize() and check its return value. This was changed to check whether the pointer passed to strtol() would cause a null pointer dereference, and the strtol() was moved into the wrapper function shown above, which checked the value of errno instead. My fix was to remove the wrapper function, and check strtol()'s success from the return value, as it had done previously (while maintaining the null pointer check). (And if strtol() did ever return 0 on a valid conversion, the line would be handled appropriately by the rest of the parsing machinery.) I had to adjust the mode parsing function too, and I was able to remove some heap allocations that the previous code had required. Bug fixed.

The morale of this story? Standards aren't.



home