Bloody stupid things to do when writing C libraries, #78

…taking callback function arguments without a corresponding invocation-specific data pointer.

You want to have a function that takes a function pointer, and have your library call that function at some point in the future if some event happens? Cool! Works for me. I like those. (Well, sorta. Event/callback/async programming is a pain.) However, the signature should never be:

int register_callback(func_pointer_t callback);

Bad! Bad programmer! No cookie! That signature should be:

int register_callback(func_pointer_t callback, void *extra_data);

Or, if you’d rather not manage the two pointers separately in your library, take a struct that holds both the function pointer and the callback data. The signature for the callback function itself should be:

int callback_func(struct instance_data *lib_data, void *extra_data);

though I’m less adamant about that. Very, very simple parameter lists are best.
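To make that concrete, here is a minimal sketch of both sides of such an API. Only `register_callback`, `func_pointer_t`, `struct instance_data` and the `extra_data` parameter come from the signatures above; the registry, `fire_event`, `struct counter` and `count_events` are names I made up for illustration, and a real library would probably look rather different inside.

```c
#include <assert.h>
#include <stddef.h>

/* What the library passes to every callback (made-up layout). */
struct instance_data {
    int event_code;
};

/* Callback type: library data first, the caller's own pointer second. */
typedef int (*func_pointer_t)(struct instance_data *lib_data, void *extra_data);

/* Hypothetical fixed-size registry; a real library would likely grow a list. */
#define MAX_CALLBACKS 8

static struct {
    func_pointer_t callback;
    void *extra_data;
} registry[MAX_CALLBACKS];
static size_t registry_count;

/* Returns 0 on success, -1 if the registry is full. */
int register_callback(func_pointer_t callback, void *extra_data)
{
    assert(callback != NULL);
    if (registry_count == MAX_CALLBACKS)
        return -1;
    registry[registry_count].callback = callback;
    registry[registry_count].extra_data = extra_data; /* stored, never dereferenced */
    registry_count++;
    return 0;
}

/* Hypothetical: the library calls this when the event happens.
   Returns the number of callbacks invoked. */
int fire_event(int event_code)
{
    struct instance_data lib_data = { event_code };
    size_t i;
    for (i = 0; i < registry_count; i++)
        registry[i].callback(&lib_data, registry[i].extra_data);
    return (int)registry_count;
}

/* Caller side: per-registration state, no globals needed. */
struct counter {
    int hits;
    int last_event;
};

int count_events(struct instance_data *lib_data, void *extra_data)
{
    struct counter *c = extra_data;
    c->hits++;
    c->last_event = lib_data->event_code;
    return 0;
}
```

Registering `count_events` twice, each time with a different `struct counter`, gives two independent counters driven by one function. The library never looks inside `extra_data`; it just hands it back.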

Why? Simple. While you may think that I’m going to have a custom callback routine with private embedded data in it all primed and ready for your particular call, you’re wrong. This is C we’re talking about here. It’s not like we have closures, so there’s no way to bind data to a function at runtime. If I want to bind any data to the call, it means I need to stick it in a global somewhere. Blech. Very, very ungood.

It gets even worse when dealing with any sort of indirect access to the library, like, say, if you’re trying to do this from an interpreter. Making that work without any sort of data pointer requires creating a custom C function, either at compile time for the module (which requires having a C compiler of some sort handy) or at runtime (which requires the capability of creating new functions on the fly). Neither is particularly desirable. (Parrot, for example, doesn’t need a C compiler handy to interface to most C libraries.)

Postgres, pleasantly, doesn’t make this mistake. You should endeavor not to make it either.

If you want a really good reason, consider the following. Someone is writing an interface to your library for an interpreted language. Perl, Python, Ruby, Java, something on .NET–doesn’t matter. The program runs, conceptually at least, on the interpreter. The interface writer wants to be able to write those callback functions in the interpreted language.

With a separate data parameter, it’s easy. The interpreter builds some sort of closure structure, sets the callback function to be an entry point to the interpreter, and the callback data to be that closure structure. When the callback’s made, the entry point function yanks all the info it needs out of the data parameter, sets up the world, and calls into the properly set up interpreter. While there may be a lot of really nasty funkiness going on in there, it’s at least doable.
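A rough sketch of that arrangement, with heavy caveats: `struct interp` and `struct closure` below are toy stand-ins I invented (a real interpreter’s closure is far hairier), `interp_entry_point` and `fire_event` are hypothetical names, and the library side is just an assumed shape that matches the signatures earlier in this post.

```c
#include <assert.h>
#include <stddef.h>

/* --- Library side (assumed shape) --- */
struct instance_data {
    int event_code;
};

typedef int (*func_pointer_t)(struct instance_data *lib_data, void *extra_data);

#define MAX_CALLBACKS 8
static func_pointer_t callbacks[MAX_CALLBACKS];
static void *callback_data[MAX_CALLBACKS];
static size_t callback_count;

int register_callback(func_pointer_t callback, void *extra_data)
{
    assert(callback != NULL);
    if (callback_count == MAX_CALLBACKS)
        return -1;
    callbacks[callback_count] = callback;
    callback_data[callback_count] = extra_data;
    callback_count++;
    return 0;
}

/* Hypothetical event dispatch; returns the number of callbacks invoked. */
int fire_event(int event_code)
{
    struct instance_data lib_data = { event_code };
    size_t i;
    for (i = 0; i < callback_count; i++)
        callbacks[i](&lib_data, callback_data[i]);
    return (int)callback_count;
}

/* --- Language-binding side --- */

/* Toy stand-in for an interpreter: all it can do is accumulate numbers. */
struct interp {
    long accumulator;
    int calls;
};

/* Toy closure: which interpreter to re-enter, plus one value bound at runtime. */
struct closure {
    struct interp *interp;
    long captured;
};

/* The single C entry point shared by every interpreted callback. */
int interp_entry_point(struct instance_data *lib_data, void *extra_data)
{
    struct closure *cl = extra_data;
    /* "Run" the closure body: captured value times the event code. */
    cl->interp->accumulator += cl->captured * lib_data->event_code;
    cl->interp->calls++;
    return 0;
}
```

The point is that one compiled function, `interp_entry_point`, serves any number of closures created at runtime. Everything that distinguishes one callback from another rides along in the data pointer.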

With no data parameter, though… you’re stuck. The only way to do it, short of generating a new custom function pointer (which isn’t that tough, but is painfully non-portable and gives most people a screaming fit to even think about), is to stuff the information you need for the callback into a C global somewhere. The problem there is that you can only have one pending callback (which is often suboptimal) and you’ve got potentially unpleasant threading issues. This is an especially egregious mistake with things like GUIs, where you may have dozens or hundreds of some sort of thing instantiated. At least there you’ve often got an OO interface, so there’s the data in the objects, but even then it makes the low-level stuff annoying.
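Here is a small sketch of the global workaround and how it goes wrong. The library half is an assumed shape for the bad signature at the top of this post; `fire_event`, `bind_counter`, `struct counter` and `count_event` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* --- Library side: callbacks get no data pointer at all --- */
typedef int (*func_pointer_t)(void);

#define MAX_CALLBACKS 8
static func_pointer_t callbacks[MAX_CALLBACKS];
static size_t callback_count;

int register_callback(func_pointer_t callback)
{
    assert(callback != NULL);
    if (callback_count == MAX_CALLBACKS)
        return -1;
    callbacks[callback_count++] = callback;
    return 0;
}

/* Hypothetical event dispatch; returns the number of callbacks invoked. */
int fire_event(void)
{
    size_t i;
    for (i = 0; i < callback_count; i++)
        callbacks[i]();
    return (int)callback_count;
}

/* --- Caller side: the only place left for context is a global --- */
struct counter {
    int hits;
};

static struct counter *current_counter;

/* Points the global at whichever counter the next event should update. */
void bind_counter(struct counter *c)
{
    current_counter = c;
}

int count_event(void)
{
    if (current_counter != NULL)
        current_counter->hits++;   /* whichever counter was bound last */
    return 0;
}
```

Bind counter `a`, register, bind counter `b`, register again, and then fire one event: both registrations update `b` and `a` never sees a thing. Add a second thread rebinding the global and it gets worse.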

Generally people who use the libraries realize the problem straightaway; the trouble is that often the people using the libraries aren’t the people writing them…

[Squawks of the Parrot]
