In most languages the question is whether or not functions are first-class objects. In C, they aren’t objects at all.
We’ve already seen that, in C, pointers to functions and pointers to objects (including void *) are different animals, and trying to convert one to the other results in undefined behavior. And we’ve seen how this can sometimes be mildly inconvenient. But why is this peculiar rule in place, anyway? At the hardware level, code and data are just bytes in memory, right? That’s why things like stack smashing are a problem, because a pointer to data bytes can be interpreted as a pointer to code bytes.
It is true that on the computers most people[1] typically encounter today, code and data bytes live side by side in the same address space. But it isn’t always so. Some computers expect separate address spaces for code and data. And we don’t have to look very hard to find an example: an x86 processor running in one of its segmented modes will do nicely.
In real mode and 16-bit protected mode the x86 treats memory as segments up to 64 KB in size.[2] It’s possible to run an entire program out of a single segment if its code and data are small enough to fit in 64 KB.
But many programs are bigger than that and, at the very least, have to keep their code and data in separate segments. The processor makes this division into code and data very natural, since it keeps the reference to the current code segment in a register (CS), and provides other registers (DS, ES, and SS) to refer to segments for data manipulation. I already gave a rundown on the conventional memory models used for 16-bit programming on the x86, and the important ways that they differ from one another, here.
The important points for the matter at hand are that:
- In the mixed memory models (medium and compact), code and data pointers have different default sizes: in the medium model code pointers are 32 bits and data pointers are 16 bits, and in the compact model it is the other way around.
- In the small memory model, where both code and data pointers are 16 bits by default, the pointers are relative to different segments. This means that code and data pointers in general point to different things, even if they have the same bit pattern.
Translated into C-speak, this means that sometimes a pointer to a function and a pointer to an object have different sizes in memory (16 bits versus 32 bits). It also means that, even if function and object pointers have the same size, their values may refer to different address spaces and so can’t usefully be converted back and forth or compared.
It’s hard to see exactly what a C compiler could sensibly do in a situation like this if asked to convert a function pointer to an object pointer. And that’s precisely why the standard says that performing that conversion results in undefined behavior (Harbison and Steele, 2002, pp. 185-187).
Undefined behavior leaves the door wide open: on a machine where code and data share an address space, it may be possible to perform the conversion without any drama. But the standard acknowledges that this cannot hope to be completely portable, because there are platforms out there where the conversion doesn’t even make sense and may result in a warning, an error, a crash, or worse.