Pointers Are Complicated, or: What's in a Byte?


Ralf Jung explains what a “pointer” in Rust, C, or C++ really is and why two pointers that point to the same address may still not be equivalent.

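int test() {
    auto x = new int[8];
    auto y = new int[8];
    y[0] = 42;
    auto x_ptr = x+8; // one past the end: fine to compute and compare
    if (x_ptr == &y[0])
        *x_ptr = 23;  // UB: x_ptr must not be dereferenced, even if it equals &y[0]
    return y[0];      // so the compiler may assume this is still 42
}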

Now, imagine that x and y have been allocated right next to each other, with y having the higher address. Then x_ptr actually points right at the beginning of y! The conditional is true and the write happens. Still, there is no UB (undefined behavior) due to out-of-bounds pointer arithmetic, because computing a one-past-the-end pointer is allowed. What is not allowed is dereferencing it, so the write through x_ptr is UB even though the address it targets holds y[0]. The key point here is that just because x_ptr and &y[0] point to the same address, that does not make them the same pointer, i.e., they cannot be used interchangeably: &y[0] points to the first element of y; x_ptr points past the end of x. If we replace *x_ptr = 23 by *&y[0] = 0, we change the meaning of the program, even though the two pointers have been tested for equality.
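To make the "not interchangeable" point concrete, here is a minimal sketch of a well-defined variant of the function above. The name test_defined is hypothetical, and the sketch assumes stack arrays instead of new[] (to avoid the leak) and a platform that provides std::uintptr_t. It compares the two addresses as integers and, if they match, writes through &y[0], the pointer that is actually allowed to access y.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical well-defined variant of test(): the store goes through a
// pointer derived from y, never through the one-past-the-end pointer of x.
int test_defined() {
    int x[8] = {};
    int y[8] = {};
    y[0] = 42;

    int* x_end = x + 8;    // one past the end of x: may be computed, not dereferenced
    int* y_begin = &y[0];  // points to the first element of y

    // Compare the addresses as integers. Whether they match depends on how
    // the compiler happens to lay out x and y.
    const bool same_address =
        reinterpret_cast<std::uintptr_t>(x_end) ==
        reinterpret_cast<std::uintptr_t>(y_begin);

    if (same_address) {
        *y_begin = 23;     // allowed: y_begin is a valid pointer into y
    }

    // Either the arrays were not adjacent (42) or the write happened (23).
    assert(y[0] == 42 || y[0] == 23);
    return y[0];
}
```

Which of the two values comes back depends on the memory layout, but both outcomes are defined behavior. That is exactly what the original function cannot promise once it writes through x_ptr.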

We have seen that in languages like C++ and Rust (unlike on real hardware), pointers can be different even when they point to the same address, and that a byte is more than just a number in 0..256. This is also why calling C “portable assembly” may have been appropriate in 1978, but is a dangerously misleading statement nowadays.
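One way to read the claim about bytes is to look at what happens when a pointer is copied byte by byte. The sketch below is only an illustration under that reading, and the function name roundtrip_through_bytes is hypothetical. It copies a pointer into a buffer of unsigned char and back with std::memcpy, then writes through the restored pointer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical illustration: a pointer survives a round trip through raw bytes.
int roundtrip_through_bytes() {
    int value = 42;
    int* p = &value;

    // Copy the object representation of the pointer into a byte buffer.
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);

    // Copy the bytes back into a fresh pointer object.
    int* q = nullptr;
    std::memcpy(&q, bytes, sizeof q);

    assert(q == p);  // the round trip restores the same pointer value
    *q = 23;         // and the restored pointer may still be used to access value
    return value;    // 23
}
```

For q to remain usable, the contents of bytes have to stand for "part of a pointer to value", not merely for eight or so small integers. This is the sense in which a byte in the language's model carries more information than a number in 0..256.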