Pointers Are Complicated, or: What's in a Byte?
— 1 minute read
Ralf Jung explains what a “pointer” in Rust or C really is and why pointers pointing to the same address may still not be equivalent.
int test() {
    auto x = new int[8];
    auto y = new int[8];
    y[0] = 42;
    auto x_ptr = x+8; // one past the end
    if (x_ptr == &y[0])
        *x_ptr = 23;
    return y[0];
}
Now, imagine that x and y have been allocated right next to each other, with y having the higher address. Then x_ptr actually points right at the beginning of y! The conditional is true and the write happens. Still, there is no UB (undefined behavior) due to out-of-bounds pointer arithmetic. The key point here is that just because x_ptr and &y[0] point to the same address, that does not make them the same pointer, i.e., they cannot be used interchangeably: &y[0] points to the first element of y; x_ptr points past the end of x. If we replace *x_ptr = 23 by *&y[0] = 0, we change the meaning of the program, even though the two pointers have been tested for equality.
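One way to make this concrete is a toy model in which a pointer carries the allocation it was derived from in addition to an offset. The sketch below is only an illustration under stated assumptions: Machine, Allocation, Pointer, offset_by and run_example are made-up names, the allocator simply places allocations back to back, and real compilers do not track this information at run time.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical names): a pointer remembers which allocation it
// was derived from, not just a numeric address.
struct Allocation {
    std::uint64_t base;     // numeric address of the first element
    std::vector<int> data;  // contents of the allocation
};

struct Pointer {
    std::size_t alloc_id;   // which allocation this pointer belongs to
    std::ptrdiff_t offset;  // offset in elements from the allocation start
};

// Pointer arithmetic changes the offset but never the allocation.
Pointer offset_by(Pointer p, std::ptrdiff_t n) {
    return {p.alloc_id, p.offset + n};
}

struct Machine {
    std::vector<Allocation> allocs;
    std::uint64_t next_base = 0x1000;  // assumed starting address

    // Allocate n ints directly after the previous allocation.
    Pointer alloc(std::size_t n) {
        allocs.push_back({next_base, std::vector<int>(n, 0)});
        next_base += n * sizeof(int);
        return {allocs.size() - 1, 0};
    }

    // The numeric address the pointer would have on real hardware.
    std::uint64_t address(Pointer p) const {
        return allocs[p.alloc_id].base +
               static_cast<std::uint64_t>(p.offset) * sizeof(int);
    }

    // Equality in this model looks only at the numeric address.
    bool same_address(Pointer a, Pointer b) const {
        return address(a) == address(b);
    }

    // A store is permitted only inside the pointer's own allocation.
    // Returns false where the model treats the access as invalid.
    bool store(Pointer p, int value) {
        auto& a = allocs[p.alloc_id];
        if (p.offset < 0 ||
            static_cast<std::size_t>(p.offset) >= a.data.size()) {
            return false;
        }
        a.data[static_cast<std::size_t>(p.offset)] = value;
        return true;
    }
};

struct Outcome {
    bool same_addr;       // did x_ptr and &y[0] compare equal?
    bool write_via_x_ok;  // was the store through x_ptr accepted?
    int y0;               // value of y[0] afterwards
};

// Replays the example from the text inside the toy model.
Outcome run_example() {
    Machine m;
    Pointer x = m.alloc(8);
    Pointer y = m.alloc(8);
    m.store(y, 42);                   // y[0] = 42
    Pointer x_ptr = offset_by(x, 8);  // one past the end of x

    Outcome r{};
    r.same_addr = m.same_address(x_ptr, y);
    r.write_via_x_ok = m.store(x_ptr, 23);
    r.y0 = m.allocs[y.alloc_id].data[0];
    return r;
}
```

In this model, run_example() reports that the two pointers compare equal by address, that the store through x_ptr is rejected because it lies outside x, and that y[0] still holds 42. A store through y at offset 0 would be accepted, which is the sense in which the two pointers are not interchangeable.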
We have seen that in languages like C++ and Rust (unlike on real hardware), pointers can be different even when they point to the same address, and that a byte is more than just a number in 0..256. This is also why calling C “portable assembly” may have been appropriate in 1978, but is a dangerously misleading statement nowadays.
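The claim about bytes can be sketched the same way. A minimal illustration, assuming a model in which a byte in the abstract machine is either plain initialized bits, one piece of a stored pointer, or uninitialized; Bits, PtrFragment, Uninit, Byte and as_number are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <variant>

// An ordinary initialized byte: a number in 0..256.
struct Bits {
    std::uint8_t value;
};

// One byte of a stored pointer. It keeps the pointer's allocation and
// offset, plus which byte of the pointer this is.
struct PtrFragment {
    std::size_t alloc_id;
    std::ptrdiff_t offset;
    std::uint8_t index;
};

// A byte that has never been written.
struct Uninit {};

// In this sketch, a byte is one of the three cases above.
using Byte = std::variant<Bits, PtrFragment, Uninit>;

// Only a Bits byte can be read back as a plain number in this model.
std::optional<std::uint8_t> as_number(const Byte& b) {
    if (const Bits* bits = std::get_if<Bits>(&b)) {
        return bits->value;
    }
    return std::nullopt;
}
```

Here as_number returns a value only for the Bits case. A pointer fragment or an uninitialized byte has no plain numeric reading in the model, even though real hardware would hold some bit pattern at that location.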