Database Daily: Detecting Page Block Padding

29 April 2023

Goal: Modify relation scan iterator code to deal with data gaps at page boundaries

A bit of a winding road this morning. Initially I thought that prefixing each tuple with a 2-byte integer representing its size would work nicely. If the reader sees a zero size, it can just keep reading until it finds a non-zero size.

The issue with that approach was that the padding at the end of a 4kb page could be an odd number of bytes. That meant the last 16-bit read in the padding would straddle the 4kb page boundary and pick up “junk” data from the next page, suggesting that a valid tuple size was being read when there actually wasn’t one!
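To make the failure concrete, here is a minimal Python sketch of the size-prefix approach. It is not the actual scan iterator code: the function names, the little-endian layout, and the tiny 16-byte page (standing in for 4kb so the bytes are easy to inspect) are all assumptions for illustration.

```python
import struct

PAGE_SIZE = 16  # assumption: tiny stand-in for a 4kb page


def write_size_prefixed(tuples):
    """Write each tuple as a 2-byte size plus payload, zero-padding to the
    end of the page whenever the next record would cross a page boundary."""
    out = bytearray()
    for payload in tuples:
        record = struct.pack("<H", len(payload)) + payload
        room = PAGE_SIZE - len(out) % PAGE_SIZE
        if len(record) > room:
            out += bytes(room)  # padding; may be an odd number of bytes
        out += record
    return bytes(out)


def naive_scan(data):
    """Read 2-byte sizes, treating a zero size as padding to skip."""
    pos, found = 0, []
    while pos + 2 <= len(data):
        (size,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if size == 0:
            continue  # skips padding two bytes at a time
        found.append(data[pos:pos + size])
        pos += size
    return found


# A 13-byte record leaves 3 bytes of padding on the first page.
data = write_size_prefixed([b"abcdefghijk", b"xyz"])
print(naive_scan(data))
```

With 3 bytes of padding, the second 2-byte read covers the last padding byte plus the first byte of the next page, so the scan reports a bogus size and the second tuple comes back mangled. If the padding happens to be an even number of bytes, the same scan works fine, which is what makes the bug easy to miss.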

My simple solution was to include a tuple header whose first byte is non-zero, and then read only a single byte to determine presence. If a null byte is read, it’s essentially a “null tuple” header.

   +- tuple header
   |
|-----|
 1 2 3 4 5 6 7 8 ... n
+-+---+----------------+
| |   | tuple contents |
+-+---+----------------+
 |  |
 |  +- tuple size
 |
 +- tuple presence

This means that records no longer span the 4kb boundary! Next time, I can look into reading a 4kb page at a time. That gets me back on track to work on the out-of-core merge sort algorithm – being able to read some fixed number of pages at a time, sort those, and write the sorted set back to disk.

Written by Dan Drust on 29 April 2023
