Part 20 - Secondary Indexes

Our WHERE clause can find rows by id efficiently because id is the primary key – it’s the key in our B-tree. But what if you want to find a user by their username? Right now, that means scanning every row. For a table with millions of rows, that’s unacceptable.

The answer is a secondary index: a separate data structure that maps a non-primary column to the primary key. Instead of scanning, you look up the username in the index, get back the id, then use the id to fetch the full row from the primary B-tree. Two lookups instead of a million.

How a Secondary Index Works

The primary B-tree stores rows keyed by id:

Primary B-tree: id -> (id, username, email)

A secondary index on username maps usernames to primary keys:

Username index: hash(username) -> id

To look up username = "bob":

Hash “bob” to get a key
Search the index for that key -> get id = 2
Search the primary B-tree for id = 2 -> get the full row

The Hash Function

We need a hash function to turn strings into integer keys. We’ll use djb2, a simple and effective string hash:

+uint32_t hash_string(const char* str) {
+  uint32_t hash = 5381;
+  int c;
+  while ((c = *str++)) {
+    hash = ((hash << 5) + hash) + c;
+  }
+  return hash;
+}

Index Page Format

For simplicity, we’ll store our index as a sorted array of (hash, primary_key) pairs packed into a single page. Each entry is 8 bytes, giving us room for 511 entries:

+const uint32_t INDEX_ENTRY_SIZE = 2 * sizeof(uint32_t);
+const uint32_t INDEX_HEADER_SIZE = sizeof(uint32_t);
+const uint32_t INDEX_MAX_ENTRIES =
+    (PAGE_SIZE - INDEX_HEADER_SIZE) / INDEX_ENTRY_SIZE;

A real database would use a full B-tree for the index (just like the primary table). We’re using a flat sorted array for clarity, but the concept is the same: a separate data structure that maps column values to primary keys.

Index Operations

Inserting into the index uses binary search to find the right position, then shifts entries right:

+void index_insert(Table* table, uint32_t hash_key, uint32_t primary_key) {
+  if (!table->has_index) return;
+  void* page = get_page(table->pager, table->index_page_num);
+  pager_mark_dirty(table->pager, table->index_page_num);
+  uint32_t num = *index_num_entries(page);
+
+  uint32_t lo = 0, hi = num;
+  while (lo < hi) {
+    uint32_t mid = (lo + hi) / 2;
+    if (*index_entry_hash(page, mid) < hash_key) {
+      lo = mid + 1;
+    } else {
+      hi = mid;
+    }
+  }
+
+  for (uint32_t i = num; i > lo; i--) {
+    *index_entry_hash(page, i) = *index_entry_hash(page, i - 1);
+    *index_entry_pk(page, i) = *index_entry_pk(page, i - 1);
+  }
+
+  *index_entry_hash(page, lo) = hash_key;
+  *index_entry_pk(page, lo) = primary_key;
+  *index_num_entries(page) = num + 1;
+}

Lookup also uses binary search – O(log n):

+uint32_t index_find(Table* table, uint32_t hash_key) {
+  if (!table->has_index) return INVALID_PAGE_NUM;
+  void* page = get_page(table->pager, table->index_page_num);
+  uint32_t num = *index_num_entries(page);
+
+  uint32_t lo = 0, hi = num;
+  while (lo < hi) {
+    uint32_t mid = (lo + hi) / 2;
+    if (*index_entry_hash(page, mid) < hash_key) {
+      lo = mid + 1;
+    } else {
+      hi = mid;
+    }
+  }
+
+  if (lo < num && *index_entry_hash(page, lo) == hash_key) {
+    return *index_entry_pk(page, lo);
+  }
+  return INVALID_PAGE_NUM;
+}

Creating the Index

The create index on username command allocates a new page, scans the entire table, and populates the index:

+  } else if (strcmp(input_buffer->buffer, "create index on username") == 0) {
+    table->index_page_num = get_unused_page_num(table->pager);
+    void* index_page = get_page(table->pager, table->index_page_num);
+    memset(index_page, 0, PAGE_SIZE);
+    pager_mark_dirty(table->pager, table->index_page_num);
+    table->has_index = true;
+
+    Cursor* cursor = table_start(table);
+    Row row;
+    while (!(cursor->end_of_table)) {
+      deserialize_row(cursor_value(cursor), &row);
+      index_insert(table, hash_string(row.username), row.id);
+      cursor_advance(cursor);
+    }
+    free(cursor);
+    printf("Index created on username.\n");

Using the Index

When we see select where username = bob, we hash “bob”, look it up in the index, and use the returned primary key to fetch the full row:

+    case WHERE_USERNAME_EQ: {
+      uint32_t hash = hash_string(statement->where_username);
+      uint32_t pk = index_find(table, hash);
+      if (pk != INVALID_PAGE_NUM) {
+        cursor = table_find(table, pk);
+        void* node = get_page(table->pager, cursor->page_num);
+        uint32_t num_cells = *leaf_node_num_cells(node);
+        if (cursor->cell_num < num_cells) {
+          deserialize_row(cursor_value(cursor), &row);
+          if (strcmp(row.username, statement->where_username) == 0) {
+            print_row(&row);
+          }
+        }
+        free(cursor);
+      }
+      return EXECUTE_SUCCESS;
+    }

Notice the strcmp check after the index lookup. Hash collisions are possible – two different usernames could hash to the same value. The strcmp confirms we actually found the right row. A production index would chain colliding entries and check all of them.

Maintaining the Index

Every insert also inserts into the index:

   leaf_node_insert(cursor, row_to_insert->id, row_to_insert);
+  index_insert(table, hash_string(row_to_insert->username), row_to_insert->id);

The same applies for delete. Any write to the primary table must be reflected in all secondary indexes – this is the maintenance cost of indexes. More indexes mean faster reads but slower writes.

Testing

+  it 'creates an index and looks up by username' do
+    script = [
+      "insert 1 alice alice@example.com",
+      "insert 2 bob bob@example.com",
+      "insert 3 charlie charlie@example.com",
+      "create index on username",
+      "select where username = bob",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to include("Index created on username.")
+    expect(result).to include("(2, bob, bob@example.com)")
+    expect(result).not_to include("(1, alice, alice@example.com)")
+  end

Without the index, finding “bob” requires scanning every row. With the index, it’s two O(log n) lookups: one in the index, one in the primary B-tree. For a million rows, that’s roughly 20 page reads instead of thousands.

Next time we’ll add transactions so that a sequence of changes can be committed atomically or rolled back entirely.

Part 19 - Variable-Length Records

Part 21 - Transactions