Let's Build a Simple Database

Writing a sqlite clone from scratch in C

Overview

View on GitHub (pull requests welcome)

Part 15 - Deleting Rows from a Leaf Node



We can insert rows and we can read them back out. But we can’t remove them. Every real database needs a delete command, and ours is no exception. In this article we’ll implement simple deletion from a leaf node.

I’m going to hold off on rebalancing the tree after deletion for now – we’ll tackle that in the next part. For now, deleting a row means finding it in the B-tree and removing it from its leaf node.

Parsing the Delete Statement

First, let’s add a new statement type:

-typedef enum { STATEMENT_INSERT, STATEMENT_SELECT } StatementType;
+typedef enum {
+  STATEMENT_INSERT,
+  STATEMENT_SELECT,
+  STATEMENT_DELETE
+} StatementType;

And a new execute result for when the key doesn’t exist:

 typedef enum {
   EXECUTE_SUCCESS,
   EXECUTE_DUPLICATE_KEY,
+  EXECUTE_KEY_NOT_FOUND,
 } ExecuteResult;

The syntax for delete will be delete <id>. Parsing is similar to insert, but we only need the id:

+PrepareResult prepare_delete(InputBuffer* input_buffer, Statement* statement) {
+  statement->type = STATEMENT_DELETE;
+
+  char* keyword = strtok(input_buffer->buffer, " ");
+  char* id_string = strtok(NULL, " ");
+
+  if (id_string == NULL) {
+    return PREPARE_SYNTAX_ERROR;
+  }
+
+  int id = atoi(id_string);
+  if (id < 0) {
+    return PREPARE_NEGATIVE_ID;
+  }
+
+  statement->row_to_insert.id = id;
+
+  return PREPARE_SUCCESS;
+}

We’re reusing row_to_insert.id to store the key we want to delete. It’s a bit of a hack, but it saves us from adding another field to Statement just to hold a single integer.

Now wire it into prepare_statement():

 PrepareResult prepare_statement(InputBuffer* input_buffer,
                                 Statement* statement) {
   if (strncmp(input_buffer->buffer, "insert", 6) == 0) {
     return prepare_insert(input_buffer, statement);
   }
   if (strcmp(input_buffer->buffer, "select") == 0) {
     statement->type = STATEMENT_SELECT;
     return PREPARE_SUCCESS;
   }
+  if (strncmp(input_buffer->buffer, "delete", 6) == 0) {
+    return prepare_delete(input_buffer, statement);
+  }

   return PREPARE_UNRECOGNIZED_STATEMENT;
 }

Removing a Cell from a Leaf Node

The actual removal is straightforward. We shift all cells after the deleted one to the left by one position, then decrement the cell count:

+void leaf_node_delete(Cursor* cursor) {
+  void* node = get_page(cursor->table->pager, cursor->page_num);
+  uint32_t num_cells = *leaf_node_num_cells(node);
+
+  // Shift cells to fill the gap
+  for (uint32_t i = cursor->cell_num; i < num_cells - 1; i++) {
+    memcpy(leaf_node_cell(node, i), leaf_node_cell(node, i + 1),
+           LEAF_NODE_CELL_SIZE);
+  }
+
+  *(leaf_node_num_cells(node)) -= 1;
+}

Think of it like removing an element from the middle of an array. Everything to the right slides over to fill the hole.

Executing the Delete

Now we need execute_delete(). It uses table_find() to locate the key in the B-tree, checks that the key actually exists, and then calls leaf_node_delete():

+ExecuteResult execute_delete(Statement* statement, Table* table) {
+  uint32_t key_to_delete = statement->row_to_insert.id;
+  Cursor* cursor = table_find(table, key_to_delete);
+
+  void* node = get_page(table->pager, cursor->page_num);
+  uint32_t num_cells = *leaf_node_num_cells(node);
+
+  if (cursor->cell_num >= num_cells) {
+    free(cursor);
+    return EXECUTE_KEY_NOT_FOUND;
+  }
+
+  uint32_t key_at_index = *leaf_node_key(node, cursor->cell_num);
+  if (key_at_index != key_to_delete) {
+    free(cursor);
+    return EXECUTE_KEY_NOT_FOUND;
+  }
+
+  leaf_node_delete(cursor);
+
+  free(cursor);
+
+  return EXECUTE_SUCCESS;
+}

table_find() returns a cursor pointing to the position where the key should be. But the key might not actually be there – maybe we’re looking for a key that was never inserted. So we check two things: is the cursor past the end of the leaf, and does the key at the cursor’s position actually match? If either check fails, the key doesn’t exist.

Wire it into execute_statement():

 ExecuteResult execute_statement(Statement* statement, Table* table) {
   switch (statement->type) {
     case (STATEMENT_INSERT):
       return execute_insert(statement, table);
     case (STATEMENT_SELECT):
       return execute_select(statement, table);
+    case (STATEMENT_DELETE):
+      return execute_delete(statement, table);
   }
 }

And handle the new result in main():

     switch (execute_statement(&statement, table)) {
       case (EXECUTE_SUCCESS):
         printf("Executed.\n");
         break;
       case (EXECUTE_DUPLICATE_KEY):
         printf("Error: Duplicate key.\n");
         break;
+      case (EXECUTE_KEY_NOT_FOUND):
+        printf("Error: Key not found.\n");
+        break;
     }

Testing

Let’s try it out:

db > insert 1 user1 person1@example.com
Executed.
db > insert 2 user2 person2@example.com
Executed.
db > insert 3 user3 person3@example.com
Executed.
db > delete 2
Executed.
db > select
(1, user1, person1@example.com)
(3, user3, person3@example.com)
Executed.
db >

Sweet, it works! The row with id 2 is gone.

What happens if we try to delete a key that doesn’t exist?

db > delete 5
Error: Key not found.

And our deletion persists across sessions too:

+  it 'deletes a row' do
+    script = [
+      "insert 1 user1 person1@example.com",
+      "insert 2 user2 person2@example.com",
+      "insert 3 user3 person3@example.com",
+      "delete 2",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to match_array([
+      "db > Executed.",
+      "db > Executed.",
+      "db > Executed.",
+      "db > Executed.",
+      "db > (1, user1, person1@example.com)",
+      "(3, user3, person3@example.com)",
+      "Executed.",
+      "db > ",
+    ])
+  end
+
+  it 'prints error message when deleting non-existent key' do
+    script = [
+      "insert 1 user1 person1@example.com",
+      "delete 5",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to match_array([
+      "db > Executed.",
+      "db > Error: Key not found.",
+      "db > (1, user1, person1@example.com)",
+      "Executed.",
+      "db > ",
+    ])
+  end
+
+  it 'deletes rows and persists changes' do
+    result1 = run_script([
+      "insert 1 user1 person1@example.com",
+      "insert 2 user2 person2@example.com",
+      "insert 3 user3 person3@example.com",
+      "delete 2",
+      ".exit",
+    ])
+
+    result2 = run_script([
+      "select",
+      ".exit",
+    ])
+    expect(result2).to match_array([
+      "db > (1, user1, person1@example.com)",
+      "(3, user3, person3@example.com)",
+      "Executed.",
+      "db > ",
+    ])
+  end

A Looming Problem

This works great for small trees. But there’s a subtle issue we’re ignoring. In a B-tree, every non-root node must maintain a minimum number of keys. When we delete a cell from a leaf that’s already at its minimum occupancy, the node “underflows.” A well-behaved B-tree fixes this by borrowing from a sibling or merging two nodes together.

We’re not doing any of that yet. If you delete enough rows from a leaf, it could end up empty while its parent still points to it. That’s a problem.

Next time we’ll implement rebalancing: borrowing from siblings, merging underflowing nodes, and collapsing the tree when the root becomes unnecessary. It’s gonna be great.