Let's Build a Simple Database

Writing a sqlite clone from scratch in C

Overview

View on GitHub (pull requests welcome)

Part 21 - Transactions



“Either all of it happens, or none of it does.” – the essence of atomicity

Until now, every statement we execute takes effect immediately and permanently. If you insert three rows, they’re all committed right away. If your program crashes halfway through a batch of inserts, you get a partially-updated database. That’s not great.

Real databases support transactions: a group of operations that either all succeed (commit) or all fail (rollback). This is the “A” in ACID – Atomicity.

The Commands

We’ll add three new statements:

 typedef enum {
   STATEMENT_INSERT,
   STATEMENT_SELECT,
-  STATEMENT_DELETE
+  STATEMENT_DELETE,
+  STATEMENT_BEGIN,
+  STATEMENT_COMMIT,
+  STATEMENT_ROLLBACK
 } StatementType;

The Undo Log

Our approach is based on shadow paging: before modifying a page during a transaction, we save a copy of its original state. On commit, we throw away those copies (the changes are already in memory and will be flushed). On rollback, we restore the copies, effectively rewinding time.

We add an undo log to the Pager:

 typedef struct {
   int file_descriptor;
   uint32_t file_length;
   uint32_t num_pages;
   void* pages[TABLE_MAX_PAGES];
   bool dirty[TABLE_MAX_PAGES];
   uint32_t access_time[TABLE_MAX_PAGES];
   uint32_t clock;
+  bool in_transaction;
+  #define MAX_UNDO_PAGES 64
+  uint32_t undo_page_nums[MAX_UNDO_PAGES];
+  void* undo_pages[MAX_UNDO_PAGES];
+  uint32_t num_undo_pages;
 } Pager;

The key insight: we hook into pager_mark_dirty(). This function is already called before every page modification. We piggyback on it to save the undo copy:

 void pager_mark_dirty(Pager* pager, uint32_t page_num) {
+  if (pager->in_transaction) {
+    bool already_saved = false;
+    for (uint32_t i = 0; i < pager->num_undo_pages; i++) {
+      if (pager->undo_page_nums[i] == page_num) {
+        already_saved = true;
+        break;
+      }
+    }
+    if (!already_saved && pager->num_undo_pages < MAX_UNDO_PAGES) {
+      void* copy = malloc(PAGE_SIZE);
+      memcpy(copy, pager->pages[page_num], PAGE_SIZE);
+      uint32_t idx = pager->num_undo_pages++;
+      pager->undo_page_nums[idx] = page_num;
+      pager->undo_pages[idx] = copy;
+    }
+  }
   pager->dirty[page_num] = true;
 }

The first time a page is marked dirty during a transaction, we save a snapshot of its current (pre-modification) state. If the same page is modified again, we skip it – we already have the original saved.

Commit and Rollback

Commit is easy – just free the undo copies and exit the transaction. The modified pages are already in the buffer pool and will be flushed to disk when the database closes:

+ExecuteResult execute_commit(Statement* statement, Table* table) {
+  Pager* pager = table->pager;
+  for (uint32_t i = 0; i < pager->num_undo_pages; i++) {
+    free(pager->undo_pages[i]);
+  }
+  pager->num_undo_pages = 0;
+  pager->in_transaction = false;
+  return EXECUTE_SUCCESS;
+}

Rollback is the reverse – restore each undo copy and clear the dirty flag:

+ExecuteResult execute_rollback(Statement* statement, Table* table) {
+  Pager* pager = table->pager;
+  for (uint32_t i = 0; i < pager->num_undo_pages; i++) {
+    uint32_t page_num = pager->undo_page_nums[i];
+    memcpy(pager->pages[page_num], pager->undo_pages[i], PAGE_SIZE);
+    pager->dirty[page_num] = false;
+    free(pager->undo_pages[i]);
+  }
+  pager->num_undo_pages = 0;
+  pager->in_transaction = false;
+  return EXECUTE_SUCCESS;
+}

After rollback, the in-memory pages are exactly as they were before the transaction started. And since we cleared the dirty flags, they won’t be written to disk.

Testing

Let’s verify that rollback actually undoes changes:

+  it 'rolls back a transaction' do
+    script = [
+      "insert 1 user1 person1@example.com",
+      "begin",
+      "insert 2 user2 person2@example.com",
+      "insert 3 user3 person3@example.com",
+      "rollback",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to include("(1, user1, person1@example.com)")
+    expect(result).not_to include("(2, user2, person2@example.com)")
+    expect(result).not_to include("(3, user3, person3@example.com)")
+  end

Row 1 was inserted before the transaction, so it survives. Rows 2 and 3 were inserted during the transaction, so rollback erases them.

And commit makes things permanent:

+  it 'commits a transaction' do
+    script = [
+      "begin",
+      "insert 1 user1 person1@example.com",
+      "insert 2 user2 person2@example.com",
+      "commit",
+      "select",
+      ".exit",
+    ]
+    result = run_script(script)
+    expect(result).to include("(1, user1, person1@example.com)")
+    expect(result).to include("(2, user2, person2@example.com)")
+  end

Limitations

Our transaction implementation is minimal but teaches the core concept. A real database adds:

That last point is particularly important. Shadow paging works, but it requires copying entire 4 KB pages even if only a few bytes changed. Write-ahead logging is more efficient – and that’s what we’ll implement next.