Let's Build a Simple Database

Writing a sqlite clone from scratch in C

Overview

View on GitHub (pull requests welcome)

Part 23 - Wrapping Up


“What I cannot create, I do not understand.” – Richard Feynman

We started with a question: how does a database work? And to answer it, we built one. Let’s take a step back and look at what we’ve created.

What We Built

Our database – all of it in a single C file – implements:

Storage engine:

Persistence:

Query processing:

Indexing:

Transactions:

Data format:

That’s a lot of database. Not a toy, either – the fundamentals are the same ones used by SQLite, PostgreSQL, and MySQL. B-trees, page caches, WAL, secondary indexes – these aren’t academic curiosities. They’re what makes your favorite database tick.

The Architecture

Here’s how our components map to the SQLite architecture we looked at in Part 1:

SQLite Component Our Implementation
Tokenizer / Parser prepare_statement(), prepare_insert(), prepare_delete()
Code Generator execute_statement() switch
Virtual Machine execute_insert(), execute_select(), execute_delete()
B-Tree leaf_node_*, internal_node_*, table_find()
Pager get_page(), pager_flush(), LRU eviction
OS Interface open(), read(), write(), lseek()

We skipped the bytecode layer (our “VM” calls functions directly), but the layering is the same.

What a Real Database Adds

There’s always more to build. Here are the biggest things a production database has that we don’t:

Multiple tables and joins. We have one hardcoded table. A real database has a schema catalog, multiple B-trees (one per table), and join algorithms (nested loop, hash join, sort-merge) for combining data across tables.

A query planner. We always use the B-tree index for primary key lookups and do a full scan otherwise. A real database estimates the cost of different access paths and picks the cheapest one. Sometimes a full scan beats an index scan (e.g., when selecting most of the table).

Concurrency control. We support one connection at a time. Real databases handle many concurrent readers and writers using locks, multiversion concurrency control (MVCC), or both.

A proper SQL parser. Our parser uses strcmp and strtok. A real parser uses a grammar (often generated by tools like Lemon or Bison) to handle the full SQL syntax.

Page compaction and free space management. When we delete rows, the space isn’t reclaimed for reuse. A real database maintains a free page list and compacts pages to avoid fragmentation.

Recovery beyond WAL. Our WAL is simple redo logging. Real databases combine redo and undo logging (ARIES protocol), support checkpoints that bound recovery time, and handle partial page writes.

What We Learned

Building a database from scratch taught us:

  1. Why B-trees? Because disk I/O is expensive, and B-trees minimize it. A tree with a branching factor of 500 can index a billion rows in 3 levels – 3 page reads to find any row.

  2. Why pages? Because disks read in fixed-size blocks. By aligning our data structures to page boundaries, we make every I/O operation useful.

  3. Why write-ahead logging? Because writes can fail. By logging before applying, we ensure that committed data survives crashes.

  4. Why indexes? Because scanning every row is O(n). An index makes point queries O(log n) – the difference between milliseconds and minutes.

  5. Why transactions? Because partial updates are worse than no update. Atomicity ensures all-or-nothing semantics.

These aren’t just database concepts. They’re fundamental computer science – the trade-offs between memory and disk, consistency and performance, simplicity and scalability.

Thank You

If you’ve followed along this far, you’ve done something remarkable. You’ve read thousands of lines of C, understood B-tree splitting and merging, implemented crash recovery, and built something that actually stores and retrieves data reliably. That’s not trivial.

The source code is yours to explore, extend, and break. Add multiple tables. Implement joins. Build a proper parser. Or just read through the code and see how it all fits together. The best way to learn is to build, and now you have a foundation to build on.

Until then!