Skip to main content

Stage 3 review

This page pulls Stage 3 together. Two skills decide whether a database is production-ready: making queries fast (plans and indexes) and keeping them correct under concurrency (transactions and isolation). Work the challenges before opening the answers, then take the quiz and keep the cheatsheet handy.

Go back to the lesson if a challenge stumps you: Indexes and query plans, Transactions and isolation.

Challenges

Challenge 1 - Read the plan

A query against a 100,000-row events table runs EXPLAIN QUERY PLAN and reports SCAN events. After you add an index on the filtered column, it reports SEARCH events USING INDEX .... What changed, and why is the second faster?

Show the answer

SCAN means a full-table scan: the engine reads every row to find matches - work grows with table size. SEARCH USING INDEX means it jumped straight to the matching rows through the index's sorted structure - roughly log(n) instead of n.

The index turned a linear read of 100,000 rows into a handful of lookups. That is the whole point of an index: trade a little write cost and disk for a large read speedup on the columns you filter, join, and sort on.

Challenge 2 - Design the index

A query filters WHERE status = 'open' AND created_at > ? and sorts ORDER BY created_at. Which single index helps most, and what is the column order? Would an index on status alone be enough?

Show the answer

A composite index (status, created_at) - equality column first, then the range/sort column. The left-prefix rule: the index is usable for status =, then walks created_at in order, so it serves the filter and the ORDER BY without a separate sort.

An index on status alone helps the equality but still leaves a sort and a range scan on created_at. Putting created_at first would break the status = lookup. Order matters: equality, then range/sort.

A covering index (status, created_at) that also INCLUDEs the selected columns could answer the query from the index alone - no table fetch at all.

Challenge 3 - Name the anomaly

For each scenario, name the read anomaly and the lowest isolation level that prevents it:

  1. A transaction reads a row another transaction has changed but not yet committed.
  2. A transaction reads the same row twice and gets two different values.
  3. A transaction runs the same WHERE twice and the second time more rows match.
Show the answer
  1. Dirty read - prevented by Read Committed (and above). You never see uncommitted data.
  2. Non-repeatable read - prevented by Repeatable Read (and above). A row you read stays stable for the transaction.
  3. Phantom read - prevented by Serializable. The set of rows matching a predicate stays stable, not just individual rows.

Higher isolation = fewer anomalies but more contention. Most apps run Read Committed and reach for Serializable only where correctness demands it.

Challenge 4 - The deadlock

Two transactions each update rows A and B, but in opposite order, and freeze. What happened, how does the database resolve it, and what must your application do?

Show the answer

A deadlock: T1 holds A and waits for B; T2 holds B and waits for A. Neither can proceed.

The database detects the cycle and aborts one transaction (the victim), rolling it back so the other completes. Your application must catch the serialization/deadlock error and retry the victim - ideally with exponential backoff. Prevention: have all transactions acquire locks in a consistent order (always A before B), and keep transactions short.

Build a table, check the plan (a full SCAN), add an index, and check again (now a SEARCH USING INDEX). Run it as written, then read the two plan lines.

Loading SQL sandbox…

Now add the index and re-check the plan - it should change to SEARCH big USING INDEX:

Loading SQL sandbox…
Why the flip happens

Without the index, the only plan is to read all 5,000 rows. With idx_big_cat, the engine seeks directly to the cat = 'c7' entries. Same query, very different work.

Comprehensive quiz

Stage 3 cumulative review

7 questions

1EXPLAIN QUERY PLAN reports SCAN on a large table for a selective WHERE. What does that tell you?

2A query does WHERE a = ? AND b > ? ORDER BY b. Which composite index serves it best?

3An index on a column where 90% of rows share the same value (low selectivity) barely helps a query filtering for that common value. Why?

4Which isolation level prevents non-repeatable reads but may still allow phantoms?

5What is the practical difference between locking and MVCC for handling concurrent reads and writes?

6Your app occasionally gets a 'could not serialize / deadlock detected' error under load. The right response is to:

7A SAVEPOINT inside a transaction lets you:

Stage 3 cheatsheet

Open the Stage 3 cheatsheet

Reading a query plan

  • EXPLAIN QUERY PLAN <query> (SQLite) / EXPLAIN [ANALYZE] <query> (Postgres).
  • SCAN = full-table read (work grows with rows) - usually a missing index.
  • SEARCH USING INDEX = index seek (roughly log n) - what you want for selective filters.
  • Postgres EXPLAIN ANALYZE shows estimated vs actual rows; a big gap means stale stats (run ANALYZE).

Index design rules

  • Index the columns you filter, join, and sort on - not every column.
  • Composite order: equality columns first, then range/sort (left-prefix rule).
  • Selectivity decides payoff: low-selectivity (mostly-equal) columns barely help.
  • Covering index answers a query from the index alone (no table fetch).
  • Partial index (CREATE INDEX ... WHERE cond) indexes only the rows you query.
  • Each index costs write time and space.

Isolation levels and anomalies

LevelDirty readNon-repeatablePhantom
Read Uncommittedpossiblepossiblepossible
Read Committednopossiblepossible
Repeatable Readnonopossible
Serializablenonono
  • Dirty read - see uncommitted data. Non-repeatable - same row, two values. Phantom - same predicate, new rows.
  • Default to Read Committed; use Serializable only where correctness needs it.

Concurrency control

  • Locking - transactions wait their turn. MVCC - readers get a snapshot, no blocking.
  • Deadlock - opposite lock orders cycle; DB aborts a victim. Fix: consistent lock order, short transactions, retry with backoff.
  • Savepoint - SAVEPOINT s / ROLLBACK TO s for partial rollback.
  • WAL - writes go to a log first for durability and crash recovery.
Next up

Fast and correct in one database. Stage 4 - Databases in Real Apps connects it to the application: access control, ORMs, pooling, security, caching, and operations.