Stage 6 review
This page pulls Stage 6 together. Scale forces trade-offs that single-machine databases hide: where data lives, what happens during a network partition, and how meaning-based search works. Work through each scenario before reading its answer, then take the quiz and keep the cheatsheet handy.
Go back to a lesson if a scenario stumps you: Scaling out, Consistency and concurrency, Vector databases and RAG.
Scenarios
Scenario 1 - Replicate, partition, or shard?
A read-heavy app slows down. Reads outnumber writes 50:1, and the dataset still fits one machine. What do you reach for first, and what would push you further?
Answer
Replicate first. Add read replicas and send reads to them; the primary handles writes. It directly fixes a read-bound load and is the least disruptive step.
Escalate only when replication is not enough: partition (split one table by a key, e.g. by date, within a machine) when a table is huge, then shard (spread data across many machines by a shard key) when one machine can no longer hold the data or the write volume. The order is replicate first, partition next, shard last - each step adds operational complexity.
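As a rough illustration of the "replicate first" step, here is a minimal sketch of read/write routing. The class name `ReplicaRouter`, the server names, and the rule "anything starting with SELECT is a read" are all assumptions made for this example; real drivers and proxies classify statements far more carefully.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and spread reads across replicas (toy sketch)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_pool = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Naive classification (assumption): only plain SELECTs count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_pool) if is_read else self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route(q)
    for q in (
        "SELECT * FROM orders",
        "SELECT * FROM users",
        "INSERT INTO orders VALUES (1)",
        "SELECT 1",
    )
]
print(targets)  # ['replica-1', 'replica-2', 'primary', 'replica-1']
```

Reads rotate across the replicas while the single write goes to the primary. With asynchronous replication a replica may briefly lag the primary, so a read routed this way can be slightly stale.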
Scenario 2 - Pick the shard key
You shard an events table. A teammate proposes sharding by country. Most traffic is one country. What goes wrong, and what makes a better key?
Answer
Sharding by country creates a hotspot: the busiest country's shard takes most traffic while others sit idle - you have not really distributed load. Low-cardinality, skewed keys are bad shard keys.
A better key has high cardinality and even access - e.g. a hash of user_id, or a composite. The goal is to spread reads and writes evenly and keep related rows you query together on the same shard. Re-sharding later is expensive, so choose carefully up front.
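The hotspot is easy to see with a small simulation. The sketch below assumes four shards and a made-up workload in which 90% of events come from one country; `shard_for` is a hypothetical helper that hashes a key with SHA-256 so the assignment is stable across runs.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size for the example


def shard_for(key, num_shards=NUM_SHARDS):
    # Use a stable hash: Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical workload: 9 in 10 events come from a single country.
events = [{"user_id": i, "country": "US" if i % 10 else "NZ"} for i in range(10_000)]

by_country = Counter(shard_for(e["country"]) for e in events)
by_user = Counter(shard_for(e["user_id"]) for e in events)

print("sharded by country:", dict(by_country))
print("sharded by user_id:", dict(by_user))
```

Sharding by country puts at least 9,000 of the 10,000 events on one shard and leaves some shards empty, while hashing user_id lands roughly a quarter of the events on each shard.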
Scenario 3 - The partition hits
A network partition splits your distributed database. CAP says you must give something up. For (a) a bank ledger and (b) a "last seen" presence indicator, do you choose consistency or availability, and what does PACELC add?
Answer
During a partition you can keep Consistency or Availability, not both (P is not optional).
- (a) Bank ledger -> consistency. Refuse writes that cannot be safely coordinated; a wrong balance is worse than a brief outage.
- (b) Presence indicator -> availability. Keep serving even if a "last seen" value is slightly stale; nobody is harmed.
PACELC adds the steady-state choice: else (no partition), you still trade Latency vs Consistency - e.g. synchronous replication is more consistent but slower; asynchronous is faster but can serve stale reads.
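The two choices can be sketched as a toy node that either refuses or accepts writes while cut off from its peers. The `Node` class and its `mode` and `partitioned` fields are invented for this illustration and do not model any real database's protocol.

```python
class Node:
    """Toy replica that is consistency-first ("CP") or availability-first ("AP")."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when the node cannot reach its peers

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Cannot coordinate with peers, so refuse rather than risk divergence.
            raise RuntimeError("unavailable: cannot coordinate with peers")
        # A healthy node, or an AP node during a partition, accepts the write.
        # Under AP the value may temporarily differ from what other nodes hold.
        self.value = value
        return "ok"


ledger = Node("CP")     # (a) bank ledger
presence = Node("AP")   # (b) "last seen" indicator
ledger.partitioned = presence.partitioned = True

print("presence:", presence.write("last seen 12:01"))
try:
    ledger.write(100)
except RuntimeError as err:
    print("ledger:", err)
```

During the simulated partition the presence node keeps accepting writes, while the ledger node raises an error and leaves its value untouched.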
Scenario 4 - Design a RAG pipeline
A support assistant must answer from your help articles. Sketch the retrieval pipeline and the two quality levers you would tune first.
Answer
Pipeline: chunk articles -> embed each chunk with an embedding model -> store vectors in an index (HNSW/IVF, e.g. pgvector) -> at query time, embed the question, retrieve nearest neighbors, optionally rerank, then feed the top chunks to the LLM as context.
First levers: chunking (size and overlap - too big buries the answer, too small loses context) and hybrid search (combine keyword/BM25 with vector similarity, then rerank) to catch both exact terms and meaning. Also, re-embed the whole corpus if you change the embedding model - query and stored vectors must come from the same model.
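A minimal sketch of the chunk -> embed -> retrieve steps, using only the standard library. The function names are hypothetical, and `toy_embed` is a bag-of-words stand-in for a real embedding model, so it matches shared words rather than meaning; the retrieval is also a brute-force scan, not an HNSW/IVF index.

```python
import math
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words; each shares `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def toy_embed(text):
    # Stand-in for an embedding model (assumption): a word-count vector.
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    # Embed the question with the same function used for the chunks, then rank by similarity.
    q = toy_embed(question)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']

help_chunks = ["Reset your password in Settings.", "Change your billing plan."]
print(retrieve("How do I reset my password?", help_chunks, k=1))
# ['Reset your password in Settings.']
```

Changing `size` and `overlap` is the chunking lever. Note that the question and the chunks go through the same `toy_embed` function, which mirrors the rule that queries and stored vectors must come from the same model.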
Comprehensive quiz
Stage 6 cumulative review
6 questions
1. Reads vastly outnumber writes and the data fits one machine. The first scaling move is:
2. A good shard key is characterized by:
3. The CAP theorem says that during a network partition a distributed system must choose between:
4. PACELC extends CAP by also describing the trade-off when there is NO partition:
5. NewSQL systems (CockroachDB, Spanner, TiDB) aim to provide:
6. In a RAG system, if you switch to a new embedding model you must:
Stage 6 cheatsheet
Scaling out
- Replicate first - read replicas offload reads; primary takes writes.
- Partition next - split a big table by key (often within a machine).
- Shard last - spread data across machines by a shard key.
- Shard key = high cardinality + even access; avoid hotspots and monotonic keys. Re-sharding is expensive.
Consistency
- CAP - during a Partition, choose C or A (P is mandatory).
- PACELC - if Partition: C-or-A; Else: Latency-or-Consistency.
- The C in ACID (every transaction leaves the data in a valid state) is a different concept from distributed consistency (every node returns the same latest value).
- MVCC / WAL scale up too; NewSQL = SQL + ACID + horizontal scale via consensus.
Vector search and RAG
- Embedding = vector capturing meaning; same model for stored data and queries.
- ANN index = HNSW / IVF for fast approximate nearest-neighbor; pgvector keeps it next to relational data.
- RAG pipeline: chunk -> embed -> index -> (query: embed -> retrieve -> rerank) -> LLM context.
- Tune chunk size/overlap and add hybrid search (BM25 + vector + rerank).
- Re-embed the whole corpus when the model changes.
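One common way to combine a keyword ranking with a vector ranking in hybrid search is reciprocal rank fusion (RRF). The sketch below is an illustration of that technique, not necessarily the method this course uses; the document ids and both input rankings are made up, and k=60 is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; a higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank), so top positions count most.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc-a", "doc-b", "doc-c"]  # hypothetical BM25 order
vector_hits = ["doc-b", "doc-d", "doc-a"]   # hypothetical vector-similarity order

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

Documents that appear near the top of both lists rise to the top of the fused list; a reranker would then reorder only these fused candidates.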
That is the full journey - from "what is a database" to distributed and AI databases. Put it all together in the Capstone: design, populate, query, index, secure, and report on a bookstore from scratch.