Stage 6 - Scaling and Advanced
The final stage is about what happens when one machine is no longer enough, and when many users hit the data at once. This is the intermediate-to-advanced gate: not new syntax, but the engineering judgment to reason about distributed-data trade-offs. We take the topics in the order you should actually apply them - scale only when you need to, and in the cheapest way first.
Learning objectives
By the end of this stage you will be able to:
- Apply replication, partitioning, and sharding in the right order - replicate first, partition next, shard last.
- Pick a shard key and recognize the hotspots a bad one creates.
- Reason about CAP and PACELC: choose C or A during a partition, and latency vs consistency the rest of the time.
- Explain the internals that keep a database correct - MVCC, locks, the write-ahead log - and how consensus powers NewSQL.
- Design a small RAG pipeline: embeddings, chunking, ANN indexes (HNSW/IVF), hybrid search, and when to re-embed.
- Match a workload to the right scaling tool rather than reaching for the most complex option by default.
Map of this stage
The lessons in this stage
- Scaling out - replication, partitioning, and sharding, applied in the right order (shard last), with topology diagrams.
- Consistency and concurrency - the CAP and PACELC trade-offs, MVCC and the write-ahead log, and how NewSQL keeps SQL and ACID at scale.
- Vector databases and RAG - embeddings, nearest-neighbor search, and retrieval-augmented generation: the AI capstone.
- Stage 6 review - scaling and consistency scenarios, a RAG-design challenge, a cumulative quiz, and a cheatsheet.
Status
All four Stage 6 lessons are ready - and with them, the whole path (Stages 0-6) is complete.