Modeling for NoSQL
Relational design starts from the data: find the entities, normalize so each fact lives once, and let the query planner join them back at read time. NoSQL modeling flips this. You start from the access patterns - the specific reads your app will serve - and shape the data so those reads are fast, even if that means duplicating data. The mindset shift is the whole lesson.
Access-pattern-first
In a relational schema you can ask almost any question after the fact, because joins recombine normalized tables. Most NoSQL stores join expensively or not at all, so you design for the queries you know you will run. Ask "what will I read, and how often?" before "what are my entities?" - then store the data pre-shaped for those reads.
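To make the contrast concrete, here is a minimal sketch in Python. Plain lists and dicts stand in for tables and a key-value store (no real database is involved), and the names `customers`, `orders`, and `customer_page` are illustrative. The relational-style function joins on every read. The access-pattern-first version runs that same join once, at write time, and stores the result under the key the app will ask for.

```python
# Normalized, relational-style rows: each fact is stored exactly once.
customers = [{"id": "cust_1", "name": "Ana"}]
orders = [
    {"id": 101, "customer_id": "cust_1", "total": 40.00},
    {"id": 102, "customer_id": "cust_1", "total": 25.00},
]


def customer_with_orders_joined(customer_id):
    """Relational style: recombine the rows on every read (a join)."""
    customer = next(c for c in customers if c["id"] == customer_id)
    mine = [o for o in orders if o["customer_id"] == customer_id]
    return {"name": customer["name"], "order_totals": [o["total"] for o in mine]}


# Access-pattern-first: the app's known read is "customer page by id",
# so the join runs once, at write time, and the answer is stored pre-shaped.
customer_page = {c["id"]: customer_with_orders_joined(c["id"]) for c in customers}


def customer_with_orders_preshaped(customer_id):
    """NoSQL style: a single key lookup, no join at read time."""
    return customer_page[customer_id]
```

Both functions return the same answer; the difference is when the work happens. A question that `customer_page` was not shaped for has no cheap answer in the pre-shaped store, which is the trade you accept.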
Embedding vs referencing
The core document-modeling decision. Given a customer and their orders, you can embed the orders inside the customer document, or reference them by id in a separate collection.
// Embedded: one read returns the customer and their orders
{
  "_id": "cust_1",
  "name": "Ana",
  "orders": [
    { "id": 101, "total": 40.00 },
    { "id": 102, "total": 25.00 }
  ]
}
// Referenced: orders live separately, linked by customer_id
{ "_id": "cust_1", "name": "Ana" }
{ "_id": "order_101", "customer_id": "cust_1", "total": 40.00 }
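The practical difference between the two layouts is how many reads it takes to reassemble a customer with their orders. A minimal in-memory sketch, assuming Python dicts as stand-ins for MongoDB collections (this is not the MongoDB driver API, and `order_102` is added on the referenced side for symmetry):

```python
# In-memory stand-ins for MongoDB-style collections (illustrative only).
embedded_customers = {
    "cust_1": {"_id": "cust_1", "name": "Ana",
               "orders": [{"id": 101, "total": 40.00},
                          {"id": 102, "total": 25.00}]},
}
customers = {"cust_1": {"_id": "cust_1", "name": "Ana"}}
orders = {
    "order_101": {"_id": "order_101", "customer_id": "cust_1", "total": 40.00},
    "order_102": {"_id": "order_102", "customer_id": "cust_1", "total": 25.00},
}


def read_embedded(cust_id):
    """One read: the orders arrive inside the customer document."""
    return embedded_customers[cust_id], 1  # (result, number of reads)


def read_referenced(cust_id):
    """Two reads: fetch the customer, then find orders by customer_id
    (the step a second query or a $lookup stage performs)."""
    customer = customers[cust_id]
    mine = [o for o in orders.values() if o["customer_id"] == cust_id]
    return {**customer, "orders": mine}, 2
```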
- Embed when the related data is read together, belongs to one owner, and is bounded in size - one read, no join, and updates to the whole thing are atomic. The costs: data shared by several owners gets duplicated into each, and an embedded array that is not truly bounded keeps growing (MongoDB caps a single document at 16 MB).
- Reference when the data is large, shared across owners, or updated independently - closer to normalization, but you stitch it together with a second query or a $lookup stage.
Rule of thumb: embed what you read together; reference what you share or what grows unbounded.
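The rule of thumb can be written down as a small decision helper. This is a hypothetical sketch that treats each criterion as a plain yes/no answer, where real designs weigh them against each other; `Relationship` and `choose_layout` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical yes/no facts about how child data relates to its parent."""
    read_together: bool                  # usually fetched in the same request as the parent?
    single_owner: bool                   # belongs to exactly one parent (not shared)?
    bounded: bool                        # has a small, known upper size?
    updated_independently: bool = False  # changed on its own schedule?


def choose_layout(rel: Relationship) -> str:
    """Embed what you read together; reference what you share or what grows unbounded."""
    if (rel.read_together and rel.single_owner and rel.bounded
            and not rel.updated_independently):
        return "embed"
    return "reference"


addresses = Relationship(read_together=True, single_owner=True, bounded=True)
order_history = Relationship(read_together=True, single_owner=True, bounded=False)
shared_product = Relationship(read_together=True, single_owner=False, bounded=True)
```

A customer's few shipping addresses pass every test and get embedded. Years of orders fail the bounded test, and a product shared by many orders fails the single-owner test, so both get referenced.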
How the four families want to be modeled
Each family rewards a different shape:
- Document (MongoDB) - nest related data into documents sized to your reads; embed vs reference as above. Modern MongoDB also has $lookup joins and multi-document transactions, so the line with relational has blurred.
- Key-value (Redis) - everything is reached by a single key, so the design is the key: cart:user_8842, session:abc. Store a denormalized blob per key; there are no queries across values.
- Wide-column (Cassandra, DynamoDB) - one table per query. You pick a partition key (which node holds the data) and clustering keys (the sort order within a partition), and you happily duplicate the same data across several tables so each query hits exactly one. "Query-first" design taken to its conclusion. (DynamoDB calls these the partition key and sort key, and usually gets the same effect with secondary indexes on fewer tables.)
- Graph (Neo4j) - relationships are first-class. You model nodes and the edges between them, and traverse: "friends of friends who bought X" is a short walk, not a pile of joins.
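The "short walk" in the graph bullet can be sketched without a database. With adjacency sets, each hop is a direct lookup from a node to its neighbors rather than a join; a relational version would join a friendships table to itself and then to purchases. The people, products, and dictionaries below are hypothetical, and a graph store would run the same walk as a query over stored edges rather than Python loops.

```python
# Hypothetical graph: edges stored as adjacency sets, so following a
# relationship is a direct lookup instead of a join.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "eli"},
    "dee": {"ben"},
    "eli": {"cy"},
}
bought = {
    "ben": {"X"},
    "dee": {"X"},
    "eli": {"Y"},
}


def friends_of_friends_who_bought(person, product):
    """Walk two FRIEND hops out from `person`, then check BOUGHT edges."""
    direct = friends.get(person, set())
    second_hop = set()
    for friend in direct:
        second_hop |= friends.get(friend, set())
    second_hop -= direct | {person}  # exclude yourself and your direct friends
    return sorted(p for p in second_hop if product in bought.get(p, set()))
```

Here `friends_of_friends_who_bought("ana", "X")` returns `["dee"]`: ben also bought X, but he is a direct friend, not a friend of a friend.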
Trade-offs, and 2026
NoSQL modeling trades the relational guarantees you learned in Stage 2 for read speed and scale:
- Duplication is normal, so you take on keeping copies in sync - and you accept update anomalies (a change that reaches one copy but not another) on purpose.
- Joins are limited or absent, so you pre-join by embedding or duplicating.
- Consistency is often tunable rather than guaranteed (the CAP trade-off, covered in Stage 6).
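The first trade-off is easiest to see in code. Below is a minimal sketch in the wide-column style with two hypothetical query tables, `orders_by_customer` and `orders_by_day`, each keyed by its own partition key with the order id as the clustering key (Python dicts stand in for Cassandra tables). Every write must fan out to both, and an update that reaches only one leaves a stale copy behind.

```python
from collections import defaultdict

# Two hypothetical query tables holding copies of the same order data.
# orders_by_customer: partition key = customer_id, clustering key = order id
# orders_by_day:      partition key = day,         clustering key = order id
orders_by_customer = defaultdict(dict)
orders_by_day = defaultdict(dict)


def write_order(order):
    """Fan the write out: each query table gets its own copy of the row."""
    orders_by_customer[order["customer_id"]][order["id"]] = dict(order)
    orders_by_day[order["day"]][order["id"]] = dict(order)


def update_total(order, new_total, tables):
    """Update the copy in each (table, partition-key field) pair listed."""
    for table, partition_field in tables:
        table[order[partition_field]][order["id"]]["total"] = new_total


def read_customer_orders(customer_id):
    """One query, one partition: rows come back sorted by clustering key."""
    partition = orders_by_customer[customer_id]
    return [partition[k] for k in sorted(partition)]


order = {"id": 101, "customer_id": "cust_1", "day": "2026-01-05", "total": 40.00}
write_order(order)
# Bug on purpose: orders_by_day is left out, so its copy goes stale.
update_total(order, 45.00, [(orders_by_customer, "customer_id")])
```

Passing both `(orders_by_customer, "customer_id")` and `(orders_by_day, "day")` keeps the copies in step. Remembering every copy on every write is the bookkeeping the duplication bullet refers to.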
The 2026 picture is convergence: document stores gained transactions and joins, relational databases gained JSON columns and vector search, and the new ISO GQL standard (2024) is doing for graph queries what SQL did for relational. Pick the store whose default model fits your dominant access pattern - but know the categories overlap more than they used to.
Quick quiz
1. How does NoSQL data modeling differ from relational design?
2. When should you EMBED related data in a document rather than reference it?
3. What is the wide-column (Cassandra) modeling idiom?
4. What do you accept when you duplicate data for read speed in NoSQL?