Graph and time-series databases

The NoSQL modeling lesson named graph and wide-column stores. Two specialized engines deserve a hands-on look, because each targets a data shape that plain relational design handles awkwardly: graph databases for relationship-heavy data, and time-series databases for streams of timestamped measurements.

Graph databases: relationships as first-class data

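A graph database stores nodes (entities), edges (relationships), and properties (key-value attributes on either). The relationship is not a foreign key you join on later - it is stored directly with the nodes it connects. In native graph engines, following one edge costs roughly the same however large the rest of the graph is, because the engine reads the node's own edge list instead of searching a global table.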
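To make the storage idea concrete, here is a minimal in-memory sketch of a property graph that keeps each node's outgoing edges next to the node. The class and method names (PropertyGraph, add_node, add_edge, neighbors) are invented for illustration and are not any real engine's API; real engines add persistence, indexes, and transactions on top.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph; all names here are illustrative assumptions."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties dict
        self.out_edges = defaultdict(list)  # node id -> [(rel_type, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel_type, dst, **props):
        # The edge is stored on the source node itself, so following it
        # later needs no join against a separate edge table.
        self.out_edges[src].append((rel_type, dst, props))

    def neighbors(self, node_id, rel_type):
        # One hop: scan only this node's own edge list.
        return [dst for (rel, dst, _props) in self.out_edges[node_id] if rel == rel_type]


g = PropertyGraph()
g.add_node("ana", label="Person", name="Ana")
g.add_node("ben", label="Person", name="Ben")
g.add_edge("ana", "FOLLOWS", "ben", since=2021)  # properties can sit on the edge too
print(g.neighbors("ana", "FOLLOWS"))  # ['ben']
```

The cost of neighbors depends on how many edges that one node has, not on how many nodes or edges exist overall.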

This shines for questions about connections: friends-of-friends, recommendations, fraud rings, dependency chains.

Querying with Cypher

Cypher is the most widely used graph query language (it originated in Neo4j and is the main basis for the new GQL standard). It draws the pattern you want as ASCII art - (node)-[:REL]->(node) - and the engine finds every match.

Friends of friends Ana does not already follow:

MATCH (ana:Person {name: 'Ana'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
WHERE NOT (ana)-[:FOLLOWS]->(fof) AND fof <> ana
RETURN DISTINCT fof.name
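The logic of that query can be sketched in plain Python over an adjacency dictionary. This is only an illustration of what the pattern computes, not how a graph engine executes it; the follows data and the friends_of_friends helper are made up for the example.

```python
# Hypothetical sample data: who follows whom.
follows = {
    "Ana": {"Ben", "Cy"},
    "Ben": {"Cy", "Dee", "Ana"},
    "Cy": {"Eve"},
}


def friends_of_friends(follows, person):
    direct = follows.get(person, set())        # (ana)-[:FOLLOWS]->(friend)
    two_hops = set()
    for friend in direct:                      # (friend)-[:FOLLOWS]->(fof)
        two_hops |= follows.get(friend, set())
    # WHERE NOT (ana)-[:FOLLOWS]->(fof) AND fof <> ana
    return two_hops - direct - {person}


print(sorted(friends_of_friends(follows, "Ana")))  # ['Dee', 'Eve']
```

Cy is reachable in two hops through Ben but is dropped because Ana already follows Cy; Ana herself is dropped by the second condition.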

"People who bought this also bought" recommendation:

MATCH (ana:Person {name: 'Ana'})-[:BOUGHT]->(product)<-[:BOUGHT]-(other)
MATCH (other)-[:BOUGHT]->(suggestion)
WHERE NOT (ana)-[:BOUGHT]->(suggestion)
RETURN suggestion.name, count(*) AS strength
ORDER BY strength DESC
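The scoring in that query is a count of matching paths: each (shared product, other buyer, suggestion) combination contributes one to strength. A small Python sketch under that reading, with invented purchase data and a hypothetical recommend helper:

```python
from collections import Counter

# Hypothetical sample data: who bought what.
bought = {
    "Ana": {"tent", "stove"},
    "Ben": {"tent", "lantern", "stove"},
    "Cy": {"tent", "lantern", "boots"},
    "Dee": {"boots"},
}


def recommend(bought, person):
    mine = bought[person]
    strength = Counter()
    for other, theirs in bought.items():
        if other == person:
            continue
        shared = mine & theirs            # products both people bought
        if not shared:
            continue                      # no common purchase, no path
        for suggestion in theirs - mine:  # WHERE NOT (ana)-[:BOUGHT]->(suggestion)
            strength[suggestion] += len(shared)  # one path per shared product
    return strength.most_common()         # ORDER BY strength DESC


print(recommend(bought, "Ana"))  # [('lantern', 3), ('boots', 1)]
```

Ben shares two products with Ana, so his lantern counts twice; Cy shares one, adding one more to lantern and one to boots.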

Contrast: the SQL recursive-join pain

The same friends-of-friends question in SQL needs a self-join per hop, or a recursive CTE that walks the edge table:

WITH RECURSIVE reachable(person_id, depth) AS (
  SELECT followee_id, 1
  FROM follows WHERE follower_id = :ana_id
  UNION ALL
  SELECT f.followee_id, r.depth + 1
  FROM follows f
  JOIN reachable r ON f.follower_id = r.person_id
  WHERE r.depth < 2
)
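SELECT DISTINCT person_id
FROM reachable
WHERE depth = 2
  AND person_id <> :ana_id  -- exclude Ana herself, as in the Cypher version
  AND person_id NOT IN (    -- exclude people Ana already follows
    SELECT followee_id FROM follows WHERE follower_id = :ana_id
  );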

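Every extra hop is another join or another recursion level, and the intermediate result sets tend to grow quickly as the depth increases. In a native graph database, a hop follows a stored reference instead of searching an edge table, so the cost of each hop depends on that node's own edges rather than on the size of the whole dataset. The total work still grows with the number of paths explored, but deep traversals stay practical. That is the whole reason the category exists.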
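To see the recursion-per-hop shape in action, the following sketch runs a recursive CTE of this kind against an in-memory SQLite database from Python. The table contents, the :start and :max_depth parameter names, and the reachable_at helper are assumptions made for the example, and it keeps only the walk itself, with no filtering of people already followed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6)],  # hypothetical edges
)

QUERY = """
WITH RECURSIVE reachable(person_id, depth) AS (
  SELECT followee_id, 1 FROM follows WHERE follower_id = :start
  UNION ALL
  SELECT f.followee_id, r.depth + 1
  FROM follows f
  JOIN reachable r ON f.follower_id = r.person_id
  WHERE r.depth < :max_depth
)
SELECT DISTINCT person_id FROM reachable
WHERE depth = :max_depth
ORDER BY person_id
"""


def reachable_at(start, max_depth):
    """People exactly max_depth hops from start (one recursion level per hop)."""
    rows = conn.execute(QUERY, {"start": start, "max_depth": max_depth})
    return [row[0] for row in rows]


print(reachable_at(1, 2))  # [3, 4, 5]
print(reachable_at(1, 3))  # [5, 6]
```

Raising max_depth by one adds another pass over the follows table joined to everything found so far, which is where the cost comes from on large edge tables.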

Property graph vs RDF, and the GQL standard

Two graph models exist. The property graph (Neo4j, used above) puts properties on nodes and edges - the pragmatic, mainstream choice. RDF (triple stores) models everything as subject-predicate-object triples, queried with SPARQL, and underpins the semantic web and knowledge graphs.
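The difference is easiest to see by writing the same facts both ways. The sketch below uses plain Python structures, not a real triple store or SPARQL; the ex: prefix, the sample facts, and the match helper are invented for illustration.

```python
# Property graph style: attributes live on the node and on the edge.
node_ana = {"label": "Person", "name": "Ana"}
edge = {"from": "ana", "type": "FOLLOWS", "to": "ben", "since": 2021}

# RDF style: every fact is one (subject, predicate, object) triple.
# A plain triple has no slot for an edge attribute such as "since";
# expressing that takes extra modeling, so it is left out here.
triples = {
    ("ex:ana", "rdf:type", "ex:Person"),
    ("ex:ana", "ex:name", "Ana"),
    ("ex:ben", "rdf:type", "ex:Person"),
    ("ex:ben", "ex:name", "Ben"),
    ("ex:ana", "ex:follows", "ex:ben"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(triples, s="ex:ana", p="ex:follows"))
# [('ex:ana', 'ex:follows', 'ex:ben')]
```

Pattern matching with wildcards over triples is, loosely, the idea a SPARQL query builds on.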

Until recently graph querying had no SQL-like standard. GQL (ISO/IEC 39075), published in 2024, is the first new ISO database query language since SQL - a standard for property graphs that draws heavily on Cypher. It aims to do for graph queries what SQL did for relational ones.

Time-series databases: optimized for the clock

A time-series database (TSDB) is built for data that arrives as a continuous stream of timestamped points: metrics, sensor readings, logs, financial ticks, IoT telemetry. The defining trait is that time is the primary axis - data is almost always written in time order and queried over time ranges.
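Why time order matters can be shown with a toy append-only series: if points always arrive in timestamp order, the data stays sorted for free and a time-range query becomes two binary searches. The Series class below is a hypothetical sketch, not how any particular TSDB lays out storage.

```python
from bisect import bisect_left


class Series:
    """Toy append-only series; timestamps are plain integers (e.g. seconds)."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Writes land at the "now" end; this toy store rejects older points.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order point; this toy store is append-only")
        self.timestamps.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Points with start <= ts < end, located by binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


cpu = Series()
for ts, value in [(0, 0.2), (10, 0.4), (20, 0.9), (30, 0.5)]:
    cpu.append(ts, value)

print(cpu.range(10, 30))  # [(10, 0.4), (20, 0.9)]
```

Real engines accept some late or out-of-order data and partition by time on disk, but the access pattern they optimize for is the same.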

What they optimize

Time-ordered ingest - append-heavy, write-once. New points arrive at the "now" end, so storage is tuned for high-throughput sequential writes, not random updates.
Downsampling - roll raw points up to coarser intervals (per-second to per-minute to per-hour) so old data stays queryable without storing every point forever.
Retention and TTL - automatically drop or archive data past an age. You rarely need per-second metrics from two years ago.
Continuous aggregates - pre-computed, incrementally maintained rollups (hourly averages) so dashboards read a small summary instead of scanning raw points.
Time-bucketing functions - first-class operators (for example, time_bucket in TimescaleDB) that group points into arbitrary time windows, because that is the query you run constantly.
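Several of the ideas above fit in a few lines of Python. This is a sketch of the concepts only, using integer timestamps in seconds; the function and class names (time_bucket, downsample, apply_retention, ContinuousAverage) are chosen for the example and do not mirror any engine's implementation.

```python
from collections import defaultdict


def time_bucket(width, ts):
    """Start of the width-second window that contains ts."""
    return ts - (ts % width)


def downsample(points, width):
    """Roll raw (ts, value) points up to one average per bucket."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        acc = sums[time_bucket(width, ts)]
        acc[0] += value
        acc[1] += 1
    return {bucket: total / n for bucket, (total, n) in sorted(sums.items())}


def apply_retention(points, now, max_age):
    """Drop points older than max_age seconds."""
    return [(ts, v) for ts, v in points if ts >= now - max_age]


class ContinuousAverage:
    """Rollup kept up to date on every insert, so reads never rescan raw points."""

    def __init__(self, width):
        self.width = width
        self.state = {}  # bucket start -> (sum, count)

    def add(self, ts, value):
        bucket = time_bucket(self.width, ts)
        total, n = self.state.get(bucket, (0.0, 0))
        self.state[bucket] = (total + value, n + 1)

    def read(self):
        return {b: total / n for b, (total, n) in sorted(self.state.items())}


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
print(downsample(raw, 60))                        # {0: 2.0, 60: 6.0, 120: 9.0}
print(apply_retention(raw, now=120, max_age=60))  # [(60, 5.0), (90, 7.0), (120, 9.0)]

agg = ContinuousAverage(60)
for ts, value in raw:
    agg.add(ts, value)
print(agg.read())                                 # {0: 2.0, 60: 6.0, 120: 9.0}
```

The batch rollup and the incrementally maintained one give the same answer; the difference is that the second pays a small cost per write so that dashboard reads stay cheap.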

Engines

TimescaleDB - a PostgreSQL extension. You keep SQL, your tooling, and joins to relational tables, and gain hypertables, compression, and continuous aggregates.
InfluxDB - a purpose-built TSDB with its own ingest protocol and query languages, popular for infrastructure and IoT metrics.
Prometheus - a metrics-focused TSDB tightly tied to monitoring and alerting; it scrapes metrics from its targets and is queried with PromQL.

Postgres + Timescale is often enough

As with vectors and JSON, you usually do not need a separate system. If you already run PostgreSQL, TimescaleDB turns it into a capable time-series store while keeping SQL and your relational data in one place. Reach for a dedicated TSDB only when ingest volume or specialized features genuinely demand it - the same "use what you run" judgment that recurs across this stage.

Quick quiz