Aggregating rows

So far each query returned rows much like the ones stored. Aggregation is different: it collapses many rows into one summary number. This is where raw data starts turning into information - "how many orders," "total revenue," "average order value."

The aggregate functions you will use constantly are COUNT, SUM, AVG, MIN, and MAX. On their own they fold a whole table into a single row:

Loading SQL sandbox…

That answers "across all orders, how many, what total, what average." One row out, no matter how many rows in.

GROUP BY: one summary per group

Usually you do not want one number for the whole table - you want one number per customer, per country, per month. GROUP BY splits the rows into groups and produces one summary row for each.

Loading SQL sandbox…

The rule that confuses beginners: every column in SELECT must either be in the GROUP BY or be wrapped in an aggregate. That is not pedantry - if you grouped by customer_id, the database has many total values per group and no way to show just one, so you must tell it how to combine them (SUM, AVG, ...). AS simply renames a column in the output.

WHERE versus HAVING

Both filter, but at different moments - and the evaluation order from the last lesson tells you which to reach for. WHERE filters rows before grouping. HAVING filters groups after the aggregate is computed.

Loading SQL sandbox…

You cannot put SUM(total) > 50 in WHERE, because when WHERE runs the sum does not exist yet. That single fact is why HAVING has to exist at all.

Exercise

Worked, then half-filled, then yours. Use the sandbox.

Loading SQL sandbox…

1. Worked. Count how many customers are in each country (the query above).

2. Finish it. For each customer, show their total spend. Fill the blanks, then run - the box checks your result.

Loading SQL sandbox…

3. Write it yourself. Show only the customers whose total spend is above 30.00, with that total. Run it to check your answer.

Loading SQL sandbox…

Show answers

2. SELECT customer_id, SUM(total) AS spent FROM orders GROUP BY customer_id;

SELECT customer_id, SUM(total) AS spent
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 30.00;

GROUP BY: one summary per group​

WHERE versus HAVING​

Exercise​

GROUP BY: one summary per group

WHERE versus HAVING

Exercise