Lecture 4: Aggregations & Window Functions

Duration: 3 hours | Theory: 15 min | Practice: 165 min

Overview

Slides 2-3 introduce COUNT/SUM/AVG basics. Slide 5 covers GROUP BY. Slides 6-7 show subqueries and RANK(). Slides 8-9 dive into window functions. The rest is practice. Students learn to answer business questions: "which facility makes the most money?" instead of just "show me all bookings".

Slides 2-4: Basic Aggregations (15 min)

Slide 2 lists aggregate functions. Slide 3 shows SELECT count(*) FROM facilities. Don't just show it - have students type these:

-- How many facilities?
SELECT COUNT(*) FROM facilities;

-- How many bookings total?
SELECT COUNT(*) FROM bookings;

-- Unique members who booked?
SELECT COUNT(DISTINCT memid) FROM bookings;

-- Average monthly maintenance cost?
SELECT AVG(monthlymaintenance) FROM facilities;

Common mistake: students try SELECT name, MAX(guestcost) FROM facilities; without GROUP BY. Show them this fails, explain they need slide 5 content (GROUP BY) for that.

Slide 5: GROUP BY (30 min)

This is where aggregations get useful. The slide shows counting bookings per member. Explain GROUP BY like sorting cards by suit, then counting each pile.

-- Bookings per facility
SELECT facid, COUNT(*) AS booking_count
FROM bookings
GROUP BY facid
ORDER BY booking_count DESC;

-- With facility names
SELECT f.name, COUNT(b.bookid) AS bookings
FROM facilities f
LEFT JOIN bookings b ON f.facid = b.facid
GROUP BY f.facid, f.name
ORDER BY bookings DESC;

Key rule students forget: "Every column in SELECT must be in GROUP BY or inside an aggregate function." When they violate this, MySQL throws an error (in strict mode) - let them hit it, then explain.

Introduce HAVING after they grasp GROUP BY:

-- Members with more than 10 bookings
SELECT memid, COUNT(*) AS booking_count
FROM bookings
GROUP BY memid
HAVING COUNT(*) > 10
ORDER BY booking_count DESC;

"WHERE filters rows before grouping. HAVING filters groups after aggregation. Can't use WHERE COUNT(*) - that's HAVING territory."

Slides 6-7: Subqueries and RANK() (30 min)

Slide 6 shows subqueries (note: I fixed the syntax error - there was an extra comma). Slide 7 adds RANK() window function (MySQL 8.0+).

-- Facilities costing more than average
SELECT name, membercost
FROM facilities
WHERE membercost > (SELECT AVG(membercost) FROM facilities);

-- Top 4 members by booking count (with rank)
SELECT memberid, cnt, pos
FROM (
    SELECT memid as memberid,
           COUNT(*) as cnt,
           RANK() OVER (ORDER BY COUNT(*) DESC) as pos
    FROM bookings
    GROUP BY memid
) ranked
WHERE pos <= 4;

Slide 8 shows the MySQL < 8.0 alternative using LIMIT. If students have old MySQL, stick with LIMIT for now.

Slide 9 explains query processing order (FROM → WHERE → GROUP BY → SELECT). Important: "You can't filter on RANK() in WHERE because it's calculated in SELECT. That's why we need a subquery."

Slide 10: Window Functions (MySQL 8.0+) (45 min)

Slide 11 lists window functions. Show the difference: "GROUP BY collapses rows. Window functions keep all rows but add calculations."

-- Rank facilities by booking count (keeps all rows!)
SELECT f.name,
       COUNT(b.bookid) AS bookings,
       RANK() OVER (ORDER BY COUNT(b.bookid) DESC) AS rank
FROM facilities f
LEFT JOIN bookings b ON f.facid = b.facid
GROUP BY f.facid, f.name;

-- Running total of bookings over time
SELECT DATE(starttime) AS date,
       COUNT(*) AS daily_bookings,
       SUM(COUNT(*)) OVER (ORDER BY DATE(starttime)) AS running_total
FROM bookings
GROUP BY DATE(starttime);

PARTITION BY is tricky - it's like "GROUP BY for window functions":

-- Rank members within each month
SELECT firstname, surname,
       DATE_FORMAT(starttime, '%Y-%m') AS month,
       COUNT(*) AS bookings,
       RANK() OVER (
           PARTITION BY DATE_FORMAT(starttime, '%Y-%m')
           ORDER BY COUNT(*) DESC
       ) AS monthly_rank
FROM bookings b
JOIN members m ON b.memid = m.memid
GROUP BY m.memid, firstname, surname, month;

Student Practice (60+ min)

Give students these challenges. They should work independently:

  1. Total revenue per facility (from slide 10 example)
  2. Average slots per booking
  3. Facility with highest revenue (use subquery)
  4. Members with 5+ bookings
  5. Monthly revenue for each facility
  6. Rank facilities by total bookings
  7. Running total of bookings per member
  8. Members who booked more than average member
  9. Percentage of total bookings per facility
  10. Top 3 most booked facilities using RANK()

Walk around. Common mistakes:

Homework

Analytics report on clubdata: (1) Revenue by facility with percentage of total, (2) Member booking counts with year-over-year comparison, (3) Facility utilization rates, (4) Peak booking hours.

Next lecture preview: "We're moving to Northwind database - a real e-commerce system with customers, products, orders. You'll write 30 queries exploring it independently."