From denormalized data to 3NF.
Given this denormalized table, identify issues:
| OrderID | CustomerName | CustomerEmail | ProductNames | TotalPrice |
|---|---|---|---|---|
| 1 | John Doe | john@email.com | Laptop, Mouse, Keyboard | 1500 |
| 2 | John Doe | john@email.com | Monitor | 300 |
| 3 | Jane Smith | jane@email.com | Laptop, USB Cable | 1520 |
Questions to answer:
Problem: ProductNames column contains comma-separated values: "Laptop, Mouse, Keyboard"
Why it's bad:
Example of difficulty:
-- Try to find all orders containing "Laptop"
-- This is messy and error-prone:
SELECT * FROM Orders WHERE ProductNames LIKE '%Laptop%';
-- What if it's "laptop" vs "Laptop"? What about "Laptops"?
Problem: Customer data (name, email) repeated for every order
Why it's bad:
Example of update anomaly:
-- John changes his email - must update EVERY row!
UPDATE Orders
SET CustomerEmail = 'john.doe@newemail.com'
WHERE CustomerName = 'John Doe';
-- Easy to miss rows, creates data inconsistency
Problem: TotalPrice is likely derived from product prices × quantities
Why it's bad:
What's not captured:
Rule: Eliminate repeating groups, ensure atomic values (no lists in columns)
Action: Split ProductNames into separate rows
| OrderID | CustomerName | CustomerEmail | ProductNames | TotalPrice |
|---|---|---|---|---|
| 1 | John Doe | john@email.com | Laptop, Mouse, Keyboard | 1500 |
Problem: ProductNames has multiple values in one cell
| OrderID | CustomerName | CustomerEmail | ProductName | Quantity | Price |
|---|---|---|---|---|---|
| 1 | John Doe | john@email.com | Laptop | 1 | 1200 |
| 1 | John Doe | john@email.com | Mouse | 1 | 50 |
| 1 | John Doe | john@email.com | Keyboard | 1 | 250 |
Improvement: Each cell has atomic (single) value
1NF Achievement: No repeating groups, all values are atomic, each column has one value.
Note: Still have redundancy (customer data repeated 3 times).
Rule: Remove partial dependencies (non-key attributes depending on part of composite key)
Action: Separate customers into their own table
Single table: Customer data repeated for each product in order
| OrderID=1, CustomerName="John Doe" |
| OrderID=1, CustomerName="John Doe" |
| OrderID=1, CustomerName="John Doe" |
Problem: Customer info depends only on OrderID, not on ProductName
CUSTOMERS Table:
| CustomerID | Name | |
|---|---|---|
| 1 | John Doe | john@email.com |
| 2 | Jane Smith | jane@email.com |
ORDERS Table:
| OrderID | CustomerID | OrderDate |
|---|---|---|
| 1 | 1 | 2025-01-15 |
| 2 | 1 | 2025-01-20 |
Improvement: Customer data stored ONCE, referenced by ID
2NF Achievement: In 1NF, no partial dependencies, customer data separated.
Note: Still need to separate products from order items.
Rule: Remove transitive dependencies (non-key attributes depending on other non-key attributes)
Action: Create separate PRODUCTS table and ORDER_ITEMS junction table
ORDER_ITEMS:
| OrderID | ProductName | Price | Quantity |
|---|---|---|---|
| 1 | Laptop | 1200 | 1 |
| 1 | Mouse | 50 | 1 |
Problem: Price depends on ProductName, not on OrderID. Same product may have different prices in different orders!
PRODUCTS Table:
| ProductID | ProductName | CurrentPrice |
|---|---|---|
| 1 | Laptop | 1200 |
| 2 | Mouse | 50 |
ORDER_ITEMS Table:
| OrderItemID | OrderID | ProductID | Quantity | PriceAtPurchase |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1200 |
| 2 | 1 | 2 | 1 | 50 |
Improvement: Product info centralized, historical prices preserved
-- ============================================
-- NORMALIZED ORDER MANAGEMENT SYSTEM (3NF)
-- ============================================
-- 1. CUSTOMERS Table
CREATE TABLE CUSTOMERS (
customer_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(20),
shipping_address TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
INDEX idx_email (email),
INDEX idx_name (name)
) ENGINE=InnoDB;
-- 2. PRODUCTS Table
CREATE TABLE PRODUCTS (
product_id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(255) UNIQUE NOT NULL,
description TEXT,
current_price DECIMAL(10,2) NOT NULL,
stock_quantity INT DEFAULT 0,
category VARCHAR(100),
CHECK (current_price >= 0),
CHECK (stock_quantity >= 0),
INDEX idx_product_name (product_name),
INDEX idx_category (category)
) ENGINE=InnoDB;
-- 3. ORDERS Table
CREATE TABLE ORDERS (
order_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50) DEFAULT 'Pending',
total_amount DECIMAL(10,2) DEFAULT 0,
CHECK (status IN ('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled')),
CHECK (total_amount >= 0),
FOREIGN KEY (customer_id) REFERENCES CUSTOMERS(customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
INDEX idx_customer (customer_id),
INDEX idx_order_date (order_date),
INDEX idx_status (status)
) ENGINE=InnoDB;
-- 4. ORDER_ITEMS Junction Table
CREATE TABLE ORDER_ITEMS (
order_item_id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL DEFAULT 1,
price_at_purchase DECIMAL(10,2) NOT NULL,
CHECK (quantity > 0),
CHECK (price_at_purchase >= 0),
FOREIGN KEY (order_id) REFERENCES ORDERS(order_id)
ON DELETE CASCADE
ON UPDATE CASCADE,
FOREIGN KEY (product_id) REFERENCES PRODUCTS(product_id)
ON DELETE RESTRICT
ON UPDATE CASCADE,
INDEX idx_order (order_id),
INDEX idx_product (product_id)
) ENGINE=InnoDB;
-- ============================================
-- SAMPLE DATA
-- ============================================
-- Insert customers
INSERT INTO CUSTOMERS (name, email, phone, shipping_address) VALUES
('John Doe', 'john@email.com', '555-0001', '123 Main St, City, 12345'),
('Jane Smith', 'jane@email.com', '555-0002', '456 Oak Ave, Town, 67890');
-- Insert products
INSERT INTO PRODUCTS (product_name, description, current_price, stock_quantity, category) VALUES
('Laptop', '15-inch laptop with 16GB RAM', 1200.00, 50, 'Electronics'),
('Mouse', 'Wireless optical mouse', 50.00, 200, 'Accessories'),
('Keyboard', 'Mechanical keyboard RGB', 250.00, 100, 'Accessories'),
('Monitor', '27-inch 4K monitor', 300.00, 75, 'Electronics'),
('USB Cable', 'USB-C cable 6ft', 20.00, 500, 'Accessories');
-- Insert orders
INSERT INTO ORDERS (customer_id, order_date, status) VALUES
(1, '2025-01-15 10:30:00', 'Delivered'),
(1, '2025-01-20 14:15:00', 'Shipped'),
(2, '2025-01-18 09:45:00', 'Processing');
-- Insert order items
INSERT INTO ORDER_ITEMS (order_id, product_id, quantity, price_at_purchase) VALUES
-- Order 1: John's first order (Laptop, Mouse, Keyboard)
(1, 1, 1, 1200.00), -- Laptop
(1, 2, 1, 50.00), -- Mouse
(1, 3, 1, 250.00), -- Keyboard
-- Order 2: John's second order (Monitor)
(2, 4, 1, 300.00), -- Monitor
-- Order 3: Jane's order (Laptop, USB Cable)
(3, 1, 1, 1200.00), -- Laptop
(3, 5, 2, 20.00); -- USB Cable (qty: 2)
-- Update order totals (can be done with triggers in production)
UPDATE ORDERS o
SET total_amount = (
SELECT SUM(quantity * price_at_purchase)
FROM ORDER_ITEMS
WHERE order_id = o.order_id
);
Answer:
Answer:
-- Denormalized (BAD): Must update multiple rows
UPDATE Orders SET CustomerEmail = 'john.new@email.com'
WHERE CustomerName = 'John Doe'; -- Might miss some rows!
-- Normalized (GOOD): Update once
UPDATE CUSTOMERS SET email = 'john.new@email.com'
WHERE customer_id = 1; -- All orders automatically see new email
Answer: Three-step process:
Answer: Four tables for full 3NF:
| Aspect | Denormalized | Normalized (3NF) |
|---|---|---|
| Customer email update | Update multiple rows | Update once in CUSTOMERS |
| Product price change | All historical orders affected | Current price in PRODUCTS, historical in ORDER_ITEMS |
| Add new customer | Must create dummy order | Add to CUSTOMERS independently |
| Query: "How many laptops sold?" | String parsing: LIKE '%Laptop%' | Simple: SUM(quantity) WHERE product_id=1 |
| Data consistency | High risk of inconsistency | Enforced by foreign keys |
| Storage efficiency | Redundant data everywhere | Each fact stored once |
-- Get complete order details with customer and products
SELECT
o.order_id,
c.name AS customer_name,
c.email,
p.product_name,
oi.quantity,
oi.price_at_purchase,
(oi.quantity * oi.price_at_purchase) AS line_total,
o.total_amount AS order_total,
o.status
FROM ORDERS o
JOIN CUSTOMERS c ON o.customer_id = c.customer_id
JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
JOIN PRODUCTS p ON oi.product_id = p.product_id
WHERE o.order_id = 1;
-- Total revenue by product
SELECT
p.product_name,
SUM(oi.quantity) AS total_quantity_sold,
SUM(oi.quantity * oi.price_at_purchase) AS total_revenue
FROM PRODUCTS p
JOIN ORDER_ITEMS oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_revenue DESC;
-- Customer's order history
SELECT
o.order_id,
o.order_date,
o.status,
o.total_amount,
COUNT(oi.order_item_id) AS item_count
FROM ORDERS o
LEFT JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
WHERE o.customer_id = 1
GROUP BY o.order_id
ORDER BY o.order_date DESC;
-- Products that need restocking (low inventory)
SELECT
product_name,
stock_quantity,
current_price
FROM PRODUCTS
WHERE stock_quantity < 50
ORDER BY stock_quantity ASC;