Normalization Practice: Step-by-Step Solution

From denormalized data to 3NF.

Problem: Denormalized Order Table

Given this denormalized table, identify issues:

OrderID CustomerName CustomerEmail ProductNames TotalPrice
1 John Doe john@email.com Laptop, Mouse, Keyboard 1500
2 John Doe john@email.com Monitor 300
3 Jane Smith jane@email.com Laptop, USB Cable 1520

Questions to answer:

  1. What's wrong with storing ProductNames as a comma-separated list?
  2. What happens if John changes his email?
  3. How would you normalize this to 3NF?
  4. Which tables would you create?

1. Problems with Current Design

Issue #1: Multi-Valued Attributes (Violates 1NF)

Problem: ProductNames column contains comma-separated values: "Laptop, Mouse, Keyboard"

Why it's bad:

Example of difficulty:

-- Try to find all orders containing "Laptop"
-- This is messy and error-prone:
SELECT * FROM Orders WHERE ProductNames LIKE '%Laptop%';
-- What if it's "laptop" vs "Laptop"? What about "Laptops"?

Issue #2: Data Redundancy (Violates 2NF/3NF)

Problem: Customer data (name, email) repeated for every order

Why it's bad:

Example of update anomaly:

-- John changes his email - must update EVERY row!
UPDATE Orders
SET CustomerEmail = 'john.doe@newemail.com'
WHERE CustomerName = 'John Doe';
-- Easy to miss rows, creates data inconsistency

Issue #3: Calculated Values

Problem: TotalPrice is likely derived from product prices × quantities

Why it's bad:

Issue #4: Missing Information

What's not captured:

2. Step-by-Step Normalization

Step 1: First Normal Form (1NF)

Rule: Eliminate repeating groups, ensure atomic values (no lists in columns)

Action: Split ProductNames into separate rows

Before (Violates 1NF)

OrderID CustomerName CustomerEmail ProductNames TotalPrice
1 John Doe john@email.com Laptop, Mouse, Keyboard 1500

Problem: ProductNames has multiple values in one cell

After (1NF Compliant)

OrderID CustomerName CustomerEmail ProductName Quantity Price
1 John Doe john@email.com Laptop 1 1200
1 John Doe john@email.com Mouse 1 50
1 John Doe john@email.com Keyboard 1 250

Improvement: Each cell has atomic (single) value

1NF Achievement: No repeating groups, all values are atomic, each column has one value.

Note: Still have redundancy (customer data repeated 3 times).

Step 2: Second Normal Form (2NF)

Rule: Remove partial dependencies (non-key attributes depending on part of composite key)

Action: Separate customers into their own table

Before (Not 2NF)

Single table: Customer data repeated for each product in order

OrderID=1, CustomerName="John Doe"
OrderID=1, CustomerName="John Doe"
OrderID=1, CustomerName="John Doe"

Problem: Customer info depends only on OrderID, not on ProductName

After (2NF Compliant)

CUSTOMERS Table:

CustomerIDNameEmail
1John Doejohn@email.com
2Jane Smithjane@email.com

ORDERS Table:

OrderIDCustomerIDOrderDate
112025-01-15
212025-01-20

Improvement: Customer data stored ONCE, referenced by ID

2NF Achievement: In 1NF, no partial dependencies, customer data separated.

Note: Still need to separate products from order items.

Step 3: Third Normal Form (3NF)

Rule: Remove transitive dependencies (non-key attributes depending on other non-key attributes)

Action: Create separate PRODUCTS table and ORDER_ITEMS junction table

Before (Not 3NF)

ORDER_ITEMS:

OrderIDProductNamePriceQuantity
1Laptop12001
1Mouse501

Problem: Price depends on ProductName, not on OrderID. Same product may have different prices in different orders!

After (3NF Compliant)

PRODUCTS Table:

ProductIDProductNameCurrentPrice
1Laptop1200
2Mouse50

ORDER_ITEMS Table:

OrderItemIDOrderIDProductIDQuantityPriceAtPurchase
11111200
212150

Improvement: Product info centralized, historical prices preserved

3. Final Normalized Schema (3NF)

erDiagram CUSTOMERS ||--o{ ORDERS : "places" ORDERS ||--|{ ORDER_ITEMS : "contains" PRODUCTS ||--o{ ORDER_ITEMS : "appears in" CUSTOMERS { int customer_id PK varchar name varchar email UK varchar phone text shipping_address datetime created_at } ORDERS { int order_id PK int customer_id FK datetime order_date varchar status decimal total_amount } PRODUCTS { int product_id PK varchar product_name UK text description decimal current_price int stock_quantity varchar category } ORDER_ITEMS { int order_item_id PK int order_id FK int product_id FK int quantity decimal price_at_purchase }

Complete SQL Schema

-- ============================================
-- NORMALIZED ORDER MANAGEMENT SYSTEM (3NF)
-- ============================================

-- 1. CUSTOMERS Table
CREATE TABLE CUSTOMERS (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    phone VARCHAR(20),
    shipping_address TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_email (email),
    INDEX idx_name (name)
) ENGINE=InnoDB;

-- 2. PRODUCTS Table
CREATE TABLE PRODUCTS (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    current_price DECIMAL(10,2) NOT NULL,
    stock_quantity INT DEFAULT 0,
    category VARCHAR(100),
    CHECK (current_price >= 0),
    CHECK (stock_quantity >= 0),
    INDEX idx_product_name (product_name),
    INDEX idx_category (category)
) ENGINE=InnoDB;

-- 3. ORDERS Table
CREATE TABLE ORDERS (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) DEFAULT 'Pending',
    total_amount DECIMAL(10,2) DEFAULT 0,
    CHECK (status IN ('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled')),
    CHECK (total_amount >= 0),
    FOREIGN KEY (customer_id) REFERENCES CUSTOMERS(customer_id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE,
    INDEX idx_customer (customer_id),
    INDEX idx_order_date (order_date),
    INDEX idx_status (status)
) ENGINE=InnoDB;

-- 4. ORDER_ITEMS Junction Table
CREATE TABLE ORDER_ITEMS (
    order_item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL DEFAULT 1,
    price_at_purchase DECIMAL(10,2) NOT NULL,
    CHECK (quantity > 0),
    CHECK (price_at_purchase >= 0),
    FOREIGN KEY (order_id) REFERENCES ORDERS(order_id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    FOREIGN KEY (product_id) REFERENCES PRODUCTS(product_id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE,
    INDEX idx_order (order_id),
    INDEX idx_product (product_id)
) ENGINE=InnoDB;

-- ============================================
-- SAMPLE DATA
-- ============================================

-- Insert customers
INSERT INTO CUSTOMERS (name, email, phone, shipping_address) VALUES
('John Doe', 'john@email.com', '555-0001', '123 Main St, City, 12345'),
('Jane Smith', 'jane@email.com', '555-0002', '456 Oak Ave, Town, 67890');

-- Insert products
INSERT INTO PRODUCTS (product_name, description, current_price, stock_quantity, category) VALUES
('Laptop', '15-inch laptop with 16GB RAM', 1200.00, 50, 'Electronics'),
('Mouse', 'Wireless optical mouse', 50.00, 200, 'Accessories'),
('Keyboard', 'Mechanical keyboard RGB', 250.00, 100, 'Accessories'),
('Monitor', '27-inch 4K monitor', 300.00, 75, 'Electronics'),
('USB Cable', 'USB-C cable 6ft', 20.00, 500, 'Accessories');

-- Insert orders
INSERT INTO ORDERS (customer_id, order_date, status) VALUES
(1, '2025-01-15 10:30:00', 'Delivered'),
(1, '2025-01-20 14:15:00', 'Shipped'),
(2, '2025-01-18 09:45:00', 'Processing');

-- Insert order items
INSERT INTO ORDER_ITEMS (order_id, product_id, quantity, price_at_purchase) VALUES
-- Order 1: John's first order (Laptop, Mouse, Keyboard)
(1, 1, 1, 1200.00),  -- Laptop
(1, 2, 1, 50.00),    -- Mouse
(1, 3, 1, 250.00),   -- Keyboard
-- Order 2: John's second order (Monitor)
(2, 4, 1, 300.00),   -- Monitor
-- Order 3: Jane's order (Laptop, USB Cable)
(3, 1, 1, 1200.00),  -- Laptop
(3, 5, 2, 20.00);    -- USB Cable (qty: 2)

-- Update order totals (can be done with triggers in production)
UPDATE ORDERS o
SET total_amount = (
    SELECT SUM(quantity * price_at_purchase)
    FROM ORDER_ITEMS
    WHERE order_id = o.order_id
);

4. Answers to Original Questions

Q1: What's wrong with storing ProductNames as comma-separated list?

Answer:

Q2: What happens if John changes his email?

Answer:

-- Denormalized (BAD): Must update multiple rows
UPDATE Orders SET CustomerEmail = 'john.new@email.com'
WHERE CustomerName = 'John Doe';  -- Might miss some rows!

-- Normalized (GOOD): Update once
UPDATE CUSTOMERS SET email = 'john.new@email.com'
WHERE customer_id = 1;  -- All orders automatically see new email

Q3: How would you normalize this to 3NF?

Answer: Three-step process:

  1. 1NF: Split comma-separated ProductNames into separate rows, add Quantity and Price columns
  2. 2NF: Extract customer data into CUSTOMERS table, link via customer_id foreign key
  3. 3NF: Extract product data into PRODUCTS table, create ORDER_ITEMS junction table with price_at_purchase to preserve historical prices

Q4: Which tables would you create?

Answer: Four tables for full 3NF:

  1. CUSTOMERS: customer_id (PK), name, email, phone, shipping_address, created_at
  2. PRODUCTS: product_id (PK), product_name, description, current_price, stock_quantity, category
  3. ORDERS: order_id (PK), customer_id (FK), order_date, status, total_amount
  4. ORDER_ITEMS: order_item_id (PK), order_id (FK), product_id (FK), quantity, price_at_purchase

5. Benefits of Normalization

Advantages of 3NF Design

Aspect Denormalized Normalized (3NF)
Customer email update Update multiple rows Update once in CUSTOMERS
Product price change All historical orders affected Current price in PRODUCTS, historical in ORDER_ITEMS
Add new customer Must create dummy order Add to CUSTOMERS independently
Query: "How many laptops sold?" String parsing: LIKE '%Laptop%' Simple: SUM(quantity) WHERE product_id=1
Data consistency High risk of inconsistency Enforced by foreign keys
Storage efficiency Redundant data everywhere Each fact stored once

6. Common Queries with Normalized Schema

-- Get complete order details with customer and products
SELECT
    o.order_id,
    c.name AS customer_name,
    c.email,
    p.product_name,
    oi.quantity,
    oi.price_at_purchase,
    (oi.quantity * oi.price_at_purchase) AS line_total,
    o.total_amount AS order_total,
    o.status
FROM ORDERS o
JOIN CUSTOMERS c ON o.customer_id = c.customer_id
JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
JOIN PRODUCTS p ON oi.product_id = p.product_id
WHERE o.order_id = 1;

-- Total revenue by product
SELECT
    p.product_name,
    SUM(oi.quantity) AS total_quantity_sold,
    SUM(oi.quantity * oi.price_at_purchase) AS total_revenue
FROM PRODUCTS p
JOIN ORDER_ITEMS oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_revenue DESC;

-- Customer's order history
SELECT
    o.order_id,
    o.order_date,
    o.status,
    o.total_amount,
    COUNT(oi.order_item_id) AS item_count
FROM ORDERS o
LEFT JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
WHERE o.customer_id = 1
GROUP BY o.order_id
ORDER BY o.order_date DESC;

-- Products that need restocking (low inventory)
SELECT
    product_name,
    stock_quantity,
    current_price
FROM PRODUCTS
WHERE stock_quantity < 50
ORDER BY stock_quantity ASC;