How to Use SQL for Data Analysis: Aggregations and Grouping
Introduction
SQL (Structured Query Language) is one of the most powerful tools for data analysis, allowing users to extract insights from databases efficiently.
In this guide, we’ll focus on:
- ✅ Aggregate functions like SUM(), AVG(), COUNT(), and more.
- ✅ The GROUP BY clause for segmenting data into meaningful groups.
- ✅ Real-world SQL examples using sample datasets.
1. Understanding SQL Aggregate Functions
Aggregate functions perform calculations on a set of rows and return a single value. These are essential for summarizing data in analytics.
✅ Common Aggregate Functions in SQL:
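- SUM() – Adds up all non-NULL values in a column.
- AVG() – Calculates the average of the non-NULL values.
- COUNT() – Counts rows: COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL.
- MIN() – Finds the smallest value.
- MAX() – Finds the largest value.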
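How these functions treat NULL is easiest to see in a small run. The sketch below uses Python's built-in sqlite3 module and a hypothetical three-row table named demo (it is not part of this guide's dataset). Ignoring NULLs in aggregates is standard SQL behavior, but it is still worth confirming on your own database.

```python
import sqlite3

# Hypothetical demo table with one NULL value, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (amount INTEGER)")
conn.executemany("INSERT INTO demo (amount) VALUES (?)", [(10,), (20,), (None,)])

row = conn.execute(
    """
    SELECT COUNT(*), COUNT(amount), SUM(amount), AVG(amount), MIN(amount), MAX(amount)
    FROM demo
    """
).fetchone()

# COUNT(*) sees 3 rows; every other aggregate skips the NULL.
print(row)  # (3, 2, 30, 15.0, 10, 20)
```

Note that AVG() divides by 2 here, not 3, because the NULL row is left out of both the sum and the count.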
2. Working with a Sample Dataset
Let’s assume we have a table named sales_data with the following structure:
sale_id | product | category | quantity | price | sale_date |
---|---|---|---|---|---|
1 | Laptop | Electronics | 2 | 900 | 2024-02-01 |
2 | Phone | Electronics | 3 | 500 | 2024-02-02 |
3 | TV | Electronics | 1 | 1200 | 2024-02-03 |
4 | Shirt | Clothing | 5 | 30 | 2024-02-01 |
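To follow along without installing a database server, one option is to recreate this table in an in-memory SQLite database with Python's built-in sqlite3 module. This is only a sketch: the column types are assumptions, since the table above shows column names and values but not a schema.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Column types are assumed; adjust them to match your real schema.
conn.execute(
    """
    CREATE TABLE sales_data (
        sale_id   INTEGER PRIMARY KEY,
        product   TEXT,
        category  TEXT,
        quantity  INTEGER,
        price     REAL,
        sale_date TEXT
    )
    """
)

rows = [
    (1, "Laptop", "Electronics", 2, 900, "2024-02-01"),
    (2, "Phone", "Electronics", 3, 500, "2024-02-02"),
    (3, "TV", "Electronics", 1, 1200, "2024-02-03"),
    (4, "Shirt", "Clothing", 5, 30, "2024-02-01"),
]
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)", rows)

# Quick sanity check: list what was loaded.
loaded = conn.execute(
    "SELECT sale_id, product, price FROM sales_data ORDER BY sale_id"
).fetchall()
for record in loaded:
    print(record)
```

Any query in this guide can then be passed to conn.execute(...) as a string.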
3. Using SQL Aggregate Functions for Data Analysis
Example 1: Calculating Total Sales Revenue (SUM())
SELECT SUM(quantity * price) AS total_revenue
FROM sales_data;
Example 2: Counting the Number of Sales (COUNT())
SELECT COUNT(sale_id) AS total_sales
FROM sales_data;
Example 3: Finding the Average Sale Price (AVG())
SELECT AVG(price) AS average_price
FROM sales_data;
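Running Examples 1 to 3 against the four sample rows gives concrete numbers to check your own results against. The sketch below does this with Python's built-in sqlite3 module (column types are assumed). It also adds a fourth, hypothetical expression, price_per_unit, because AVG(price) averages the price column row by row and does not weight by quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the guide only shows names and values.
conn.execute(
    "CREATE TABLE sales_data (sale_id INTEGER, product TEXT, category TEXT, "
    "quantity INTEGER, price REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Laptop", "Electronics", 2, 900, "2024-02-01"),
        (2, "Phone", "Electronics", 3, 500, "2024-02-02"),
        (3, "TV", "Electronics", 1, 1200, "2024-02-03"),
        (4, "Shirt", "Clothing", 5, 30, "2024-02-01"),
    ],
)

total_revenue, total_sales, average_price, price_per_unit = conn.execute(
    """
    SELECT SUM(quantity * price),                 -- Example 1
           COUNT(sale_id),                        -- Example 2
           AVG(price),                            -- Example 3
           SUM(quantity * price) / SUM(quantity)  -- quantity-weighted average
    FROM sales_data
    """
).fetchone()

print(total_revenue)             # 4650.0
print(total_sales)               # 4
print(average_price)             # 657.5
print(round(price_per_unit, 2))  # 422.73
```

The gap between 657.5 and 422.73 comes from the five low-priced shirts: they count as one row in AVG(price) but as five units in the weighted version.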
4. Grouping Data with GROUP BY
GROUP BY collects rows that share the same value in the listed column(s) and applies each aggregate function once per group, so the result contains one row per group.
Example 4: Total Revenue by Product Category
SELECT category, SUM(quantity * price) AS total_revenue
FROM sales_data
GROUP BY category;
Example 5: Number of Sales by Category
SELECT category, COUNT(*) AS num_sales
FROM sales_data
GROUP BY category;
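Examples 4 and 5 can also be combined into a single query, since several aggregates may share one GROUP BY. The sketch below runs that combined query on the sample rows using Python's built-in sqlite3 module (column types are assumed). The ORDER BY clause is an addition to make the row order predictable; without it, the order of groups is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the guide only shows names and values.
conn.execute(
    "CREATE TABLE sales_data (sale_id INTEGER, product TEXT, category TEXT, "
    "quantity INTEGER, price REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Laptop", "Electronics", 2, 900, "2024-02-01"),
        (2, "Phone", "Electronics", 3, 500, "2024-02-02"),
        (3, "TV", "Electronics", 1, 1200, "2024-02-03"),
        (4, "Shirt", "Clothing", 5, 30, "2024-02-01"),
    ],
)

by_category = conn.execute(
    """
    SELECT category,
           SUM(quantity * price) AS total_revenue,
           COUNT(*) AS num_sales
    FROM sales_data
    GROUP BY category
    ORDER BY category
    """
).fetchall()

for category, total_revenue, num_sales in by_category:
    print(category, total_revenue, num_sales)
# Clothing 150.0 1
# Electronics 4500.0 3
```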
5. Filtering Grouped Data with HAVING
HAVING filters groups after aggregation, so its condition can use aggregate functions. WHERE, by contrast, filters individual rows before they are grouped.
Example 6: Find Categories with Revenue Greater than $500
SELECT category, SUM(quantity * price) AS total_revenue
FROM sales_data
GROUP BY category
HAVING SUM(quantity * price) > 500;
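A common point of confusion is how HAVING differs from WHERE. The sketch below runs Example 6 next to a hypothetical WHERE variant (not from the examples above) on the same sample rows, using Python's built-in sqlite3 module with assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the guide only shows names and values.
conn.execute(
    "CREATE TABLE sales_data (sale_id INTEGER, product TEXT, category TEXT, "
    "quantity INTEGER, price REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Laptop", "Electronics", 2, 900, "2024-02-01"),
        (2, "Phone", "Electronics", 3, 500, "2024-02-02"),
        (3, "TV", "Electronics", 1, 1200, "2024-02-03"),
        (4, "Shirt", "Clothing", 5, 30, "2024-02-01"),
    ],
)

# Example 6: keep only groups whose total revenue exceeds 500.
having_rows = conn.execute(
    """
    SELECT category, SUM(quantity * price) AS total_revenue
    FROM sales_data
    GROUP BY category
    HAVING SUM(quantity * price) > 500
    """
).fetchall()

# Hypothetical variant: drop rows priced at 500 or less BEFORE grouping.
where_rows = conn.execute(
    """
    SELECT category, SUM(quantity * price) AS total_revenue
    FROM sales_data
    WHERE price > 500
    GROUP BY category
    """
).fetchall()

print(having_rows)  # [('Electronics', 4500.0)]
print(where_rows)   # [('Electronics', 3000.0)]
```

Both queries return only Electronics, but the totals differ: the WHERE version removes the Phone row (price 500) before summing, while HAVING sums all Electronics rows and then tests the total.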
Final Thoughts
SQL is an essential skill for data analysts, allowing you to:
- ✔️ Summarize large datasets with aggregate functions.
- ✔️ Group data by categories using GROUP BY.
- ✔️ Filter summarized data with HAVING.
- ✔️ Extract insights and trends for business intelligence.
Next Steps? Try applying these SQL techniques on your own datasets or explore JOIN operations for even deeper analysis!