Aggregate Functions, GROUP BY, and HAVING in SQL

 

Aggregate Functions, GROUP BY, and HAVING in SQL


1. What Are Aggregate Functions?

Aggregate functions summarize information from multiple tuples (rows) and produce a single value.

Common SQL Aggregate Functions

FunctionPurpose
COUNT            Counts number of rows or values
SUM            Computes total of numeric values
MAX            Finds maximum value
MIN            Finds minimum value
AVG            Computes average (mean)

📌 These functions are mainly used in:

  • SELECT clause

  • HAVING clause


2. Simple Aggregation (No GROUP BY)

Query 

SELECT SUM(Salary), MAX(Salary), MIN(Salary), AVG(Salary) FROM EMPLOYEE;

Explanation

  • Applies aggregate functions to all tuples in EMPLOYEE

  • Produces one row with four values

  • Gives a company-wide salary summary


Renaming Aggregate Results (Q19A)

SELECT SUM(Salary) AS Total_Sal, MAX(Salary) AS Highest_Sal, MIN(Salary) AS Lowest_Sal, AVG(Salary) AS Average_Sal FROM EMPLOYEE;

AS improves readability
✔ Column names appear in the result table


3. Aggregation with Selection (WHERE clause)

Query 

SELECT SUM(Salary), MAX(Salary), MIN(Salary), AVG(Salary) FROM (EMPLOYEE JOIN DEPARTMENT ON Dno = Dnumber) WHERE Dname = 'Research';

Explanation

  1. EMPLOYEE is joined with DEPARTMENT

  2. WHERE clause filters only Research department employees

  3. Aggregate functions apply only to selected tuples

📌 Still returns one row, but for a subset of data


4. COUNT Function Variations

Query  – Count all employees

SELECT COUNT(*) FROM EMPLOYEE;
  • COUNT(*) counts rows

  • Includes rows even if attributes are NULL


Query  – Count employees in Research department

SELECT COUNT(*) FROM EMPLOYEE, DEPARTMENT WHERE Dno = Dnumber AND Dname = 'Research';
  • WHERE clause restricts tuples first

  • COUNT counts remaining rows


Query  – Count distinct salary values

SELECT COUNT(DISTINCT Salary) FROM EMPLOYEE;

Important Rules

ExpressionBehavior
COUNT(*)            Counts rows
COUNT(Salary)            Counts non-NULL salaries
COUNT(DISTINCT Salary)            Counts unique, non-NULL salaries

📌 NULL values are ignored by all aggregate functions except COUNT(*).


5. Aggregate Functions with Nested Queries

Query – Employees with ≥ 2 dependents

SELECT Lname, Fname FROM EMPLOYEE WHERE ( SELECT COUNT(*) FROM DEPENDENT WHERE Ssn = Essn ) >= 2;

Explanation

  • Inner query is correlated with outer query

  • For each employee:

    • Count dependents

  • If count ≥ 2 → employee selected

✔ Aggregate functions work inside nested queries


6. Boolean Aggregate Functions: SOME and ALL

FunctionMeaning
SOME                TRUE if at least one value is TRUE
ALL                TRUE if all values are TRUE

Used less frequently, but useful in advanced conditions.


GROUP BY Clause


7. Why GROUP BY Is Needed

Without GROUP BY:

  • Aggregate functions summarize entire table

With GROUP BY:

  • Data is split into groups

  • Aggregate functions apply per group


8. GROUP BY Example

Query 

SELECT Dno, COUNT(*), AVG(Salary) FROM EMPLOYEE GROUP BY Dno;

Explanation

  • EMPLOYEE tuples are grouped by Dno

  • Each group = employees of one department

  • COUNT and AVG are computed per department

📌 Output:

  • One row per department

  • Not one row per employee

Rule:

Any attribute in SELECT that is not aggregated must appear in GROUP BY.



 


9. GROUP BY with NULL Values

  • Tuples with NULL in grouping attribute form a separate group

  • Example: employees with Dno = NULL


10. GROUP BY with JOIN

Query 

SELECT Pnumber, Pname, COUNT(*) FROM PROJECT, WORKS_ON WHERE Pnumber = Pno GROUP BY Pnumber, Pname;

Explanation

  1. PROJECT and WORKS_ON are joined

  2. Grouping is done after join

  3. COUNT counts employees per project


HAVING Clause


11. Why HAVING Is Needed

  • WHERE filters rows

  • HAVING filters groups

📌 HAVING is applied after GROUP BY and aggregation


12. HAVING Example

Query 

SELECT Pnumber, Pname, COUNT(*) FROM PROJECT, WORKS_ON WHERE Pnumber = Pno GROUP BY Pnumber, Pname HAVING COUNT(*) > 2;

Explanation

  1. Join PROJECT and WORKS_ON

  2. Group by project

  3. COUNT employees per project

  4. HAVING keeps only projects with more than 2 employees


13. WHERE vs HAVING (Very Important)

WHEREHAVING
Filters tuples            Filters groups
Applied before grouping            Applied after grouping
Cannot use aggregate functions            Uses aggregate functions

14. Complex Condition Pitfall (Incorrect Query)

Incorrect Query

SELECT Dno, COUNT(*) FROM EMPLOYEE WHERE Salary > 40000 GROUP BY Dno HAVING COUNT(*) > 5;

Why It’s Wrong

  • WHERE filters employees first

  • HAVING counts only employees earning > 40,000

  • This finds departments with:

    more than 5 high-salary employees

❌ Not what was intended


15. Correct Solution Using Nested Query

Query 

SELECT Dno, COUNT(*) FROM EMPLOYEE WHERE Salary > 40000 AND Dno IN ( SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 5 ) GROUP BY Dno;

Explanation

  1. Inner query:

    • Finds departments with more than 5 employees

  2. Outer query:

    • Counts employees earning > 40,000

    • Only in departments selected by inner query

✔ Correct logic
✔ Correct evaluation order


Execution Order of SQL Clauses 

  1. FROM

  2. WHERE

  3. GROUP BY

  4. HAVING

  5. SELECT

  6. ORDER BY


Summary

  • Aggregate functions summarize data

  • GROUP BY creates subgroups

  • HAVING filters groups

  • WHERE filters rows

  • NULL values are ignored in aggregates (except COUNT*)

  • Nested queries help express complex conditions

Comments

Popular posts from this blog

Database Management Systems DBMS PCCST402 Semester 4 KTU CS 2024 Scheme

Data Models, Schemas and Instances

Introduction to Database Management System -DBMS