Since database management systems offer a reliable, scalable, and secure method of handling data, they are, in one way or another, at the core of all contemporary applications. One of the core functionalities of DBMS is Aggregate functions. These functions allow us to summarise and analyse data efficiently. Aggregate functions are important in these operations because they will enable us to compute a single result from a set of multiple values; thus, assisting the user in analysing and extracting meaningful information from large datasets. The calculations performed by Aggregate Functions in DBMS are over the whole data, returning a single-valued output.
In this article, we will take you through the concept of aggregate functions in DBMS. We will also discuss the various types of aggregate functions while giving extensive examples to learn how such functions are applied in real-world situations. Apart from this, we will see the important points to remember while using these functions, and challenges and considerations for the aggregate functions in DBMS.
What are Aggregate Functions?
The aggregate functions are the special functions in SQL (Structured Query Language) that simultaneously perform different operations on multiple rows and return a result. Instead of returning a single result row based on individual rows, aggregate methods return groups of rows. These features let us meaningfully summarise huge datasets, which is why they are crucial for data analysis and reporting. Several frequently used aggregate functions are COUNT(), AVG(), SUM(), MAX(), MIN(), and others.
Having clauses, ORDER BY clauses, and select lists can all contain aggregate functions in databases like SQL, MySQL, and Oracle. When the database separates the rows of a queried table or view into groups, they are frequently utilised in conjunction with the GROUP BY clause in a SELECT statement. The members of the select list in a query with a GROUP BY clause may be constants, GROUP BY expressions, aggregate functions, or expressions involving any of these. The databases deliver a single result row for each set of rows after applying the aggregate functions to each group of rows.
Get curriculum highlights, career paths, industry insights and accelerate your technology journey.
Download brochure
Type of Aggregate Functions in DBMS
There are mainly 5 common aggregate functions in DBMS. There are other aggregate functions also that are used in various SQL databases like MySQL, Oracle, etc. We will cover them all in detail with various examples.
To understand the different aggregate functions in DBMS, we will be using the below sample table data. The table EMP_INFO below consists of various columns like EMP_NAME, EMP_ID, EMP_SALARY, and EMP_DEPT. We have sample data of 15 employees working in the EMP_INFO table.
EMP_ID
EMP_NAME
EMP_SALARY
EMP_DEPT
110
JACK
80000
Marketing
111
RYAN
90000
Design
112
MANE
50000
Design
113
KAMAL
77000
Technology
114
LARA
55000
Technology
115
TABIN
10000
Design
116
BOB
55000
Design
117
HENRY
60500
Accounts
118
MARK
60300
Accounts
119
AARON
70000
Technology
120
RAMAN
80000
Accounts
121
UMAR
25000
Technology
122
RIAZ
56000
Technology
123
RICKEY
30000
Design
124
JONAS
70000
Accounts
Let’s now see the different aggregate functions in DBMS:
1. COUNT() Function
It is used to find the number of rows in a table or result set.
Syntax:
SELECT COUNT(*) FROM table_name;
The entire number of rows in a given table is returned by COUNT(*).
SELECT COUNT(column_name) FROM table_name;
The given syntax will return the number of non-null values in the column passed as a parameter to the function COUNT(column_name).
Example 1: Query counting the number of employees from the given table.
SELECT COUNT(*) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
15
Example 2: Query to count the number of employees in the Technology department.
SELECT COUNT(*) AS Design_Employees FROM Employees WHERE EMP_DEPT = 'Technology';
Output:
EXP_OUTPUT
5
Example 3: Query counting those number of employees having a salary greater than 70000 from the given table.
SELECT COUNT(*) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY >= 70000;
Output:
EXP_OUTPUT
6
2. AVG() Function
It is used to calculate the average of a column’s data. For example, calculating the average wages, average grades, average sales, etc.
Syntax:
SELECT AVG(column_name) FROM table_name;
Example 1: A query to find the mean pay for the employees listed in the table.
SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
61386.67
Example 2: Using the above table, a query to determine the average salary of the accounts department staff.
SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';
Output:
EXP_OUTPUT
67700
Example 3: A query to ascertain the mean pay of workers earning over 50,000.
SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 50000;
Output:
EXP_OUTPUT
70875
3. SUM() Function
It is used to find the sum or total of all non-NULL values in that column in DBMS.
Syntax:
SELECT SUM(column_name) FROM table_name WHERE condition;
Example 1: Query to calculate the average salary of employees having a salary less than 50000.
SELECT SUM(EMP_SALARY) AS Total_Salary_Below_50K FROM Employees WHERE EMP_SALARY < 50000;
Output:
EXP_OUTPUT
65000
Example 2: Query to calculate the sum of the total salary of all employees.
SELECT SUM(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
920800
Example 3: Query to calculate the total salary of employees in the Technology department.
SELECT SUM(EMP_SALARY) AS EXP_OUTPUT FROM Employees
where EMP_DEPT = 'Technology';
Output:
EXP_OUTPUT
333000
4. MAX() Function
It is used to find the maximum value from a given column set. In addition, the MAX() function returns NULL if no row is chosen.
Syntax:
SELECT MAX(column_name) FROM table_name;
Example 1: Query to calculate the maximum salary among all company employees.
SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
90000
Example 2: A query to determine the highest pay in the accounts department of the organisation.
SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';
Output:
EXP_OUTPUT
80000
Example 3: A query to determine the highest pay among workers making less than 80,000 per year.
SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees
WHERE EMP_SALARY < 80000;
Output:
EXP_OUTPUT
77000
5. MIN() Function
The lowest value in a numerical column is returned by the MIN() method. It is employed to ascertain the lowest value – be it the lowest order amount, the lowest income, or the lowest score – within a set of records. In addition, the MIN() function returns NULL if no row is chosen.
Syntax:
SELECT MIN(column_name) FROM table_name;
Example 1: A query to determine each employee’s minimum pay.
SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
10000
Example 2: A query to determine the Technology department’s minimum pay.
SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Technology';
Output:
EXP_OUTPUT
25000
Example 3: A query to determine the accounting department’s minimum pay.
SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';
Output:
EXP_OUTPUT
60300
6. APPROX_COUNT_DISTINCT() Function
The APPROX_COUNT_DISTINCT() method is used to estimate the number of different values in a column for huge datasets.
Syntax:
SELECT APPROX_COUNT_DISTINCT(column_name) FROM table_name;
Example 1: Query to approximate the count of distinct salaries.
SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
13
Example 2: A query to calculate or determine how many different wages there are in the accounts department.
SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';
Output:
EXP_OUTPUT
4
Example 3: Query to count the number of distinct salaries in the design department.
SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Design';
Output:
EXP_OUTPUT
6
7. GROUPING() Function
To determine whether a column in a GROUP BY query is aggregated or not, use the GROUPING() function.
Syntax:
GROUPING(column_name)
Example 1: Query to determine which rows make up a group’s aggregate BY using department groupings in the query.
SELECT EMP_DEPT, SUM(EMP_SALARY) AS Total_Salary, GROUPING(EMP_DEPT) AS Is_Aggregated FROM Employees GROUP BY ROLLUP(EMP_DEPT);
Output:
EMP_DEPT
Total_Salary
Is_Aggregated
Accounts
184800
0
Design
291000
0
Technology
333000
0
NULL
808800
1
Example 2: Query to determine which rows are aggregated when grouping by department and checking for nulls.
SELECT EMP_DEPT, COUNT(*), GROUPING(EMP_DEPT) AS EXP_OUTPUT FROM Employees GROUP BY CUBE(EMP_DEPT);
Output:
EMP_DEPT
COUNT(*)
EXP_OUTPUT
Accounts
3
0
Design
6
0
Technology
6
0
NULL
15
1
8. MEDIAN() Function
The MEDIAN is an inverse distribution function. When given a numeric value, it returns either the middle value or an interpolated value that, when the values are sorted, would be the middle value. It does not take NULL values. Any numeric data type or any non-numeric data type that can be implicitly transformed into a numeric data type is accepted as an argument for this function. The function returns the same data type as the argument’s numeric data type if you just specify expr.
Syntax:
SELECT MEDIAN(column_name) FROM table_name;
Example 1: A query to find the median pay for every employee in the organisation.
SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
60500
Example 2: A query to get the design department’s median compensation inside the organisation.
SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Design';
Output:
EXP_OUTPUT
48500
Example 3: A query that determines the median pay for workers making more than 30,000.
SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 30000;
Output:
EXP_OUTPUT
60300
9. VAR_POP() Function
The VAR_POP() function calculates the population variance of a numeric column.
Syntax:
SELECT VAR_POP(column_name) FROM table_name;
Example 1: Query to calculate the population variance of all employee salaries
SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
868925000
Example 2: Query to calculate the population variance of salaries within the Accounts department.
SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';
Output:
EXP_OUTPUT
18802222.22
Example 3: Query to calculate the population variance of salaries below 60000.
SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY < 60000;
Output:
EXP_OUTPUT
304000000
10. VARIANCE() Function
The variance of expr is returned by VARIANCE. It can be applied as an analytical or aggregate function.
Syntax:
SELECT VARIANCE(column_name) FROM table_name;
Example 1: Query to find the variance of all employee salaries.
SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees;
Output:
EXP_OUTPUT
868925000
Example 2: Query to find the variance of salaries in the company’s technology department.
SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Technology';
Output:
EXP_OUTPUT
545000000
Example 3: Query to find the variance of salaries for employees who are getting salaries greater than 50000.
SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 50000;
Output:
EXP_OUTPUT
84161600
Important Points to Remember
Some important points to be kept in mind while working with Aggregate functions in DBMS are:
1. Using UNIQUE & DISTINCT clauses
Goal: To guarantee that an aggregate function takes into account only distinct values of the parameter expression, both the DISTINCT and UNIQUE clauses are employed. These two clauses are interchangeable.
2. Using ALL Clause
Goal: The ALL clause causes an aggregate function to consider all values, including duplicates. For example, Using ALL in a query will result in all values being included in the calculation, unlike DISTINCT, which filters out duplicates.
3. Windowing Clause
Goal: Some aggregate functions allow the use of a windowing_clause, which is part of the syntax of analytic functions. This clause is useful for performing calculations over a specified range of rows. For example, the windowing_clause can be used to calculate running totals, moving averages, or cumulative sums.
4. Using GROUP BY Clause
Goal: A result is grouped into subgroups using a GROUP BY clause if one or more columns contain matching values. No two rows in a group have the same value for the column or columns used for grouping. NULL values are regarded as equal when it comes to grouping. Generally, an aggregate expression is used alongside a GROUP BY clause.
5. Performance Impact
Goal: Aggregate functions can be resource-intensive, especially on large datasets. It’s important to optimise queries, possibly using indexing and limiting data scope, to enhance performance. For example, calculating the sum of sales for millions of records may require indexing on the relevant columns to speed up the query.
6. Using a HAVING Clause
Goal: The results of a GROUP BY in a SelectExpression are constrained by a HAVING clause. Similar to how a WHERE clause is applied to a select list, the HAVING clause is applied to each group in the grouped table. The HAVING clause is applied to the entire result as a single group in the absence of a GROUP BY clause.
Why use Aggregate Functions?
Aggregate functions play a pivotal role in DBMS. Below are some of the key reasons why an aggregate function should be used by a database engineer:
Aggregate functions allow the summarization of big datasets down to a single value. This facilitates understanding and the analysis of trends, patterns, and totals. For instance, use SUM() to find total sales for a month or AVG() to calculate the average salary within a department.
Instead of returning data explicitly and then doing the computation on it in our code by fetching so much data, we can let aggregate functions do it for us directly in the database. This helps in that it reduces the need for complex application-level logic. For example, COUNT() might determine how many clients did shop last year without any need to export all data and process separately.
Aggregate functions are not just executed optimally by the database engine (especially when used with indexing and optimised query plans) but also make the resultant query run faster as opposed to manual row-by-row computations. For example, for large datasets, determining the maximum or minimum by use of MAX() or MIN() is faster and demands fewer resources than iterating through the data yourself.
Aggregate functions and the GROUP BY clause work by summarising grouped data after values have been placed into the groups by the GROUP BY clause. When used with the GROUP BY clause, aggregate functions let you take a set of data, group it by specific categories, and then perform calculations on each group. It is great for reporting and cutting the data.
Aggregate functions are important in developing reports that will show some key business insights by the use of aggregate or averages, maximums, and minimums, such as totals which are very important in decision-making. For example, generating a report to indicate average revenue for every customer, total order numbers, or top-selling products.
Aggregate functions can be used in data validation by guaranteeing uniformity and wholeness across the dataset. Like, the use of COUNT() to confirm the number of records or SUM() to ascertain whether total balances correspond to expected amounts. For example, aggregates in a financial system that ensure the books are balanced is that total credits equal total debits.
Applications of Aggregate Functions in DBMS
There are various applications of aggregate functions in DBMS, where the use of aggregate functions will provide you with exceptional results. Here are various real-world applications of aggregate functions in DBMS:
1. Data Summarization and Reporting
Aggregation functions are used in various applications to summarise big datasets by providing insights into the data through the calculation of totals, averages, maximums, minimum values, etc. For instance, in business intelligence reports, aggregate functions can find total sales, average revenue, or the highest sales for a specific region. For example: The SUM() function is used by a company to find total sales revenue for the year across all branches.
2. Data Analytics
Aggregation functions are a means by which data analysts summarise large volumes of raw data into useful statistics. For instance, computing the average score of customer satisfaction aids a firm in knowing how effective its customer service is. For example: An e-commerce company could use the AVG() function to find the average order value. This could help in setting prices or running promotions more effectively.
3. Financial Reporting
In finance, aggregation functions are applied to compute key figures such as total assets, total liabilities, net income, and more. They assist in the generation of financial statements and performance reports. For example, a bank uses the SUM() function to find out the total amount of deposits across all customer accounts or find the average loan amount disbursed with AVG().
4. Inventory Management
Aggregation functions aid in the computation of totals, minimums, and maximums. This is how it helps keep track of inventory levels, which further helps in maintaining optimal stock levels and identifying slow-moving items. For example, for a retailer to determine which item has the lowest stock quantity and thus needs refilling, he may use the MIN() function.
5. Customer Segmentation
Aggregation functions are what businesses use to segment their customers based on such criteria as average purchase value or total spending, which helps target marketing efforts and enhance customer retention.
6. Quality Control
Aggregation functions find applications in quality control where the average defect rate or maximum defect count needs to be calculated, thereby helping companies maintain their standards of quality for the produced products. For example, An AVG() function could find usability by a manufacturing company to compute the average number of defects per batch production.
7. Healthcare
In healthcare, aggregation functions are used in the analysis of patient data by computing the average length of hospital stays, total costs incurred, and most common treatments. For example, the MEDIAN() function could be employed by a hospital to determine the median duration of hospitalisation, thus providing insight into efficiency regarding patient care.
8. Performance Monitoring
Aggregation functions aid in performance monitoring different entities (including employees, departments, or machines) through the calculation of averages, totals, and other such metrics.
Challenges and Considerations
While Aggregate functions provide a powerful way to analyse a large set of data, they also come with various challenges that need to be considered:
Performance Impact
Resource consumption by aggregate functions, especially with large datasets, can be considerable. Performance issues can be alleviated through the optimization of queries – via indexing, using efficient groupings, and limiting the data being processed.
NULL Values Handling
NULL values are handled by aggregate functions in different ways. For example, COUNT(column_name) disregards NULL values, but COUNT(*) counts all rows, even those that have NULL values. One should be well aware of how each aggregate function treats NULL to prevent surprise results.
Grouping and Aggregation
Be very careful when specifying the grouping columns in a GROUP BY clause. Mistaken groups will return invalid outcomes. Also, when using aggregate functions and non-aggregate columns, be sure to include these columns in the GROUP BY clause.
Conclusion
The use of aggregate functions is essential for database administrators or engineers. They enable effective data summarization and analysis, offering insightful information about big datasets. In this article, we have covered what are aggregate functions in DBMS, types of aggregate functions, and more. The different types of DBMS aggregate functions, such as SUM(), AVG(), COUNT(), MAX(), MIN(), and COUNT(DISTINCT), were examined in this article. We also talked about practical applications, the importance of using aggregate functions, and some of the challenges with aggregate functions.
Your ability to analyse data can be greatly improved by comprehending and using aggregate functions, regardless of what business you work in – finance, retail, HR, healthcare, or any other. Now that you have this knowledge, you may use aggregate functions in your database queries to get valuable insights from your data.
FAQs
What are the 5 aggregate functions in DBMS?
The 5 aggregate functions in DBMS include SUM(), AVG(), COUNT(), MIN(), and MAX(). The SUM() calculates the total sum of a column, the AVG() calculates the average value of a column data, the COUNT() counts the number of rows that match a specified condition, the MIN() finds the minimum value from a given column, and lastly MAX() finds the maximum value from a given column.
What are aggregate functions in SQL?
A function that calculates a single value after applying a set of values is known as an aggregate function. These functions are employed in the summarization and analysis of data, including the determination of counts, averages, totals, and the lowest and maximum values within a dataset.
What sets DISTINCT apart from ALL in aggregate functions?
When it comes to aggregate functions, DISTINCT and ALL serve as modifiers that impact how duplicate values are handled. DISTINCT ensures that unique values are factored into the calculation without any duplicates. On the other hand ALL, which is the default behaviour, includes all values, including duplicates.
Is it possible to use aggregate functions alongside the GROUP BY clause?
Absolutely! Aggregate functions are commonly paired with the GROUP BY clause to perform calculations on grouped data. For example, you could group sales data based on regions. Employ SUM() to determine each region's sales figure. The GROUP BY clause organises rows with matching values in specified columns. Allows aggregate functions to provide a value for each group.
How does the KEEP clause work within aggregate functions and what is its purpose?
The KEEP clause enables functions (such as MIN, MAX, SUM, AVG, COUNT, and VARIANCE) to operate on a subset of values that rank as either FIRST or LAST according to a designated sorting order. When you need to find the value in the final rows of a sorted dataset it can be helpful to use an aggregate function, like MIN with KEEP. For example, applying MIN with KEEP allows you to identify the value, from the row in a sorted collection.
Hero Vired is a leading LearnTech company dedicated to offering cutting-edge programs in collaboration with top-tier global institutions. As part of the esteemed Hero Group, we are committed to revolutionizing the skill development landscape in India. Our programs, delivered by industry experts, are designed to empower professionals and students with the skills they need to thrive in today’s competitive job market.