Aggregate Functions in DBMS – With Examples

Updated on September 2, 2024

Article Outline

Since database management systems offer a reliable, scalable, and secure method of handling data, they are, in one way or another, at the core of all contemporary applications. One of the core functionalities of DBMS is Aggregate functions. These functions allow us to summarise and analyse data efficiently. Aggregate functions are important in these operations because they will enable us to compute a single result from a set of multiple values; thus, assisting the user in analysing and extracting meaningful information from large datasets. The calculations performed by Aggregate Functions in DBMS are over the whole data, returning a single-valued output.

 

In this article, we will take you through the concept of aggregate functions in DBMS. We will also discuss the various types of aggregate functions while giving extensive examples to learn how such functions are applied in real-world situations. Apart from this, we will see the important points to remember while using these functions, and challenges and considerations for the aggregate functions in DBMS.

What are Aggregate Functions?

The aggregate functions are the special functions in SQL (Structured Query Language) that simultaneously perform different operations on multiple rows and return a result. Instead of returning a single result row based on individual rows, aggregate methods return groups of rows. These features let us meaningfully summarise huge datasets, which is why they are crucial for data analysis and reporting. Several frequently used aggregate functions are COUNT(), AVG(), SUM(), MAX(), MIN(), and others.

 

Having clauses, ORDER BY clauses, and select lists can all contain aggregate functions in databases like SQL, MySQL, and Oracle. When the database separates the rows of a queried table or view into groups, they are frequently utilised in conjunction with the GROUP BY clause in a SELECT statement. The members of the select list in a query with a GROUP BY clause may be constants, GROUP BY expressions, aggregate functions, or expressions involving any of these. The databases deliver a single result row for each set of rows after applying the aggregate functions to each group of rows.

*Image
Get curriculum highlights, career paths, industry insights and accelerate your technology journey.
Download brochure

Type of Aggregate Functions in DBMS

There are mainly 5 common aggregate functions in DBMS. There are other aggregate functions also that are used in various SQL databases like MySQL, Oracle, etc. We will cover them all in detail with various examples.

 

To understand the different aggregate functions in DBMS, we will be using the below sample table data. The table EMP_INFO below consists of various columns like EMP_NAME, EMP_ID, EMP_SALARY, and EMP_DEPT. We have sample data of 15 employees working in the EMP_INFO table.

 

EMP_ID EMP_NAME EMP_SALARY EMP_DEPT
110 JACK 80000 Marketing
111 RYAN 90000 Design
112 MANE 50000 Design
113 KAMAL 77000 Technology
114 LARA 55000 Technology
115 TABIN 10000 Design
116 BOB 55000 Design
117 HENRY 60500 Accounts
118 MARK 60300 Accounts
119 AARON 70000 Technology
120 RAMAN 80000 Accounts
121 UMAR 25000 Technology
122 RIAZ 56000 Technology
123 RICKEY 30000 Design
124 JONAS 70000 Accounts

 

Let’s now see the different aggregate functions in DBMS:

1. COUNT() Function

It is used to find the number of rows in a table or result set.

 

Syntax:

 

  • SELECT COUNT(*) FROM table_name;

 

The entire number of rows in a given table is returned by COUNT(*).

 

  • SELECT COUNT(column_name) FROM table_name;

 

The given syntax will return the number of non-null values in the column passed as a parameter to the function COUNT(column_name).

 

Example 1: Query counting the number of employees from the given table.

SELECT COUNT(*) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 15

Example 2: Query to count the number of employees in the Technology department.

SELECT COUNT(*) AS Design_Employees FROM Employees WHERE EMP_DEPT = 'Technology';

Output:

EXP_OUTPUT 5

Example 3: Query counting those number of employees having a salary greater than 70000 from the given table.

SELECT COUNT(*) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY >= 70000;

Output:

EXP_OUTPUT 6

2. AVG() Function

It is used to calculate the average of a column’s data. For example, calculating the average wages, average grades, average sales, etc.

 

Syntax:

 

SELECT AVG(column_name) FROM table_name;

Example 1: A query to find the mean pay for the employees listed in the table.

SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 61386.67

Example 2: Using the above table, a query to determine the average salary of the accounts department staff.

SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';

Output:

EXP_OUTPUT 67700

Example 3: A query to ascertain the mean pay of workers earning over 50,000.

SELECT AVG(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 50000;

Output:

EXP_OUTPUT 70875

3. SUM() Function

It is used to find the sum or total of all non-NULL values in that column in DBMS.

 

Syntax:

 

SELECT SUM(column_name) FROM table_name WHERE condition;

 

Example 1: Query to calculate the average salary of employees having a salary less than 50000.

SELECT SUM(EMP_SALARY) AS Total_Salary_Below_50K FROM Employees WHERE EMP_SALARY < 50000;

Output:

EXP_OUTPUT 65000

Example 2: Query to calculate the sum of the total salary of all employees.

SELECT SUM(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 920800

Example 3: Query to calculate the total salary of employees in the Technology department.

SELECT SUM(EMP_SALARY) AS EXP_OUTPUT FROM Employees where EMP_DEPT = 'Technology';

Output:

EXP_OUTPUT 333000

4. MAX() Function

It is used to find the maximum value from a given column set. In addition, the MAX() function returns NULL if no row is chosen.

 

Syntax:

 

SELECT MAX(column_name) FROM table_name;

 

Example 1: Query to calculate the maximum salary among all company employees.

SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 90000

Example 2: A query to determine the highest pay in the accounts department of the organisation.

SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';

Output:

EXP_OUTPUT 80000

Example 3: A query to determine the highest pay among workers making less than 80,000 per year.

SELECT MAX(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY < 80000;

Output:

EXP_OUTPUT 77000

5. MIN() Function

The lowest value in a numerical column is returned by the MIN() method. It is employed to ascertain the lowest value – be it the lowest order amount, the lowest income, or the lowest score – within a set of records. In addition, the MIN() function returns NULL if no row is chosen.

 

Syntax:

 

SELECT MIN(column_name) FROM table_name;

 

Example 1: A query to determine each employee’s minimum pay.

SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 10000

Example 2: A query to determine the Technology department’s minimum pay.

SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Technology';

Output:

EXP_OUTPUT 25000

Example 3: A query to determine the accounting department’s minimum pay.

SELECT MIN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';

Output:

EXP_OUTPUT 60300

6. APPROX_COUNT_DISTINCT() Function

The APPROX_COUNT_DISTINCT() method is used to estimate the number of different values in a column for huge datasets.

 

Syntax:

 

SELECT APPROX_COUNT_DISTINCT(column_name) FROM table_name;

Example 1: Query to approximate the count of distinct salaries.

SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 13

Example 2: A query to calculate or determine how many different wages there are in the accounts department.

SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';

Output:

EXP_OUTPUT 4

Example 3: Query to count the number of distinct salaries in the design department.

SELECT APPROX_COUNT_DISTINCT(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Design';

Output:

EXP_OUTPUT 6

7. GROUPING() Function

To determine whether a column in a GROUP BY query is aggregated or not, use the GROUPING() function.

 

Syntax:

 

GROUPING(column_name)

 

Example 1: Query to determine which rows make up a group’s aggregate BY using department groupings in the query.

SELECT EMP_DEPT, SUM(EMP_SALARY) AS Total_Salary, GROUPING(EMP_DEPT) AS Is_Aggregated FROM Employees GROUP BY ROLLUP(EMP_DEPT);

Output:

 

EMP_DEPT Total_Salary Is_Aggregated
Accounts 184800 0
Design 291000 0
Technology 333000 0
NULL 808800 1

 

 

Example 2: Query to determine which rows are aggregated when grouping by department and checking for nulls.

SELECT EMP_DEPT, COUNT(*), GROUPING(EMP_DEPT) AS EXP_OUTPUT FROM Employees GROUP BY CUBE(EMP_DEPT);

Output:

 

EMP_DEPT COUNT(*) EXP_OUTPUT
Accounts 3 0
Design 6 0
Technology 6 0
NULL 15 1

 

8. MEDIAN() Function

The MEDIAN is an inverse distribution function. When given a numeric value, it returns either the middle value or an interpolated value that, when the values are sorted, would be the middle value. It does not take NULL values. Any numeric data type or any non-numeric data type that can be implicitly transformed into a numeric data type is accepted as an argument for this function. The function returns the same data type as the argument’s numeric data type if you just specify expr.

 

Syntax:

 

SELECT MEDIAN(column_name) FROM table_name;

 

Example 1: A query to find the median pay for every employee in the organisation.

SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 60500

Example 2: A query to get the design department’s median compensation inside the organisation.

SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Design';

Output:

EXP_OUTPUT 48500

Example 3: A query that determines the median pay for workers making more than 30,000.

SELECT MEDIAN(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 30000;

Output:

EXP_OUTPUT 60300

9. VAR_POP() Function

The VAR_POP() function calculates the population variance of a numeric column.

 

Syntax:

 

SELECT VAR_POP(column_name) FROM table_name;

Example 1: Query to calculate the population variance of all employee salaries

SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 868925000

Example 2: Query to calculate the population variance of salaries within the Accounts department.

SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Accounts';

Output:

EXP_OUTPUT 18802222.22

Example 3: Query to calculate the population variance of salaries below 60000.

SELECT VAR_POP(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY < 60000;

Output:

EXP_OUTPUT 304000000

10. VARIANCE() Function

The variance of expr is returned by VARIANCE. It can be applied as an analytical or aggregate function.

 

Syntax:

 

SELECT VARIANCE(column_name) FROM table_name;

Example 1: Query to find the variance of all employee salaries.

SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees;

Output:

EXP_OUTPUT 868925000

Example 2: Query to find the variance of salaries in the company’s technology department.

SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_DEPT = 'Technology';

Output:

EXP_OUTPUT 545000000

Example 3: Query to find the variance of salaries for employees who are getting salaries greater than 50000.

SELECT VARIANCE(EMP_SALARY) AS EXP_OUTPUT FROM Employees WHERE EMP_SALARY > 50000;

Output:

EXP_OUTPUT 84161600

Important Points to Remember

Some important points to be kept in mind while working with Aggregate functions in DBMS are:

 

1. Using UNIQUE & DISTINCT clauses

Goal: To guarantee that an aggregate function takes into account only distinct values of the parameter expression, both the DISTINCT and UNIQUE clauses are employed. These two clauses are interchangeable.

2. Using ALL Clause

Goal: The ALL clause causes an aggregate function to consider all values, including duplicates. For example, Using ALL in a query will result in all values being included in the calculation, unlike DISTINCT, which filters out duplicates.

3. Windowing Clause

Goal: Some aggregate functions allow the use of a windowing_clause, which is part of the syntax of analytic functions. This clause is useful for performing calculations over a specified range of rows. For example, the windowing_clause can be used to calculate running totals, moving averages, or cumulative sums.

4. Using GROUP BY Clause

Goal: A result is grouped into subgroups using a GROUP BY clause if one or more columns contain matching values. No two rows in a group have the same value for the column or columns used for grouping. NULL values are regarded as equal when it comes to grouping. Generally, an aggregate expression is used alongside a GROUP BY clause.

5. Performance Impact

Goal: Aggregate functions can be resource-intensive, especially on large datasets. It’s important to optimise queries, possibly using indexing and limiting data scope, to enhance performance. For example, calculating the sum of sales for millions of records may require indexing on the relevant columns to speed up the query. 

6. Using a HAVING Clause

Goal: The results of a GROUP BY in a SelectExpression are constrained by a HAVING clause. Similar to how a WHERE clause is applied to a select list, the HAVING clause is applied to each group in the grouped table. The HAVING clause is applied to the entire result as a single group in the absence of a GROUP BY clause.

Why use Aggregate Functions?

Aggregate functions play a pivotal role in DBMS. Below are some of the key reasons why an aggregate function should be used by a database engineer:

 

  1. Aggregate functions allow the summarization of big datasets down to a single value. This facilitates understanding and the analysis of trends, patterns, and totals. For instance, use SUM() to find total sales for a month or AVG() to calculate the average salary within a department.
  2. Instead of returning data explicitly and then doing the computation on it in our code by fetching so much data, we can let aggregate functions do it for us directly in the database. This helps in that it reduces the need for complex application-level logic. For example, COUNT() might determine how many clients did shop last year without any need to export all data and process separately.
  3. Aggregate functions are not just executed optimally by the database engine (especially when used with indexing and optimised query plans) but also make the resultant query run faster as opposed to manual row-by-row computations. For example, for large datasets, determining the maximum or minimum by use of MAX() or MIN() is faster and demands fewer resources than iterating through the data yourself.
  4. Aggregate functions and the GROUP BY clause work by summarising grouped data after values have been placed into the groups by the GROUP BY clause. When used with the GROUP BY clause, aggregate functions let you take a set of data, group it by specific categories, and then perform calculations on each group. It is great for reporting and cutting the data.
  5. Aggregate functions are important in developing reports that will show some key business insights by the use of aggregate or averages, maximums, and minimums, such as totals which are very important in decision-making. For example, generating a report to indicate average revenue for every customer, total order numbers, or top-selling products.
  6. Aggregate functions can be used in data validation by guaranteeing uniformity and wholeness across the dataset. Like, the use of COUNT() to confirm the number of records or SUM() to ascertain whether total balances correspond to expected amounts. For example, aggregates in a financial system that ensure the books are balanced is that total credits equal total debits.

Applications of Aggregate Functions in DBMS

There are various applications of aggregate functions in DBMS, where the use of aggregate functions will provide you with exceptional results. Here are various real-world applications of aggregate functions in DBMS:

1. Data Summarization and Reporting

Aggregation functions are used in various applications to summarise big datasets by providing insights into the data through the calculation of totals, averages, maximums, minimum values, etc. For instance, in business intelligence reports, aggregate functions can find total sales, average revenue, or the highest sales for a specific region. For example: The SUM() function is used by a company to find total sales revenue for the year across all branches.

2. Data Analytics

Aggregation functions are a means by which data analysts summarise large volumes of raw data into useful statistics. For instance, computing the average score of customer satisfaction aids a firm in knowing how effective its customer service is. For example: An e-commerce company could use the AVG() function to find the average order value. This could help in setting prices or running promotions more effectively.

3. Financial Reporting

In finance, aggregation functions are applied to compute key figures such as total assets, total liabilities, net income, and more. They assist in the generation of financial statements and performance reports. For example, a bank uses the SUM() function to find out the total amount of deposits across all customer accounts or find the average loan amount disbursed with AVG().

4. Inventory Management

Aggregation functions aid in the computation of totals, minimums, and maximums. This is how it helps keep track of inventory levels, which further helps in maintaining optimal stock levels and identifying slow-moving items. For example, for a retailer to determine which item has the lowest stock quantity and thus needs refilling, he may use the MIN() function.

5. Customer Segmentation

Aggregation functions are what businesses use to segment their customers based on such criteria as average purchase value or total spending, which helps target marketing efforts and enhance customer retention.

6. Quality Control

Aggregation functions find applications in quality control where the average defect rate or maximum defect count needs to be calculated, thereby helping companies maintain their standards of quality for the produced products. For example, An AVG() function could find usability by a manufacturing company to compute the average number of defects per batch production.

7. Healthcare

In healthcare, aggregation functions are used in the analysis of patient data by computing the average length of hospital stays, total costs incurred, and most common treatments. For example, the MEDIAN() function could be employed by a hospital to determine the median duration of hospitalisation, thus providing insight into efficiency regarding patient care.

8. Performance Monitoring

Aggregation functions aid in performance monitoring different entities (including employees, departments, or machines) through the calculation of averages, totals, and other such metrics.

Challenges and Considerations

While Aggregate functions provide a powerful way to analyse a large set of data, they also come with various challenges that need to be considered:

 

  • Performance Impact

Resource consumption by aggregate functions, especially with large datasets, can be considerable. Performance issues can be alleviated through the optimization of queries – via indexing, using efficient groupings, and limiting the data being processed.

 

  • NULL Values Handling

NULL values are handled by aggregate functions in different ways. For example, COUNT(column_name) disregards NULL values, but COUNT(*) counts all rows, even those that have NULL values. One should be well aware of how each aggregate function treats NULL to prevent surprise results.

 

  • Grouping and Aggregation

Be very careful when specifying the grouping columns in a GROUP BY clause. Mistaken groups will return invalid outcomes. Also, when using aggregate functions and non-aggregate columns, be sure to include these columns in the GROUP BY clause.

Conclusion

The use of aggregate functions is essential for database administrators or engineers. They enable effective data summarization and analysis, offering insightful information about big datasets. In this article, we have covered what are aggregate functions in DBMS, types of aggregate functions, and more. The different types of DBMS aggregate functions, such as SUM(), AVG(), COUNT(), MAX(), MIN(), and COUNT(DISTINCT), were examined in this article. We also talked about practical applications, the importance of using aggregate functions, and some of the challenges with aggregate functions.

 

Your ability to analyse data can be greatly improved by comprehending and using aggregate functions, regardless of what business you work in – finance, retail, HR, healthcare, or any other. Now that you have this knowledge, you may use aggregate functions in your database queries to get valuable insights from your data.

FAQs
The 5 aggregate functions in DBMS include SUM(), AVG(), COUNT(), MIN(), and MAX(). The SUM() calculates the total sum of a column, the AVG() calculates the average value of a column data, the COUNT() counts the number of rows that match a specified condition, the MIN() finds the minimum value from a given column, and lastly MAX() finds the maximum value from a given column.
A function that calculates a single value after applying a set of values is known as an aggregate function. These functions are employed in the summarization and analysis of data, including the determination of counts, averages, totals, and the lowest and maximum values within a dataset.
When it comes to aggregate functions, DISTINCT and ALL serve as modifiers that impact how duplicate values are handled. DISTINCT ensures that unique values are factored into the calculation without any duplicates. On the other hand ALL, which is the default behaviour, includes all values, including duplicates.
Absolutely! Aggregate functions are commonly paired with the GROUP BY clause to perform calculations on grouped data. For example, you could group sales data based on regions. Employ SUM() to determine each region's sales figure. The GROUP BY clause organises rows with matching values in specified columns. Allows aggregate functions to provide a value for each group.
The KEEP clause enables functions (such as MIN, MAX, SUM, AVG, COUNT, and VARIANCE) to operate on a subset of values that rank as either FIRST or LAST according to a designated sorting order. When you need to find the value in the final rows of a sorted dataset it can be helpful to use an aggregate function, like MIN with KEEP. For example, applying MIN with KEEP allows you to identify the value, from the row in a sorted collection.

Updated on September 2, 2024

Link
left dot patternright dot pattern

Programs tailored for your success

Popular

Management

Data Science

Finance

Technology

Future Tech

Upskill with expert articles

View all
Hero Vired logo
Hero Vired is a leading LearnTech company dedicated to offering cutting-edge programs in collaboration with top-tier global institutions. As part of the esteemed Hero Group, we are committed to revolutionizing the skill development landscape in India. Our programs, delivered by industry experts, are designed to empower professionals and students with the skills they need to thrive in today’s competitive job market.
Blogs
Reviews
Events
In the News
About Us
Contact us
Learning Hub
18003093939     ·     hello@herovired.com     ·    Whatsapp
Privacy policy and Terms of use

|

Sitemap

© 2024 Hero Vired. All rights reserved