Popular
Data Science
Technology
Finance
Management
Future Tech
Since database management systems offer a reliable, scalable, and secure method of handling data, they are, in one way or another, at the core of all contemporary applications. One of the core functionalities of DBMS is Aggregate functions. These functions allow us to summarise and analyse data efficiently. Aggregate functions are important in these operations because they will enable us to compute a single result from a set of multiple values; thus, assisting the user in analysing and extracting meaningful information from large datasets. The calculations performed by Aggregate Functions in DBMS are over the whole data, returning a single-valued output.
In this article, we will take you through the concept of aggregate functions in DBMS. We will also discuss the various types of aggregate functions while giving extensive examples to learn how such functions are applied in real-world situations. Apart from this, we will see the important points to remember while using these functions, and challenges and considerations for the aggregate functions in DBMS.
The aggregate functions are the special functions in SQL (Structured Query Language) that simultaneously perform different operations on multiple rows and return a result. Instead of returning a single result row based on individual rows, aggregate methods return groups of rows. These features let us meaningfully summarise huge datasets, which is why they are crucial for data analysis and reporting. Several frequently used aggregate functions are COUNT(), AVG(), SUM(), MAX(), MIN(), and others.
Having clauses, ORDER BY clauses, and select lists can all contain aggregate functions in databases like SQL, MySQL, and Oracle. When the database separates the rows of a queried table or view into groups, they are frequently utilised in conjunction with the GROUP BY clause in a SELECT statement. The members of the select list in a query with a GROUP BY clause may be constants, GROUP BY expressions, aggregate functions, or expressions involving any of these. The databases deliver a single result row for each set of rows after applying the aggregate functions to each group of rows.
There are mainly 5 common aggregate functions in DBMS. There are other aggregate functions also that are used in various SQL databases like MySQL, Oracle, etc. We will cover them all in detail with various examples.
To understand the different aggregate functions in DBMS, we will be using the below sample table data. The table EMP_INFO below consists of various columns like EMP_NAME, EMP_ID, EMP_SALARY, and EMP_DEPT. We have sample data of 15 employees working in the EMP_INFO table.
EMP_ID | EMP_NAME | EMP_SALARY | EMP_DEPT |
110 | JACK | 80000 | Marketing |
111 | RYAN | 90000 | Design |
112 | MANE | 50000 | Design |
113 | KAMAL | 77000 | Technology |
114 | LARA | 55000 | Technology |
115 | TABIN | 10000 | Design |
116 | BOB | 55000 | Design |
117 | HENRY | 60500 | Accounts |
118 | MARK | 60300 | Accounts |
119 | AARON | 70000 | Technology |
120 | RAMAN | 80000 | Accounts |
121 | UMAR | 25000 | Technology |
122 | RIAZ | 56000 | Technology |
123 | RICKEY | 30000 | Design |
124 | JONAS | 70000 | Accounts |
Let’s now see the different aggregate functions in DBMS:
It is used to find the number of rows in a table or result set.
Syntax:
The entire number of rows in a given table is returned by COUNT(*).
The given syntax will return the number of non-null values in the column passed as a parameter to the function COUNT(column_name).
Example 1: Query counting the number of employees from the given table.
Output:
Example 2: Query to count the number of employees in the Technology department.
Output:
Example 3: Query counting those number of employees having a salary greater than 70000 from the given table.
Output:
It is used to calculate the average of a column’s data. For example, calculating the average wages, average grades, average sales, etc.
Syntax:
SELECT AVG(column_name) FROM table_name;
Example 1: A query to find the mean pay for the employees listed in the table.
Output:
Example 2: Using the above table, a query to determine the average salary of the accounts department staff.
Output:
Example 3: A query to ascertain the mean pay of workers earning over 50,000.
Output:
It is used to find the sum or total of all non-NULL values in that column in DBMS.
Syntax:
SELECT SUM(column_name) FROM table_name WHERE condition;
Example 1: Query to calculate the average salary of employees having a salary less than 50000.
Output:
Example 2: Query to calculate the sum of the total salary of all employees.
Output:
Example 3: Query to calculate the total salary of employees in the Technology department.
Output:
It is used to find the maximum value from a given column set. In addition, the MAX() function returns NULL if no row is chosen.
Syntax:
SELECT MAX(column_name) FROM table_name;
Example 1: Query to calculate the maximum salary among all company employees.
Output:
Example 2: A query to determine the highest pay in the accounts department of the organisation.
Output:
Example 3: A query to determine the highest pay among workers making less than 80,000 per year.
Output:
The lowest value in a numerical column is returned by the MIN() method. It is employed to ascertain the lowest value – be it the lowest order amount, the lowest income, or the lowest score – within a set of records. In addition, the MIN() function returns NULL if no row is chosen.
Syntax:
SELECT MIN(column_name) FROM table_name;
Example 1: A query to determine each employee’s minimum pay.
Output:
Example 2: A query to determine the Technology department’s minimum pay.
Output:
Example 3: A query to determine the accounting department’s minimum pay.
Output:
The APPROX_COUNT_DISTINCT() method is used to estimate the number of different values in a column for huge datasets.
Syntax:
SELECT APPROX_COUNT_DISTINCT(column_name) FROM table_name;
Example 1: Query to approximate the count of distinct salaries.
Output:
Example 2: A query to calculate or determine how many different wages there are in the accounts department.
Output:
Example 3: Query to count the number of distinct salaries in the design department.
Output:
To determine whether a column in a GROUP BY query is aggregated or not, use the GROUPING() function.
Syntax:
GROUPING(column_name)
Example 1: Query to determine which rows make up a group’s aggregate BY using department groupings in the query.
Output:
EMP_DEPT | Total_Salary | Is_Aggregated |
Accounts | 184800 | 0 |
Design | 291000 | 0 |
Technology | 333000 | 0 |
NULL | 808800 | 1 |
Example 2: Query to determine which rows are aggregated when grouping by department and checking for nulls.
Output:
EMP_DEPT | COUNT(*) | EXP_OUTPUT |
Accounts | 3 | 0 |
Design | 6 | 0 |
Technology | 6 | 0 |
NULL | 15 | 1 |
The MEDIAN is an inverse distribution function. When given a numeric value, it returns either the middle value or an interpolated value that, when the values are sorted, would be the middle value. It does not take NULL values. Any numeric data type or any non-numeric data type that can be implicitly transformed into a numeric data type is accepted as an argument for this function. The function returns the same data type as the argument’s numeric data type if you just specify expr.
Syntax:
SELECT MEDIAN(column_name) FROM table_name;
Example 1: A query to find the median pay for every employee in the organisation.
Output:
Example 2: A query to get the design department’s median compensation inside the organisation.
Output:
Example 3: A query that determines the median pay for workers making more than 30,000.
Output:
The VAR_POP() function calculates the population variance of a numeric column.
Syntax:
SELECT VAR_POP(column_name) FROM table_name;
Example 1: Query to calculate the population variance of all employee salaries
Output:
Example 2: Query to calculate the population variance of salaries within the Accounts department.
Output:
Example 3: Query to calculate the population variance of salaries below 60000.
Output:
The variance of expr is returned by VARIANCE. It can be applied as an analytical or aggregate function.
Syntax:
SELECT VARIANCE(column_name) FROM table_name;
Example 1: Query to find the variance of all employee salaries.
Output:
Example 2: Query to find the variance of salaries in the company’s technology department.
Output:
Example 3: Query to find the variance of salaries for employees who are getting salaries greater than 50000.
Output:
Some important points to be kept in mind while working with Aggregate functions in DBMS are:
Goal: To guarantee that an aggregate function takes into account only distinct values of the parameter expression, both the DISTINCT and UNIQUE clauses are employed. These two clauses are interchangeable.
Goal: The ALL clause causes an aggregate function to consider all values, including duplicates. For example, Using ALL in a query will result in all values being included in the calculation, unlike DISTINCT, which filters out duplicates.
Goal: Some aggregate functions allow the use of a windowing_clause, which is part of the syntax of analytic functions. This clause is useful for performing calculations over a specified range of rows. For example, the windowing_clause can be used to calculate running totals, moving averages, or cumulative sums.
Goal: A result is grouped into subgroups using a GROUP BY clause if one or more columns contain matching values. No two rows in a group have the same value for the column or columns used for grouping. NULL values are regarded as equal when it comes to grouping. Generally, an aggregate expression is used alongside a GROUP BY clause.
Goal: Aggregate functions can be resource-intensive, especially on large datasets. It’s important to optimise queries, possibly using indexing and limiting data scope, to enhance performance. For example, calculating the sum of sales for millions of records may require indexing on the relevant columns to speed up the query.
Goal: The results of a GROUP BY in a SelectExpression are constrained by a HAVING clause. Similar to how a WHERE clause is applied to a select list, the HAVING clause is applied to each group in the grouped table. The HAVING clause is applied to the entire result as a single group in the absence of a GROUP BY clause.
Aggregate functions play a pivotal role in DBMS. Below are some of the key reasons why an aggregate function should be used by a database engineer:
There are various applications of aggregate functions in DBMS, where the use of aggregate functions will provide you with exceptional results. Here are various real-world applications of aggregate functions in DBMS:
Aggregation functions are used in various applications to summarise big datasets by providing insights into the data through the calculation of totals, averages, maximums, minimum values, etc. For instance, in business intelligence reports, aggregate functions can find total sales, average revenue, or the highest sales for a specific region. For example: The SUM() function is used by a company to find total sales revenue for the year across all branches.
Aggregation functions are a means by which data analysts summarise large volumes of raw data into useful statistics. For instance, computing the average score of customer satisfaction aids a firm in knowing how effective its customer service is. For example: An e-commerce company could use the AVG() function to find the average order value. This could help in setting prices or running promotions more effectively.
In finance, aggregation functions are applied to compute key figures such as total assets, total liabilities, net income, and more. They assist in the generation of financial statements and performance reports. For example, a bank uses the SUM() function to find out the total amount of deposits across all customer accounts or find the average loan amount disbursed with AVG().
Aggregation functions aid in the computation of totals, minimums, and maximums. This is how it helps keep track of inventory levels, which further helps in maintaining optimal stock levels and identifying slow-moving items. For example, for a retailer to determine which item has the lowest stock quantity and thus needs refilling, he may use the MIN() function.
Aggregation functions are what businesses use to segment their customers based on such criteria as average purchase value or total spending, which helps target marketing efforts and enhance customer retention.
Aggregation functions find applications in quality control where the average defect rate or maximum defect count needs to be calculated, thereby helping companies maintain their standards of quality for the produced products. For example, An AVG() function could find usability by a manufacturing company to compute the average number of defects per batch production.
In healthcare, aggregation functions are used in the analysis of patient data by computing the average length of hospital stays, total costs incurred, and most common treatments. For example, the MEDIAN() function could be employed by a hospital to determine the median duration of hospitalisation, thus providing insight into efficiency regarding patient care.
Aggregation functions aid in performance monitoring different entities (including employees, departments, or machines) through the calculation of averages, totals, and other such metrics.
While Aggregate functions provide a powerful way to analyse a large set of data, they also come with various challenges that need to be considered:
Resource consumption by aggregate functions, especially with large datasets, can be considerable. Performance issues can be alleviated through the optimization of queries – via indexing, using efficient groupings, and limiting the data being processed.
NULL values are handled by aggregate functions in different ways. For example, COUNT(column_name) disregards NULL values, but COUNT(*) counts all rows, even those that have NULL values. One should be well aware of how each aggregate function treats NULL to prevent surprise results.
Be very careful when specifying the grouping columns in a GROUP BY clause. Mistaken groups will return invalid outcomes. Also, when using aggregate functions and non-aggregate columns, be sure to include these columns in the GROUP BY clause.
The use of aggregate functions is essential for database administrators or engineers. They enable effective data summarization and analysis, offering insightful information about big datasets. In this article, we have covered what are aggregate functions in DBMS, types of aggregate functions, and more. The different types of DBMS aggregate functions, such as SUM(), AVG(), COUNT(), MAX(), MIN(), and COUNT(DISTINCT), were examined in this article. We also talked about practical applications, the importance of using aggregate functions, and some of the challenges with aggregate functions.
Your ability to analyse data can be greatly improved by comprehending and using aggregate functions, regardless of what business you work in – finance, retail, HR, healthcare, or any other. Now that you have this knowledge, you may use aggregate functions in your database queries to get valuable insights from your data.
The DevOps Playbook
Simplify deployment with Docker containers.
Streamline development with modern practices.
Enhance efficiency with automated workflows.
Popular
Data Science
Technology
Finance
Management
Future Tech
Accelerator Program in Business Analytics & Data Science
Integrated Program in Data Science, AI and ML
Certificate Program in Full Stack Development with Specialization for Web and Mobile
Certificate Program in DevOps and Cloud Engineering
Certificate Program in Application Development
Certificate Program in Cybersecurity Essentials & Risk Assessment
Integrated Program in Finance and Financial Technologies
Certificate Program in Financial Analysis, Valuation and Risk Management
© 2024 Hero Vired. All rights reserved