Concurrency control is a mechanism that ensures concurrent transactions are executed without data loss or data inconsistency in a DBMS. It is a significant aspect of the multi-user environment, where multiple users access or modify data simultaneously.
In the absence of effective concurrency control, databases may experience problems like lost updates, data inconsistencies, and reads of uncommitted data. In this article, we will see the importance and application of concurrency control in database management systems, along with related concepts like transactions, concurrent execution, and the techniques used to control concurrency.
What is Concurrency Control?
Concurrency control is the process of managing the simultaneous execution of transactions, or manipulation of data by multiple processes or users, without data loss or data inconsistency and while preserving data integrity. In a multi-user database environment, where various users access and modify the database and execute transactions, it is necessary to control the concurrency.
Concurrency control provides the means by which users can execute simultaneous operations in a DBMS. In a DBMS, a transaction mainly performs two types of operations: read (R) and write (W). We will cover each of these operations in detail next.
The main goal of concurrency control is to ensure that the transactions that are being executed in DBMS do not result in data inconsistency. Serializability is the concept used for achieving this goal of concurrency control. Concurrent execution of transactions can lead to multiple issues, including:
- Inconsistent Data: When multiple users access and modify the same data concurrently, the database may be left in an inconsistent state. For example, if concurrency management is inadequate and two transactions modify the same record at the same time, the final state of the data may reflect only one of the updates, producing inconsistent results.
- Lost Updates: This happens when two or more transactions read the same piece of data and then update it based on the value they read. Because no transaction observes the updates performed by the others, the final value reflects only the most recent update, “losing” the earlier ones.
- Uncommitted Data: When a transaction accesses data that has been altered by another transaction that has not yet committed, it can lead to uncommitted data issues, also known as dirty reads. If the modifying transaction is rolled back, the data read by the first transaction becomes invalid. For instance, a second transaction that reads a record updated by the first transaction before it commits may be working with data that will later be rolled back.
- Inconsistent Retrievals: When a transaction reads the same data again and receives different results each time because another transaction changed the data between the reads, this is known as an inconsistent retrieval or non-repeatable read. It can make the transaction's results anomalous.
For example, if a transaction reads a collection of data, another transaction modifies that data, and the first transaction then reads the data again and observes different values, the result is an inconsistent retrieval.
What is a Transaction?
A transaction in a DBMS is a collection of operations (like reads and writes) used to perform a single logical function. Transactions guarantee the consistency and integrity of the database. They are also required to maintain the ACID (Atomicity, Consistency, Isolation, Durability) properties, which ensure that the database stays in an accurate state even in the face of concurrent access and failures.
A program that accesses or alters data in a database is called a transaction. The main operations of a transaction are read (R) and write (W). The notion of a transaction is widely used in applications and database systems. Although transactions were first employed in financial applications, the idea is now applied in real-time telecommunication applications and in the administration of long-running tasks like product design or administrative operations.
What are Transaction states?
A transaction has different states in DBMS:
- Active: The state in which the transaction is currently being executed.
- Partially Committed: The state in which the transaction has completed its execution but is not yet committed.
- Committed: The state in which the transaction has been completed, and its changes are now permanently saved to the database.
- Failed: The state in which the transaction has experienced an error or failure and cannot proceed further.
- Aborted: The state in which the transaction has been rolled back due to a failed state, undoing any changes it made to the database.
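These states and the legal transitions between them can be summed up in a small state machine. The following Python sketch is purely illustrative; the class and transition table are hypothetical names, not part of any real DBMS API.

```python
from enum import Enum

class TxState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Legal transitions in a transaction's lifecycle.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.COMMITTED: set(),  # terminal state
    TxState.ABORTED: set(),    # terminal state
}

def advance(state: TxState, target: TxState) -> TxState:
    """Move a transaction to `target`, rejecting illegal transitions."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state.value} -> {target.value}")
    return target
```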
What is Concurrency?
Concurrency refers to multiple users performing operations like reading or writing data at the same time. Problems arise when a user accesses or modifies data that is in a consistent state, but the interleaved operations leave the data in an inconsistent state.
Concurrent Execution in DBMS
Concurrent execution in a DBMS is the process of executing multiple transactions simultaneously, i.e., at the same time. In a multi-user environment, multiple users access the same data in the database and perform different operations on it, either to read the data or to modify it. Each transaction's operations are reads or writes, and executing transactions concurrently can sometimes leave the database in an inconsistent state.
However, to guarantee data integrity and prevent conflicts, managing concurrent execution requires careful thought. By leveraging appropriate concurrency management in DBMS, concurrent execution may be effectively handled, preserving data consistency, integrity, and isolation while optimising system performance and concurrency.
Principles of Concurrency Control
Concurrency control is built on the ACID principles, which include:
- Atomicity (A): Transactions are atomic, meaning they are either completed in their entirety or not at all.
- Consistency (C): The database must transition from one consistent state to another, preserving data integrity.
- Isolation (I): Transactions should appear as though they are executed in isolation, meaning the operations of one transaction are not visible to other transactions until they are completed.
- Durability (D): Once a transaction is committed, its changes are permanent, even in the event of a system failure.
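As a concrete illustration, Python's built-in sqlite3 module exposes atomicity and durability directly: everything between the start of a transaction and commit() persists as a whole, or is undone by rollback(). This is a minimal sketch; the accounts table and the transfer amounts are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 5000), (2, 1000)")
conn.commit()

try:
    # Both updates form one logical transfer: atomicity demands that
    # they succeed together or not at all.
    conn.execute("UPDATE accounts SET balance = balance - 2000 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 2000 WHERE id = 2")
    conn.commit()    # durability: the changes are now permanent
except sqlite3.Error:
    conn.rollback()  # atomicity: undo the partial transfer
```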
Why use the Concurrency Method?
Concurrency control is a significant method used to manage the simultaneous execution of transactions and avoid data inconsistency. There are several factors to consider when using concurrency control in a DBMS:
- Data Inconsistency: By preventing conflicts between transactions, concurrency control helps to avoid inconsistent data. Without it, transactions might read and write data in an interleaved way that leaves the database in an inconsistent state, so proper concurrency control methods are essential.
- Prevent Dirty Reads: A dirty read occurs when a transaction reads data that has been altered by an uncommitted transaction. If the modifying transaction is rolled back, the data read by the first transaction is no longer valid. By guaranteeing that a transaction can only read committed data, concurrency control helps to avoid dirty reads.
- Performance and Throughput: A DBMS’s performance and throughput are enhanced by concurrency control, which permits several transactions to run concurrently under a disciplined control method. By managing concurrent access to data, the DBMS can use system resources effectively, decreasing wait times and raising overall throughput.
- Sharing of Data: Concurrency control makes data sharing between numerous users and applications possible while protecting the data’s consistency and integrity. In a multi-user environment, users must be able to access and alter data simultaneously without triggering conflicts.
- Atomicity Protection: A transaction is treated as a single unit, and concurrency control ensures that it remains atomic throughout: either every operation of the transaction is completed (succeeded) or the transaction is aborted (failed).
- Conflict Avoidance: When two or more users update the same piece of data, concurrency control prevents the transactions from overwriting one another’s changes.
Concurrency Control Problems
Different problems may arise when transactions are executed concurrently. When proper concurrency control methods are not used, concurrent execution may lead to data inconsistency in the database. Below are some of the common concurrency control problems:
1. Phantom Read and Dirty Read Problems
A phantom read problem is sometimes confused with the dirty read problem, but the two are distinct. A phantom read occurs when a transaction reads the set of rows that satisfy a specific condition while, before the first transaction completes, another transaction performs an operation on the rows that changes that result set. As a result, the first transaction sees a different, inconsistent output when it re-executes the same query.
A dirty read problem, in contrast, occurs when one transaction updates a database item and another transaction reads the updated value before the first transaction commits. If the first transaction then fails and is rolled back, the second transaction has read a value that was never committed.
For example, consider a schedule S in which transaction T2 rolls back after the read (A, a) operation of transaction T4. T3 and T4 have then read values of A that were never committed, so they have performed dirty reads: both read the value of A as modified by transaction T2. Since T2 fails and is rolled back, A is restored to its original value and the modified value is cancelled. T3 and T4, however, have already processed the modified value of A, resulting in inconsistent data in the database.
Schedule (S):

| T2 | T3 | T4 |
| --- | --- | --- |
| Lock-X(A) | | |
| Read(A, a) | | |
| Lock-S(B) | | |
| Read(B, b) | | |
| Write(A, a) | | |
| Unlock(A) | | |
| | Lock-X(A) | |
| | Read(A, a) | |
| | Write(A, a) | |
| | Unlock(A) | |
| | | Lock-S(A) |
| | | Read(A, a) |
| rollback | | |
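A minimal Python simulation of the same anomaly, assuming a toy dict-based store (all names here are invented for illustration): T2 writes an uncommitted value, another transaction reads it, and T2 then rolls back.

```python
# Toy store: committed values plus per-transaction uncommitted writes.
committed = {"A": 100}
uncommitted = {}  # transaction name -> {item: new value}

def write(tx, item, value):
    uncommitted.setdefault(tx, {})[item] = value

def read_uncommitted(writer, item):
    # Reading another transaction's *uncommitted* value is a dirty read.
    return uncommitted.get(writer, {}).get(item, committed[item])

def rollback(tx):
    uncommitted.pop(tx, None)  # discard the transaction's writes

write("T2", "A", 150)                     # T2 modifies A, uncommitted
seen_by_t3 = read_uncommitted("T2", "A")  # T3 reads 150: a dirty read
rollback("T2")                            # T2 fails; A stays 100
print(seen_by_t3, committed["A"])         # 150 100 -> T3 used stale data
```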
2. Lost Update Problem
A lost update problem occurs when two transactions that access the same data items interleave their operations in a way that makes the value of some database item incorrect. This happens when two or more transactions read the same data and then update it based on the value they read. Since each transaction does not see the updates made by the others, the final value reflects only the last update, effectively “losing” the previous updates.
For example, if two transactions read the same account balance and then each deduct an amount without proper locking, one of the subtractions may be lost. Consider a bank account with a balance of 5000 and two transactions, T1 and T2. First, T1 reads the balance and gets 5000. Then T2 reads the balance and also gets 5000.
T1 now updates the balance by subtracting 2000, writing 3000. T2, having read the initial balance of 5000, subtracts 1000 and writes 4000, effectively ignoring the update made by T1. We expected the final balance to be 5000 – 2000 – 1000 = 2000, but the stored value is 4000. As a result, the update by T1 is lost.
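The sketch below reproduces this lost update with two Python threads, then shows the fix: a lock that serializes the read-modify-write. The function names and the sleep used to widen the race window are illustrative only.

```python
import threading, time

balance = 5000
lock = threading.Lock()

def withdraw_unsafe(amount):
    global balance
    read = balance           # T1 and T2 can both read 5000 here
    time.sleep(0.01)         # widen the race window for demonstration
    balance = read - amount  # the later write discards the earlier one

def withdraw_safe(amount):
    global balance
    with lock:               # the lock serializes the read-modify-write
        balance -= amount

t1 = threading.Thread(target=withdraw_unsafe, args=(2000,))
t2 = threading.Thread(target=withdraw_unsafe, args=(1000,))
t1.start(); t2.start(); t1.join(); t2.join()
print(balance)  # likely 4000 or 3000, not the expected 2000
```

Replacing withdraw_unsafe with withdraw_safe yields the expected 2000, which is what a DBMS's locking achieves at a larger scale.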
Techniques for Concurrency Control
Lock-based Protocols
Lock-based protocols use locks to control access to data items. A lock is a mechanism to control concurrent access to a data item. Transactions acquire locks before accessing data and release them after completing their operations.
Types of Locks
- Shared Lock (S): Allows multiple transactions to read a data item but not modify it.
- Exclusive Lock (X): Allows a single transaction to read and modify a data item.
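A minimal readers-writer lock in Python conveys the shared/exclusive semantics: many concurrent readers, or one writer, never both. This is a hypothetical sketch; a real DBMS lock manager also handles queuing, fairness, and lock upgrades, which are omitted here.

```python
import threading

class SharedExclusiveLock:
    """Many readers (shared) OR one writer (exclusive), never both."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:           # wait out any exclusive holder
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```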
Two-Phase Locking (2PL)
Two-phase locking is a protocol that ensures serializability. It operates in two phases:
- Growing Phase: A transaction may acquire locks but not release any.
- Shrinking Phase: A transaction may release locks but not acquire any new ones.
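The following sketch shows a transaction that obeys two-phase locking, assuming hypothetical per-item locks and a plain dict standing in for storage: every acquire happens before the first release.

```python
import threading

# Hypothetical per-item exclusive locks; `db` stands in for real storage.
locks = {"A": threading.Lock(), "B": threading.Lock()}
db = {"A": 100, "B": 200}

def transfer(amount):
    # Growing phase: acquire every lock needed; none is released yet.
    locks["A"].acquire()
    locks["B"].acquire()
    try:
        db["A"] -= amount
        db["B"] += amount
    finally:
        # Shrinking phase: release locks; 2PL forbids acquiring new ones now.
        locks["B"].release()
        locks["A"].release()
```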
Timestamp-based Protocols
Timestamp-based protocols assign a unique timestamp to each transaction. Transactions are ordered based on their timestamps, ensuring serializability.
Basic Timestamp Ordering
Each transaction is assigned a unique timestamp. The protocol ensures that conflicting operations are executed in timestamp order. If a transaction tries to perform an operation that violates the order, it is rolled back and restarted with a new timestamp.
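A minimal sketch of these checks, assuming per-item tables read_ts and write_ts that record the largest timestamp to have read or written each item (both names are invented for illustration):

```python
read_ts = {}   # item -> largest timestamp that has read it
write_ts = {}  # item -> largest timestamp that has written it

def try_read(ts, item):
    # A read is too late if a younger transaction already overwrote the item.
    if ts < write_ts.get(item, 0):
        raise RuntimeError("abort: restart with a new timestamp")
    read_ts[item] = max(read_ts.get(item, 0), ts)

def try_write(ts, item):
    # A write is too late if a younger transaction already read or wrote it.
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        raise RuntimeError("abort: restart with a new timestamp")
    write_ts[item] = ts
```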
Thomas’s Write Rule
An optimization to basic timestamp ordering, Thomas’s Write Rule allows certain operations to be ignored if they do not affect the outcome, reducing unnecessary rollbacks.
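In terms of the hypothetical sketch above, only the write check changes: an obsolete write (one older than the item's write timestamp) is silently skipped instead of forcing a rollback.

```python
def try_write_thomas(ts, item):
    if ts < read_ts.get(item, 0):
        raise RuntimeError("abort: a younger transaction already read the item")
    if ts < write_ts.get(item, 0):
        return  # obsolete write: ignore it rather than roll back
    write_ts[item] = ts
```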
Optimistic Concurrency Control
Optimistic concurrency control assumes that conflicts are rare. Transactions proceed without restrictions and are checked for conflicts only at the commit point.
Phases of Optimistic Concurrency Control
- Read Phase: Transactions execute without restrictions and read data items.
- Validation Phase: Transactions are checked for conflicts before being committed.
- Write Phase: If no conflicts are found, the transactions are committed; otherwise, they are rolled back.
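A simplified validation sketch: each transaction records its read and write sets, and at commit time it is checked against every transaction that committed after it started. All class and variable names here are assumptions made for the example.

```python
class OptimisticTx:
    def __init__(self, start_version):
        self.start_version = start_version  # database version at begin
        self.read_set = set()               # items the transaction read
        self.write_set = {}                 # item -> new value

committed_writes = []  # list of (commit_version, set of items written)
current_version = 0

def validate_and_commit(tx):
    global current_version
    # Validation phase: fail if a later commit touched anything we read.
    for version, written in committed_writes:
        if version > tx.start_version and written & tx.read_set:
            raise RuntimeError("validation failed: roll back and restart")
    # Write phase: install the writes under a new version number.
    current_version += 1
    committed_writes.append((current_version, set(tx.write_set)))
```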
Remediation Steps
Remediation steps are the actions or procedures that can be taken to resolve the concurrency control issues that occur in database transactions. They ensure that multiple concurrent transactions accessing the same data do not interfere with one another and leave inconsistent data in the DBMS. Some remediation steps for concurrency control are:
- Proper Locking: Applying locking mechanisms like shared locks or exclusive locks can help avoid lost updates and dirty reads.
- Isolation Levels: Isolation levels determine the trade-off between performance and data consistency. They include Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
- Deadlock Handling: A deadlock occurs when multiple transactions wait indefinitely for one another to release resources. Use deadlock detection and resolution mechanisms to manage deadlocks efficiently (a detection sketch follows this list).
- Transaction Management: Implement transaction management strategies to ensure atomicity and durability.
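Deadlock detection is commonly implemented as a cycle search over the waits-for graph, where an edge T1 -> T2 means T1 is waiting for a lock that T2 holds. A minimal sketch with made-up transaction names:

```python
def has_deadlock(waits_for):
    """Detect a cycle in a waits-for graph: {tx: set of txs it waits on}."""
    visited, on_stack = set(), set()

    def dfs(tx):
        visited.add(tx)
        on_stack.add(tx)
        for other in waits_for.get(tx, ()):
            if other in on_stack:  # back edge -> cycle -> deadlock
                return True
            if other not in visited and dfs(other):
                return True
        on_stack.discard(tx)
        return False

    return any(dfs(tx) for tx in waits_for if tx not in visited)

# T1 waits on T2 and T2 waits on T1: a deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
```

A DBMS that finds such a cycle typically resolves it by aborting one of the transactions (the victim) so the others can proceed.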
Advantages of Concurrency
Concurrency means the execution of multiple transactions concurrently, which can significantly improve resource utilisation, increase throughput, improve user experience, and bring several other benefits. Some of the advantages of concurrency are:
- Better utilisation of resources like the CPU and main memory. When transactions are executed concurrently, the DBMS ensures that processors are not idle, and utilising every resource of the system leads to more efficient operation.
- Transactions do not have to wait for other transactions to finish before beginning. This decreases total transaction waiting time, so users may see quicker response times.
- The ability to handle numerous transactions at once can greatly boost the database system’s throughput: through concurrent operation, the system performs more work in the same amount of time.
- Concurrency improves the DBMS’s scalability, making it possible to manage more concurrent users and transactions. Applications that must scale to meet increasing demand have to consider this.
- Response time, the time spent waiting for the system’s first response, is reduced because transactions no longer queue strictly behind one another.
Disadvantages of Concurrency
- Concurrency control makes the DBMS significantly more complex. Sophisticated algorithms and methods are needed to guarantee that transactions are carried out in a way that preserves data consistency and integrity.
- When two or more transactions wait for one another to release resources, deadlocks can occur, preventing any of the operations from moving forward. Deadlock detection and resolution add further complexity and cost.
- Concurrency can increase overall system performance, but it can also cause resource conflicts and contention, which lower performance. In highly contended environments, the overhead of managing concurrent access may outweigh the advantages.
- Concurrency control can be challenging to implement, especially in distributed systems or those with complex transactional logic.
Conclusion
Concurrency control is the process of executing multiple transactions at the same time without any data loss or data inconsistency. In this article, we have gone through a detailed guide to concurrency control. We have learned what a transaction is, the transaction states, and how concurrent execution works in database management systems. The principles of concurrency control and the importance of using concurrency methods were also discussed.
To control concurrency in transactions, we have seen different techniques: lock-based protocols, timestamp-based protocols, and optimistic concurrency control protocols. Concurrency control keeps the database running smoothly and accurately by preventing data inconsistency, avoiding dirty reads, optimising efficiency, promoting data sharing, protecting atomicity, and avoiding conflicts.
FAQs
What is concurrency control in DBMS?
Concurrency control is a mechanism by which multiple transactions are executed at the same time without any data loss or data inconsistency in a DBMS. It prevents the conflicts and anomalies that can arise when various transactions access and modify the database in parallel.

What are the methods of concurrency control?
The main methods of concurrency control in a DBMS are lock-based protocols, timestamp-based protocols, and optimistic concurrency control protocols.

What is concurrency?
Concurrency arises when multiple users access or modify specific data at the same time. Problems occur when the data starts in a consistent state but the interleaved operations leave it in an inconsistent state.

What are common concurrency control mistakes?
Improper use of locks, lock over-use, and isolation level mismatches are some common mistakes that database developers make when integrating concurrency control into a DBMS. Steering clear of these mistakes helps developers guarantee the protection of their database against concurrency-related issues.

Why is concurrency control important?
Concurrency control holds significance as it ensures the execution of transactions in a manner that upholds the database's correctness. Without it, concurrent transactions can produce anomalies such as lost updates, dirty reads, and inconsistent retrievals.