Introduction to Concurrency Control in DBMS
Introduction to Concurrency Control in DBMS
In a Database Management System (DBMS), multiple users may access and update the database simultaneously. When several transactions execute at the same time, their operations may interfere with each other, potentially causing incorrect or inconsistent results.
Concurrency control is the mechanism used by a DBMS to manage simultaneous execution of transactions so that the database remains consistent and correct.
Its main goal is to ensure the isolation property of transactions and to guarantee serializability of schedules.
1. What is Concurrency Control?
Concurrency control refers to:
The techniques used by a DBMS to coordinate the execution of multiple transactions so that they do not interfere with each other and the database remains consistent.
When transactions run concurrently, their operations may be interleaved, forming different schedules. Concurrency control ensures that these schedules are equivalent to some serial schedule, meaning they produce correct results.
2. Why Concurrency Control is Needed
Without concurrency control, several problems may occur.
1. Lost Update Problem
Occurs when two transactions update the same data item and one update overwrites the other.
Example:
Final value becomes 120, but correct value should be 130.
2. Dirty Read Problem
Occurs when a transaction reads data written by another transaction that has not yet committed.
Example:
T2 used invalid data.
3. Unrepeatable Read
Occurs when a transaction reads the same data item twice and gets different values because another transaction modified it.
4. Incorrect Summary Problem
Occurs when a transaction calculates an aggregate (like SUM) while another transaction updates the data simultaneously.
3. Objectives of Concurrency Control
Concurrency control mechanisms aim to achieve the following goals:
1. Isolation
Each transaction should execute as if it is the only transaction running.
2. Serializability
Concurrent execution must produce results equivalent to a serial execution.
3. Database Consistency
The database must move from one consistent state to another.
4. Maximum Concurrency
The system should allow many transactions to execute simultaneously to improve performance.
4. Concurrency Control Protocols
A protocol is a set of rules that transactions must follow to ensure correct execution.
Most concurrency control protocols guarantee serializability of schedules.
The main categories of concurrency control techniques are:
-
Lock-Based Protocols
-
Timestamp-Based Protocols
-
Multiversion Concurrency Control
-
Validation (Optimistic) Protocols
-
Snapshot Isolation
5. Lock-Based Concurrency Control
One of the most common techniques used in DBMS is locking.
A lock is applied to a data item to control access by transactions.
Before accessing a data item, a transaction must obtain a lock on that item.
Types of Locks
| Lock Type | Purpose |
|---|---|
| Shared Lock (S) | Used for reading data |
| Exclusive Lock (X) | Used for writing data |
Two-Phase Locking (2PL)
A widely used locking protocol is Two-Phase Locking.
It has two phases:
-
Growing Phase – transaction acquires locks
-
Shrinking Phase – transaction releases locks
This protocol guarantees conflict serializability.
However, locking protocols may cause high overhead and deadlocks.
6. Timestamp-Based Concurrency Control
Another method uses timestamps.
A timestamp is a unique identifier assigned to each transaction when it starts.
Properties:
-
Generated in the order of transaction start times
-
Used to determine execution order
The system ensures that transactions execute in timestamp order, preventing conflicts.
Advantages:
-
Avoids deadlocks
-
Simple concept
Disadvantages:
-
May cause frequent rollbacks
7. Multiversion Concurrency Control (MVCC)
This technique maintains multiple versions of a data item.
Instead of overwriting data, the system creates a new version.
Transactions can read older versions while updates create new versions.
Advantages:
-
High concurrency
-
Readers do not block writers
Example protocols include:
-
Multiversion Timestamp Ordering
-
Multiversion Two-Phase Locking
Many modern DBMS systems use MVCC.
8. Optimistic Concurrency Control (Validation-Based)
This method assumes that conflicts are rare.
Transactions execute without locking resources initially.
At the end of execution, the system performs validation to check if conflicts occurred.
Phases:
-
Read Phase
-
Validation Phase
-
Write Phase
If validation fails, the transaction is rolled back.
This approach is called optimistic because it assumes conflicts will not occur frequently.
9. Snapshot Isolation
Snapshot isolation allows each transaction to work with a snapshot (copy) of the database at a specific time.
Characteristics:
-
Transactions see consistent database snapshots
-
Updates create new versions
Advantages:
-
High concurrency
-
Lower overhead compared to locking
Many commercial DBMSs (like PostgreSQL, Oracle) implement snapshot-based techniques.
10. Granularity of Data Items
Concurrency control also depends on granularity, which refers to the size of the data item being locked or controlled.
Granularity levels may include:
| Level | Example |
|---|---|
| Attribute level | Individual field |
| Record level | One tuple |
| Block level | Disk block |
| File level | Entire table |
| Database level | Whole database |
Smaller granularity → higher concurrency but more overhead.
Larger granularity → lower overhead but reduced concurrency.
11. Concurrency Control with Indexes
Indexes introduce additional challenges because:
-
Multiple transactions may access index structures
-
Updates to indexes must remain consistent with the database
Special concurrency control techniques are used to maintain index consistency.
12. Importance of Concurrency Control
Concurrency control is essential because it:
-
Maintains data consistency
-
Prevents transaction interference
-
Improves system performance
-
Ensures correct concurrent execution
-
Supports ACID properties of transactions
Summary
Concurrency control is a critical component of transaction management in DBMS. It ensures that multiple transactions can execute concurrently without violating database consistency.
Main techniques include:
| Technique | Main Idea |
|---|---|
| Lock-Based Protocols | Use locks to control access to data |
| Timestamp Ordering | Execute transactions based on timestamps |
| Multiversion Control | Maintain multiple versions of data |
| Validation Protocols | Validate transactions before committing |
| Snapshot Isolation | Transactions read consistent snapshots |
These techniques help maintain serializability, isolation, and correct database behavior during concurrent transaction execution.
Comments
Post a Comment