Concurrency Control
Concurrency control in database management systems, other transactional objects, and related distributed applications (e.g., grid computing and cloud computing) ensures that database transactions are performed concurrently without violating the data integrity of the respective databases. Thus concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, i.e., in virtually any general-purpose database system. Consequently, a vast body of related research has accumulated since database systems emerged in the early 1970s. The well-established concurrency control theory for database systems is serializability theory, which allows the effective design and analysis of concurrency control methods and mechanisms. An alternative theory, covering concurrency control of atomic transactions over abstract data types, exists but is not utilized below. That theory is more refined and complex, has a wider scope, and has been less utilized in the database literature than the classical theory above. Each theory has its pros and cons, emphasis and insight. To some extent they are complementary, and their merging may be useful.
To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless serializability is intentionally relaxed to increase performance, and then only in cases where application correctness is not harmed. To maintain correctness in cases of failed (aborted) transactions (which can always happen, for many reasons), schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no effect of committed transactions is lost, and that no effect of aborted (rolled-back) transactions remains in the related database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have become distributed, or have needed to cooperate in distributed environments (e.g., federated databases in the early 1990s, and cloud computing currently), the effective distribution of concurrency control mechanisms has received special attention.
Database transaction and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well-understood database system behavior in a faulty environment where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in database and other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs):
- Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction is completed (committed or aborted, respectively). In other words, to the outside world a committed transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not affect the database at all, as if it never happened.
- Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction must transform a database from one consistent state to another consistent state (however, it is the responsibility of the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it intends to perform from the application's point of view, while the predefined integrity rules are enforced by the DBMS). Thus, since a database can normally be changed only by transactions, all of the database's states are consistent.
- Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually (depending on the concurrency control method) the effects of an incomplete transaction are not even visible to other transactions. Providing isolation is the main goal of concurrency control.
- Durability - The effects of successful (committed) transactions must persist through crashes (typically by recording the transaction's effects and its commit event in non-volatile memory).
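As a concrete illustration of these rules, here is a minimal Python sketch of atomicity using the standard sqlite3 module (the accounts table, its CHECK constraint, and the transfer amounts are hypothetical, chosen only to force a failure):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")  # integrity rule
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

try:
    # Transfer 200 from account 1 to account 2: more than account 1 holds.
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.commit()                 # never reached: the CHECK constraint fires first
except sqlite3.IntegrityError:
    conn.rollback()               # atomicity: the first UPDATE is undone as well

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# [(100,), (50,)] -- unchanged, as if the transaction never happened
```

Note that the rollback also undoes the first, individually valid UPDATE: the transaction's effects remain in the database all together or not at all.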
The concept of the atomic transaction has been extended over the years to what has become the business transaction, which actually implements types of workflow and is not atomic. However, such enhanced transactions typically utilize atomic transactions as components.
Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, unexpected, undesirable results may occur, such as:
1. The lost update problem: A second transaction writes a second value of a data item (datum) on top of a first value written by a first concurrent transaction, and the first value is lost to other concurrently running transactions that need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results (see the sketch after this list).
2. The dirty read problem: Transactions read a value written by a transaction that is later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.
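To make the lost update problem concrete, here is a minimal Python simulation (the shared dictionary stands in for a database, and the two threads for two uncontrolled concurrent transactions; all names and timings are illustrative):

```python
import threading
import time

db = {"x": 100}  # a single shared data item

def transaction(amount, delay):
    # Uncontrolled read-modify-write: no lock is taken on db["x"].
    value = db["x"]            # both transactions read x = 100
    time.sleep(delay)          # force the problematic interleaving
    db["x"] = value + amount   # the later write overwrites the earlier one

t1 = threading.Thread(target=transaction, args=(+50, 0.1))
t2 = threading.Thread(target=transaction, args=(-30, 0.2))
t1.start(); t2.start()
t1.join(); t2.join()

print(db["x"])  # 70, not the expected 120: t1's update (+50) is lost
```

With any of the lock-based or timestamp-based protocols described below, one of the two transactions would be blocked or rolled back instead of silently losing an update.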
In a multiprogramming environment where more than one transaction can be executed concurrently, protocols are needed to control the concurrency of transactions, ensuring the atomicity and isolation properties of transactions. Concurrency control protocols that ensure serializability of transactions are the most desirable. Concurrency control protocols can be broadly divided into two categories:
- Lock-based protocols
- Timestamp-based protocols
Lock-based protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it first acquires an appropriate lock on it. Locks are of two kinds:
- Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
- Shared/exclusive: this type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item in order to perform a write operation, it is an exclusive lock, because allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
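A minimal sketch of such a shared/exclusive (readers-writer) lock in Python follows; it is an illustrative in-memory implementation, not any particular DBMS's lock manager:

```python
import threading

class SharedExclusiveLock:
    """Shared (read) locks may coexist; an exclusive (write) lock excludes all."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of current shared-lock holders
        self._writer = False    # True while an exclusive lock is held

    def acquire_shared(self):
        with self._cond:
            while self._writer:                   # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()           # wake any waiting writer

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # writers wait for everyone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This simple version can starve writers under a steady stream of readers; real lock managers typically queue requests to prevent that.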
There are four types of lock protocols available:
- Simplistic: Simplistic lock-based protocols allow a transaction to obtain a lock on every object before a 'write' operation is performed. As soon as the 'write' is done, the transaction may unlock the data item.
- Pre-claiming: In this protocol, a transaction evaluates its operations and creates a list of data items on which it needs locks. Before starting execution, the transaction requests the system for all the locks it needs. If all the locks are granted, the transaction executes and releases all the locks when all of its operations are over. If the locks are not all granted, the transaction rolls back and waits until they are.
- Two-Phase Locking (2PL): This locking protocol divides the transaction execution phase into three parts. In the first part, when the transaction starts executing, it seeks grants for the locks it needs as it executes. The second part is where the transaction has acquired all its locks and requires no others; it keeps executing its operations. As soon as the transaction releases its first lock, the third phase starts; in this phase the transaction cannot demand any new locks, but only releases the acquired locks.
[Image: Two Phase Locking] Two-phase locking has two phases: one is growing, where all locks are being acquired by the transaction, and the second is shrinking, where the locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
- Strict Two-Phase Locking: The first phase of Strict-2PL is the same as in 2PL. After acquiring all its locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock as soon as it is no longer required; it holds all the locks until the commit state arrives, and releases them all at once at the commit point.
[Image: Strict Two Phase Locking] Strict-2PL does not suffer from cascading aborts as 2PL does.
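As an illustration, here is a small Python sketch of a 2PL-style transaction over a toy lock manager (the LockManager class, the transfer function, and the item names are all hypothetical, and deadlock handling is omitted):

```python
import threading

class LockManager:
    """Toy lock manager: one exclusive lock per data item."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock(self, item):
        with self._guard:
            l = self._locks.setdefault(item, threading.Lock())
        l.acquire()              # blocks while another transaction holds the lock

    def unlock(self, item):
        self._locks[item].release()

def transfer(lm, db, src, dst, amount):
    # Growing phase: acquire every needed lock before releasing any.
    lm.lock(src)
    lm.lock(dst)
    # Lock point: all locks held; perform the transaction's operations.
    db[src] -= amount
    db[dst] += amount
    # Shrinking phase: only releases from here on. Strict-2PL would defer
    # both unlocks until the transaction commits.
    lm.unlock(src)
    lm.unlock(dst)

lm, db = LockManager(), {"A": 100, "B": 100}
transfer(lm, db, "A", "B", 25)
print(db)  # {'A': 75, 'B': 125}
```

Two such transactions locking the same items in opposite orders can deadlock; acquiring locks in a fixed global order is one common way to avoid this.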
Timestamp-based protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority may be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system know when the last read and write operations were performed on the data item.
TIMESTAMP ORDERING PROTOCOL
The timestamp ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that conflicting pairs of tasks are executed according to the timestamp values of the transactions.
- The timestamp of transaction Ti is denoted as TS(Ti).
- The read timestamp of data item X is denoted by R-timestamp(X).
- The write timestamp of data item X is denoted by W-timestamp(X).
The timestamp ordering protocol works as follows:
- If a transaction Ti issues a read(X) operation:
  - If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
  - If TS(Ti) >= W-timestamp(X), the operation is executed, and R-timestamp(X) is updated to max(R-timestamp(X), TS(Ti)).
- If a transaction Ti issues a write(X) operation:
  - If TS(Ti) < R-timestamp(X), the operation is rejected and Ti is rolled back.
  - If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
  - Otherwise, the operation is executed and W-timestamp(X) is updated to TS(Ti).
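These rules translate almost line for line into code. A minimal Python sketch follows (the Rollback exception and DataItem class are illustrative names, not from any particular system):

```python
class Rollback(Exception):
    """Raised when an operation is rejected and Ti must be rolled back."""

class DataItem:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0   # R-timestamp(X): largest TS of any transaction that read X
        self.w_ts = 0   # W-timestamp(X): largest TS of any transaction that wrote X

def read(ti_ts, x):
    if ti_ts < x.w_ts:               # Ti is too old: a younger Tj already wrote X
        raise Rollback("read(X) rejected")
    x.r_ts = max(x.r_ts, ti_ts)      # record this read
    return x.value

def write(ti_ts, x, value):
    if ti_ts < x.r_ts:               # a younger transaction has already read X
        raise Rollback("write(X) rejected")
    if ti_ts < x.w_ts:               # a younger transaction has already written X
        raise Rollback("write(X) rejected")
    x.value = value
    x.w_ts = ti_ts
```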
THOMAS' WRITE RULE:
This rule modifies the second check of the write rule above:
- If TS(Ti) < W-timestamp(X), basic timestamp ordering rejects the operation and rolls Ti back. Thomas' write rule instead ignores the obsolete 'write' operation itself rather than rolling Ti back, since a younger transaction has already overwritten X; this modification makes the resulting schedules view serializable.
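In code, Thomas' write rule changes only the second test of the write sketch above; assuming the same Rollback and DataItem definitions, the obsolete write is silently skipped:

```python
def write_thomas(ti_ts, x, value):
    if ti_ts < x.r_ts:      # a younger transaction already read X: still fatal
        raise Rollback("write(X) rejected")
    if ti_ts < x.w_ts:      # obsolete write: ignore it instead of rolling back
        return
    x.value = value
    x.w_ts = ti_ts
```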