Serializability
serializability
It Identifies data transactions as occurring serially, independent of one another, even though they may have occurred
concurrently. A schedule or list of transactions is deemed to be correct if they are serialized, otherwise, they
may contain errors that can lead to duplication or overlap.
Serializability is a property of a
transaction schedule (history). It
relates to the isolation property of a database transaction.
Serializability of a schedule means
equivalence (in the outcome, the database state, data values) to a serial
schedule (i.e., sequential with no transaction overlap in time) with
the same transactions. It is the major criterion for the correctness of
concurrent transactions' schedule, and thus supported in all general purpose
database systems.
Correctness – serializability
The
rationale behind serializability is the following:
If
each transaction is correct by itself, i.e., meets certain integrity
conditions, then a schedule that comprises any serial execution
of these transactions is correct (its transactions still meet their
conditions): "Serial" means that transactions do not overlap in time
and cannot interfere with each other, i.e, complete isolation between
each other exists. Any order of the transactions is legitimate, if no
dependencies among them exist, which is assumed (see comment below). As a
result, a schedule that comprises any execution (not necessarily serial) that
is equivalent (in its outcome) to any serial execution of these transactions,
is correct.
Schedules that are not
serializable are likely to generate erroneous outcomes. Well known examples are
with transactions that debit and credit accounts with money: If the related
schedules are not serializable, then the total sum of money may not be
preserved. Money could disappear, or be generated from nowhere. This and
violations of possibly needed other invariant preservations are caused by one
transaction writing, and "stepping on" and erasing what has been
written by another transaction before it has become permanent in the database.
It does not happen if serializability is maintained.
If any specific order
between some transactions is requested by an application, then it is enforced
independently of the underlying serializability mechanisms. These mechanisms
are typically indifferent to any specific order, and generate some
unpredictable partial
order that
is typically compatible with multiple serial orders of these transactions. This
partial order results from the scheduling orders of concurrent transactions'
data access operations, which depend on many factors.
A major characteristic of
a database transaction is atomicity, which means that it either commits,
i.e., all its operations' results take effect in the database, or aborts (rolled-back),
all its operations' results do not have any effect on the database ("all
or nothing" semantics of a transaction). In all real systems transactions
can abort for many reasons, and serializability by itself is not sufficient for
correctness. Schedules also need to possess the recoverability (from abort)
property. Recoverability means that committed transactions
have not read data written by aborted transactions (whose effects do not exist
in the resulting database states). While serializability is currently
compromised on purpose in many applications for better performance (only in
cases when application's correctness is not harmed), compromising
recoverability would quickly violate the database's integrity, as well as that
of transactions' results external to the database. A schedule with the
recoverability property (a recoverable schedule)
"recovers" from aborts by itself, i.e., aborts do not harm the
integrity of its committed transactions and resulting database. This is false
without recoverability, where the likely integrity violations (resulting
incorrect database data) need special, typically manual, corrective actions in
the database.
Implementing
recoverability in its general form may result in cascading aborts:
Aborting one transaction may result in a need to abort a second transaction,
and then a third, and so on. This results in a waste of already partially
executed transactions, and may result also in a performance penalty. Avoiding cascading aborts (ACA, or
Cascadelessness) is a special case of recoverability that exactly prevents such
phenomenon. Often in practice a special case of ACA is utilized: Strictness. Strictness allows an
efficient database recovery from failure.
Note that the recoverability property
is needed even if no database failure occurs and no database recovery from
failure is needed. It is rather needed to correctly automatically handle
aborts, which may be unrelated to database failure and recovery from failure.
Relaxing serializability
In many
applications, unlike with finances, absolute correctness is not needed. For
example, when retrieving a list of products according to specification, in most
cases it does not matter much if a product, whose data was updated a short time
ago, does not appear in the list, even if it meets the specification. It will
typically appear in such a list when tried again a short time later. Commercial
databases provide concurrency control with a whole range of isolation levels which are in fact (controlled) serializability
violations in order to achieve higher performance. Higher performance means
better transaction execution rate and shorter average transaction response time
(transaction duration). Snapshot
isolation is an example of a
popular, widely utilized efficient relaxed serializability method with many
characteristics of full serializability, but still short of some, and unfit in
many situations.
Another
common reason nowadays for distributed
serializability relaxation (see
below) is the requirement of availability of internet products and services. This requirement is
typically answered by large-scale data replication.
The straightforward solution for synchronizing replicas' updates of a same
database object is including all these updates in a single atomic distributed transaction. However, with
many replicas such a transaction is very large, and may span several computers and networks that some of them are likely to be
unavailable. Thus such a transaction is likely to end with abort and miss its
purpose. Consequently Optimistic
replication (Lazy replication) is
often utilized (e.g., in many products and services by Google, Amazon, Yahoo, and alike), while
serializability is relaxed and compromised for eventual consistency. Again in this
case, relaxation is done only for applications that are not expected to be
harmed by this technique.
Classes
of schedules defined by relaxed
serializability properties
either contain the serializability class, or are incomparable with it.
View and conflict Serializablity
Mechanisms that enforce
serializability need to execute in real time, or almost in real time, while
transactions are running at high rates. In order to meet this requirement
special cases of serializability, sufficient conditions for serializability
which can be enforced effectively, are utilized.
Two major types of
serializability exist: view-serializability, and conflict-serializability.
View-serializability matches the general definition of serializability given
above. Conflict-serializability is a broad special case, i.e., any schedule
that is conflict-serializable is also view-serializable, but not necessarily
the opposite. Conflict-serializability is widely utilized because it is easier
to determine and covers a substantial portion of the view-serializable
schedules. Determining view-serializability of a schedule is an NP-complete problem (a class of
problems with only difficult-to-compute, excessively time-consuming known
solutions).
View-serializability of a schedule is
defined by equivalence to a serial schedule (no overlapping transactions) with
the same transactions, such that respective transactions in the two schedules
read and write the same data values ("view" the same data values).
Conflict-serializability is defined by
equivalence to a serial schedule (no overlapping transactions) with the same
transactions, such that both schedules have the same sets of respective
chronologically ordered pairs of conflicting operations (same precedence
relations of respective conflicting operations).
Operations upon data are read or write (a
write: either insert or modify or delete).
Two operations are conflicting, if they are of different
transactions, upon the same datum (data item), and at least one of them is write.
Each such pair of conflicting operations has a conflict type: It is
either a read-write, or write-read, or a write-write conflict.
The transaction of the second operation in the pair is said to be in
conflict with the transaction of the first operation. A more general
definition of conflicting operations (also for complex operations, which may
consist each of several "simple" read/write operations) requires that
they are noncommutative (changing their
order also changes their combined result). Each such operation needs to be
atomic by itself (by proper system support) in order to be considered an
operation for a commutativity check. For example, read-read operations are commutative
(unlike read-write and the other possibilities) and thus read-read is not a
conflict. Another more complex example: the operations increment anddecrement of
a counter are both write operations (both
modify the counter), but do not need to be considered conflicting (write-write
conflict type) since they are commutative (thus increment-decrement is not a
conflict; e.g., already has been supported in the old IBM's IMS "fast
path").
Only precedence (time order) in pairs of conflicting (non-commutative)
operations is important when checking equivalence to a serial schedule, since
different schedules consisting of the same transactions can be transformed from
one to another by changing orders between different transactions' operations
(different transactions' interleaving), and since changing orders of
commutative operations (non-conflicting) does not change an overall operation
sequence result, i.e., a schedule outcome (the outcome is preserved through
order change between non-conflicting operations, but typically not when
conflicting operations change order). This means that if a schedule can be
transformed to any serial schedule without changing orders of conflicting
operations (but changing orders of non-conflicting, while preserving operation
order inside each transaction), then the outcome of both schedules is the same,
and the schedule is conflict-serializable by definition.
Conflicts are the reason
for blocking transactions and delays (non-materialized conflicts), or for
aborting transactions due to serializability violations prevention. Both
possibilities reduce performance. Thus reducing the number of conflicts, e.g.,
by commutativity (when possible), is a way to increase performance.
Comments
Post a Comment