Serializability
From Wikipedia, the free encyclopedia
In databases and transaction processing, a schedule (transaction history) is serializable, has the Serializability property, if its outcome (the resulting database state, the values of the database's data) is equal to the outcome of its transactions executed sequentially without overlapping in time. Transactions are normally executed concurrently (they overlap), since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and plays an essential role in concurrency control.
Contents |
[edit] Correctness
[edit] Correctness - serializability
Serializability is the major criterion for the correctness of concurrent transactions' executions. As such it is supported in all general purpose database systems. The rationale behind it is the following:
- If each transaction is correct by itself, then any serial execution (no transaction overlap, but at any order) of these transactions is correct. As a result, any execution that is equivalent (in its outcome) to a serial execution, is correct.
Schedules that are not serializable are likely to generate erroneous outcomes. Well known examples are with transactions that debit and credit accounts with money. If the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This and violations of possibly needed other invariant preservations are caused by one transaction writing, and "stepping on" and erasing what has been written by another transaction before it has become permanent in the database. It does not happen if serializability is maintained.
Comment: If any specific order between some transactions is requested by an application, then it is enforced independently of the underlying serializability mechanisms. These mechanisms are typically indifferent to any specific order, and generate equivalence to some unpredictable serial order of these transactions. This order results from the scheduling ordres of concurrent transactions' data access operations, which depend on many factors.
[edit] Correctness - recoverability
In all real systems transactions can abort, and serializability by itself is not sufficient for correctness. Schedules also need to possess the recoverability property. Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability is currently compromised in many applications, compromising recoverability would quickly violate the database's integrity, as well as that of transactions' results.
[edit] Relaxing serializability
In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to specification, in most cases it does not matter much if a product, whose data was updated a short time ago, does not appear in the list, even if it meets the specification. It will typically appear in such a list when tried again a short time later. Commercial databases provide concurrency control with a whole range of isolation levels which are in fact (controlled) serializability violations in order to achieve higher performance. Higher performance means better transaction execution rate and shorter transaction response time (transaction duration).
Classes of schedules defined by relaxed serializability properties either contain the serializability class, or are incomparable with it.
[edit] View and conflict serializability
Two major types of serializability exist: view serializability, and conflict serializability. Any schedule that is conflict-serializable is also view-serializable. Conflict serializability is widely utilized because it is easier to determine. Determining view-serializability of a schedule is an NP-complete problem.
View serializability of a schedule is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values).
Conflict serializability is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that both schedules have the same sets of respective chronologically-ordered pairs of conflicting operations (same precedence relations of respective conflicting operations).
Operations upon data are read or write (insert or modify or delete). Two operations are conflicting, if they are of different transactions, upon the same datum, and at least one of them is write. The transaction of the second operation in the pair is said to be in conflict with the transaction of the first operation. A more general definition of conflicting operations (also for complex operations, which may consist each of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). Each such operation needs to be atomic by itself (by proper system support) in order to be commutative (nonconflicting) with the other. For example, the operations increment and decrement of a counter are both write operations, but do not need to be considered conflicting since they are commutative.
[edit] Enforcing conflict serializability
[edit] Testing conflict serializability
Schedule compliance with conflict serializability can be tested with the precedence graph (serializability graph, conflict graph) of the schedule for committed transactions. It is the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations in the transactions. In the conflict graph transactions are nodes and precedence relations are directed edges. There exists an edge from a first transaction to a second transaction if the second transaction is in conflict with the first (see Conflict serializability above). This graph, when only committed transactions are considered, is acyclic, if and only if the schedule is conflict-serializable. This means that when committed transactions create a cycle, conflict-serializability is violated.
Cycles of committed transactions can be prevented by aborting an undecided (neither committed, nor aborted) transaction on each cycle in the precedence graph of all the transactions, that can otherwise turn into a cycle of committed transactions (a committed transaction cannot be aborted). One transaction aborted per cycle is both required and sufficient number. The probability of cycle generation is typically low, but nevertheless, such a situation is carefully handled, since correctness is involved. Transactions aborted due to serializability violation prevention are executed again immediately.
Serializability enforcing mechanisms typically do not maintain a precedence graph as a data structure, but rather prevent or break cycles implicitly (e.g., SS2PL below).
[edit] Common mechanism - SS2PL
Strong strict two phase locking (SS2PL) is a common mechanism utilized in database systems to enforce both conflict serializability and strictness (a special case of recoverability) of a schedule. SS2PL is the name of the resulting schedule property as well, which is also called rigorousness. In this mechanism each datum is locked by a transaction before accessing it (any read or write operation): The item is marked by, associated with a lock of a certain type, depending on operation (and the specific implementation; various models with different lock types exist; in some models locks may change type during the transaction's life). As a result access by another transaction may be blocked, typically upon conflict, depending on lock type and the other transaction's access operation type. All locked data on behalf of a transaction are released only after the transaction has ended (either committed or aborted).
Mutual blocking between transactions results in a deadlock, where execution of these transactions is stalled, and no completion can be reached. A deadlock is a reflection of a potential cycle in the conflict graph, that would occur without the blocking. Deadlocks are resolved by aborting a transaction involved with such potential cycle. It is often detected using a wait-for graph, that indicates which transaction is "waiting for" lock release by which transaction, and a cycle means a deadlock. Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are executed again immediately.
[edit] Other enforcing techniques
Other known mechanisms include:
- Precedence graph cycle elimination
- Two-phase locking (2PL)
- Timestamp ordering (TO)
- (Local) commitment ordering (CO)
[edit] Global serializability and commitment ordering
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such heterogeneous system, where databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability would lead to unacceptable performance, primarily due to computer and communication latency. The problem of achieving global serializability effectively in such systems had been characterized as open until the end of 1991 (see Global serializability).
An effective way to enforce conflict serializability globally is to enforce the commitment ordering (CO, or commit-order-serializability) property of each local schedule. CO is a broad special case of conflict serializability, and enforcing it locally in each schedule also enforces it, and hence serializability, globally. The only needed communication between the databases for this purpose is that of the atomic commitment protocol (such as two-phase commit protocol), that exists in most distributed environments and already must be utilized for each distributed transaction. Thus CO incurs no communication overhead. In each single database, local CO algorithm can run beside any local concurrency control mechanism (serializability enforcing mechanism) without interfering with its resource access scheduling strategy. As such CO provides a general, high performance, fully distributed solution. No central processing component or central data structure is needed. Moreover, CO works also in heterogeneous environments with different database system types and other multiple transactional objects that employ different serializability mechanisms. The CO solution scales up in network size and the number of databases without any negative impact on performance (assuming the statistics of a single distributed transaction, e.g., the average number of databases involved with a single transaction, are unchanged). This makes CO instrumental for global concurrency control.
CO implementation by itself is not sufficient as a concurrency control mechanism, since it lacks the important recoverability property.
SS2PL implies CO, and any SS2PL-compliant database can participate in multidatabase systems that utilize the CO solution for global serializability without any modification or addition of a CO algorithm component. As a matter of fact global serializability has been achieved in all-SS2PL multidatabase environments with the two phase commit (2PC) protocol since the eighties, and SS2PL in conjunction with 2PC is the de facto standard to reach global serializability across (the common, SS2PL-compliant) databases.
Having the CO property means that the precedence (partial) order of transactions' commitment events is identical to the precedence (partial) order of the respective transactions as determined by their local acyclic conflict graph. Any conflict serializable schedule can be made a CO compliant one, without aborting any transaction in the schedule, by delaying commitment events to comply with the needed partial order.
The commitment event of a distributed transaction is always generated by some atomic commitment protocol, utilized to reach consensus among its processes on whether to commit or abort it. This procedure is always carried out for distributed transactions, independently of CO. The atomic commitment protocol plays a central role in the distributed CO algorithm. In case of incompatible local commitment orders in two or more databases (no global partial order can embed the local partial orders), which implies a global cycle (a cycle that spans two or more database) in the global conflict graph, CO generates a voting-deadlock for the atomic commitment protocol, and the protocol resolves that deadlock by aborting a transaction on the cycle and breaking the cycle. With CO, the global augmented conflict graph (where being blocked by a lock to prevent conflict is also considered conflict) becomes a wait-for graph for voting, and a global cycle means a voting-deadlock. Thus also global deadlocks due to locking are resolved automatically by the same mechanism. No implementation of such a global graph is needed, and it is used only to explain the behavior of CO.
[edit] References
- Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman: Concurrency Control and Recovery in Database Systems, Addison Wesley Publishing Company, 1987, ISBN 0-20110-715-5
- Gerhard Weikum, Gottfried Vossen: Transactional Information Systems, Elsevier, 2001, ISBN 1-55860-508-8
- Yoav Raz: The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous Environment of Multiple Autonomous Resource Managers Using Atomic Commitment. Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB), pp. 292-312, Vancouver, Canada, August 1992, ISBN 1-55860-151-1 (also DEC-TR 841, Digital Equipment Corporation, November 1990)

