Recovery from System Failures

System failures (also called soft crashes) are those failures like power outage which affect all transactions in progress, but do not physically damage the database.

During a system failure, the contents of the main memory are lost. Thus the contents of the database buffers which contain the updates of transactions are lost. (Note: Transactions do not directly write on to the database. The updates are written to database buffers and, at regular intervals, transferred to the database.) At restart, the system has to ensure that the ACID properties of transactions are maintained and the database remains in a consistent state. To attain this, the strategy to be followed for recovery at restart is as follows:

1.Transactions which were in progress at the time of failure have to be undone at the time of restart. This is needed because the precise state of such a transaction which was active at the time of failure is no longer known and hence cannot be successfully completed.

2.Transactions which had completed prior to the crash but could not get all their updates transferred from the database buffers to the physical database have to redone at the time of restart.

This recovery procedure is carried out with the help of:

1.An online logfile or journal – The logfile maintains the before- and after-images of the tuples updated during a transaction. This helps in carrying out the UNDO and REDO operations as required.
2.Typical entries made in the logfile are :
3.Start of Transaction Marker
4.Transaction Identifier
5.Record Identifier
6.Operations Performed
7.Previous Values of Modified Data (Before-image or Undo Log)
8.Updated Values of Modified Records (After-image or Redo Log)
9.Commit / Rollback Transaction Marker

10.Taking a checkpoint at specific intervals – This involves the following two operations:
a) physically writing the contents of the database buffers out to the physical database. Thus during a checkpoint the updates of all transactions, including both active and committed transactions, will be written to the physical database.
b)physically writing a special checkpoint record to the physical log. The checkpoint record has a list of all active transactions at the time of taking the checkpoint.



At the time of restart, T3 and T5 must be undone and T2 and T4 must be redone.

T1 does not enter the recovery procedure at all since it updates were all written to the database at time tc as part of the checkpoint proces




Recovery and Concurrency

Recovery and Concurrency in a DBMS are part of the general topic of transaction management. Hence we shall begin the discussion by examining the fundamental notion of a transaction.

Transaction

A transaction is a logical unit of work.

Consider the following example:

The procedure for transferring an amount of Rs. 100/- from the account of one customer to another is given.
EXEC SQL WHENEVER SQLERROR GOTO
UNDO
EXEC SQL UPDATE DEPOSIT
SET BALANCE=BALANCE-100
WHERE CUSTID=from_cust;
EXEC SQL UPDATE DEPOSIT
SET BALANCE=BALANCE+100
WHERE CUSTID=to_cust:
EXEC SQL COMMIT;
GOTO FINISH
UNDO:
EXEC SQL ROLLBACK;
FINISH:
RETURN;

Here, it has to be noted that the single operation “amount transfer” involves two database updates – updating the record of from_cust and updating the record of to_cust. In between these two updates the database is in an inconsistent (or incorrect in this example) state. i.e., if only one of the updates is performed, one cannot say by seeing the database contents whether the amount transfer operation has been done or not. Hence to guarantee database consistency it has to be ensured that either both updates are performed or none are performed. If, after one update and before the next update, something goes wrong due to problems like a system crash, an overflow error, or a violation of an integrity constraint etc., then the first update needs to be undone.

This is true with all transactions. Any transaction takes the database from one consistent state to another. It need not necessarily preserve consistency of database at all intermediate points. Hence it is important to ensure that either a transaction executes in its entirety or is totally cancelled. The set of programs which handles this forms the transaction manager in the DBMS. The transaction manager uses COMMIT and ROLLBACK operations for ensuring atomicity of transactions.

COMMIT – The COMMIT operation indicates successful completion of a transaction which means that the database is in a consistent state and all updates made by the transaction can now be made permanent. If a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database even if the system crashes immediately after the COMMIT.

ROLLBACK – The ROLLBACK operation indicates that the transaction has been unsuccessful which means that all updates done by the transaction till then need to be undone to bring the database back to a consistent state. To help undoing the updates once done, a system log or journal is maintained by the transaction manager. The before- and after-images of the updated tuples are recorded in the log.

The properties of transaction can be summarised as ACID properties - ACID standing for atomicity, consistency, isolation and durability.

Atomicity:
A transaction is atomic. Either all operations in the transaction have to be performed or none should be performed.

Consistency:Transactions preserve database consistency. i.e., A transaction transforms a consistent state of the database into another without necessarily preserving consistency at all intermediate points.

Isolation: Transactions are isolated from one another. i.e., A transaction's updates are concealed from all others until it commits (or rolls back).

Durability: Once a transaction commits, its updates survive in the database even if there is a subsequent system crash.