+++
title = "Transactions"
+++

# Transactions

transaction: a sequence of actions we want to perform on a database

should be atomic: either run fully, or not at all. this avoids
concurrency problems like:

-   losing effects of one transaction due to uncontrolled overwrite by
    another ('lost update anomaly')
-   transaction reads partial result of another transaction
    ('inconsistent read anomaly')
-   transaction reads changes made by another transaction before they
    are rolled back ('dirty read anomaly')
-   transaction reads value which is afterwards changed by another
    transaction ('unrepeatable read anomaly')

For this, we need to get some ACID:

-   Atomicity: transaction executes fully or not at all (commit or
    abort)
-   Consistency: transactions always leave database in consistent state,
    where all defined integrity constraints hold
-   Isolation: multiple users can modify database at the same time
    without seeing partial actions
-   Durability: when transaction is committed successfully, the data is
    persistent, regardless of crashes

transaction: a list of actions

-   reads - R(O)
-   writes - W(O)
-   end with Commit or Abort
-   e.g.: T₁: R(V), R(Y), W(V), W(C), Commit
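
as a sketch (the representation and helper names here are mine, not from the lecture), a transaction can be modelled as an ordered list of actions, from which its read- and write-sets fall out directly:

```python
# A transaction as an ordered list of (action, object) pairs, ending in a
# terminator. The representation is illustrative, not a standard one.
T1 = [("R", "V"), ("R", "Y"), ("W", "V"), ("W", "C"), ("COMMIT", None)]

def objects_read(txn):
    """Objects the transaction reads (its read-set)."""
    return {obj for act, obj in txn if act == "R"}

def objects_written(txn):
    """Objects the transaction writes (its write-set)."""
    return {obj for act, obj in txn if act == "W"}
```

these sets reappear later, in optimistic concurrency control.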

## Schedules

scheduler decides execution order of concurrent database access

schedule is list of actions from set of transactions ('plan on how to
execute transactions'). order in which 2 actions of transaction T
appear in schedule must be same as order in T.

## Serializability

serial schedule: if actions of different transactions are executed one
after another (e.g. all of T2, then all of T1)

serializable schedule: if its effect on database is same as that of some
serial schedule

actions in schedule conflict if they

-   are from different transactions
-   and involve same data item
-   and one action is write

conflicts may cause schedule to not be serializable

conflict types:

-   write read (WR) - T₁ writes Y, then T₂ reads Y
-   read write (RW) - T₁ reads Y, then T₂ writes Y
-   write write (WW) - T₁ writes Y, then T₂ writes Y

we can swap actions of different transactions if actions are
non-conflicting.

conflict equivalent schedules: if they can be transformed into each
other by swapping non-conflicting, adjacent actions.

conflict-serializable: if conflict equivalent to some serial schedule

check it with a precedence graph:

-   graph has node for each transaction
-   edge from T₁ to T₂ if conflicting action between T₁ and T₂ (with T₁
    first)
-   conflict-serializable iff no cycle in the graph
-   if no cycles, serial schedule is a topological sort of precedence
    graph
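
the check above can be sketched in code (a toy version; function names and the schedule representation are my own):

```python
def precedence_edges(schedule):
    """Edges of the precedence graph for a schedule of (txn, action, obj)
    triples. Edge (Ti, Tj) means Ti performs a conflicting action before Tj."""
    edges = set()
    for i, (t1, a1, o1) in enumerate(schedule):
        for t2, a2, o2 in schedule[i + 1:]:
            # conflict: different transactions, same object, at least one write
            if t1 != t2 and o1 == o2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def conflict_serializable(txns, edges):
    """True iff the precedence graph has no cycle (checked via DFS)."""
    adj = {t: [b for a, b in edges if a == t] for t in txns}
    state = dict.fromkeys(txns, 0)  # 0 = unvisited, 1 = on stack, 2 = done

    def cyclic(t):
        state[t] = 1
        for u in adj[t]:
            if state[u] == 1 or (state[u] == 0 and cyclic(u)):
                return True
        state[t] = 2
        return False

    return not any(state[t] == 0 and cyclic(t) for t in txns)
```

e.g. the schedule R₁(Y), W₂(Y), W₁(Y) yields edges T₁→T₂ (RW) and T₂→T₁ (WW), a cycle, so it is not conflict-serializable.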

## Runtime serializability strategies

serializability during runtime: system doesn't know which transactions
will run, and which items they'll access

strategies:

-   Pessimistic: lock-based, timestamp-based
-   Optimistic: read-set/write-set tracking, validation before commit
-   Multi-version techniques: eliminate concurrency control overhead for
    read-only queries

### Pessimistic: lock-based, two-phase locking

transactions must lock objects before using them

types:

-   shared lock (S-lock): acquired on Y before *reading* Y, many
    transactions can hold a shared lock on Y
-   exclusive lock (X-lock): acquired on Y before *writing* Y.
    transaction can hold exclusive lock on Y only if no other transaction
    holds a lock on Y.

2 phase locking (2PL) protocol:

-   each transaction must get:
    -   S-lock on object before reading it
    -   X-lock on object before writing it
-   transaction can't get new locks once it releases any lock
-   any schedule that conforms to 2PL is conflict-serializable
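
the S/X compatibility rules can be sketched as a minimal lock table (illustrative; a real lock manager also queues blocked requests):

```python
class LockTable:
    """Toy lock table: S-locks are shared, an X-lock is exclusive."""

    def __init__(self):
        self.locks = {}  # obj -> {"mode": "S" or "X", "holders": set of txns}

    def can_grant(self, txn, obj, mode):
        entry = self.locks.get(obj)
        if entry is None or entry["holders"] == {txn}:
            return True                      # free, or upgrade by sole holder
        if mode == "S" and entry["mode"] == "S":
            return True                      # shared locks are compatible
        return False                         # any X involvement conflicts

    def acquire(self, txn, obj, mode):
        if not self.can_grant(txn, obj, mode):
            return False                     # real system: block the transaction
        entry = self.locks.setdefault(obj, {"mode": "S", "holders": set()})
        entry["holders"].add(txn)
        if mode == "X":
            entry["mode"] = "X"
        return True
```

under strict 2PL (below), the transaction would release everything it acquired here only at commit/rollback.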

#### Deadlock handling

2PL has the risk of deadlocks, where two transactions wait for each
other indefinitely. need to detect deadlock.

detection with wait-for graphs:

-   system maintains wait-for graph, where nodes are transactions and
    edges A→B mean A is waiting for B to release lock
-   system periodically checks for graph cycles
-   if cycle detected, you abort a transaction
-   selecting the victim is a challenge:
    -   if you abort a young one, there will be starvation
    -   if you abort an old one, you'll be throwing away what you
        invested in it
    -   phrasing, dude.

detection with timeout:

-   let transactions block on a lock request for a limited time
-   after timeout, assume deadlock and abort T
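
a wait-for-graph check can be sketched like this (a simplification where each blocked transaction waits on exactly one other; the victim policy shown is only one of the options discussed above):

```python
def find_cycle(wait_for):
    """Transactions on a wait-for cycle, or an empty set if there is none.
    `wait_for` maps a blocked txn to the txn it waits on."""
    for start in wait_for:
        seen, t = [], start
        while t in wait_for:
            if t in seen:
                return set(seen[seen.index(t):])  # deadlock found
            seen.append(t)
            t = wait_for[t]
    return set()

def pick_victim(cycle, start_time):
    """One possible policy: abort the youngest transaction on the cycle,
    so the least invested work is lost (at the risk of starving it)."""
    return max(cycle, key=lambda t: start_time[t])
```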

#### Cascading rollbacks

cascadeless schedule:

-   delay reads, only read value produced by already committed
    transactions
-   so if a value is required, wait for the commit
-   no dirty reads, so abort doesn't cascade

recoverable schedule:

-   delay commit - if T2 reads value written by T1, commit of T2 has to
    wait until after commit of T1

schedules should always be recoverable. all cascadeless schedules are
recoverable.

#### Strict 2 phase locking

same as 2PL, but a transaction releases all locks only when it's
completed (commit/rollback). it's cascadeless, but still has deadlocks.

#### Preclaiming 2 phase locking

all needed locks are declared at start of transaction. therefore, no
deadlocks. however, not applicable in multi-query transactions (where
queries might depend on results of previous queries)

#### Granularity of locking

there's a tradeoff: the finer your locking granularity (database, vs
table, vs row level), the higher the concurrency, but also the higher
the overhead

multi-granularity locking - decide granularity of locks held for each
transaction depending on characteristics of transaction

intention locks (do not conflict with each other):

-   intention share (IS)
-   intention exclusive (IX)

an intention lock on coarser level of granularity means there is an S/X
lock on finer level of granularity.

before a granule *g* can be locked in S/X mode, the transaction has to
obtain an IS/IX lock on all coarser granularities containing *g*

after all intention locks are granted, transaction can lock *g* in the
announced mode

levels of granularity: database → table → row
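
the lock-request order for a granule can be sketched like this (hierarchy and names are illustrative):

```python
# Parent of each granularity level, coarsest at the top.
HIERARCHY = {"row": "table", "table": "database", "database": None}

def locks_needed(level, mode):
    """Locks to request, coarse-to-fine, before locking a granule at
    `level` in S or X mode: IS/IX on every ancestor, then S/X on the
    granule itself."""
    intention = "IS" if mode == "S" else "IX"
    path, lvl = [], HIERARCHY[level]
    while lvl is not None:
        path.append((lvl, intention))
        lvl = HIERARCHY[lvl]
    return list(reversed(path)) + [(level, mode)]
```

e.g. an X-lock on a row first needs IX on the database, then IX on the table.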

#### Optimising performance

for each query in log:

-   analyse average time and variance for this type of query
    -   if there are long delays or frequent aborts, it might be contention
-   read-only or updating query?
    -   compute read-sets, write-sets
    -   will it require row/table locks? shared/exclusive?

How do read- and write-sets of queries intersect? What is chance of
conflicts?

When you understand the query workload, you can:

-   rewrite queries for smaller read- and write-sets
-   change scheduling of queries to reduce contention
-   use different isolation level for queries

#### Isolation levels

some degree of inconsistency may be acceptable to get increased
concurrency & performance

SQL-92 levels:

-   `read uncommitted`: only write locks acquired, any row read can be
    concurrently changed by other transactions
-   `read committed`: read locks held for as long as application cursor
    sits on a current row, write locks as usual
-   `repeatable read`: strict 2PL, but a transaction may read phantom rows
    if it runs an aggregation query twice
-   `serializable`: strict 2PL, multi-granularity locking. no phantom
    rows.

![SQL-92 isolation levels](sql-92-isolation-levels.png)

phantom row problem: T1 locks all rows, but T2 inserts new row that
isn't locked.

solutions:

-   multi-granularity locking (locking the table)
-   declarative locking - key-range or predicate locking

many applications don't need full serializability; selecting a weaker
but acceptable isolation level is part of database tuning.

### Optimistic concurrency control

hope for the best, only check that no conflicts happened when
committing. this saves locking overhead.

three phases:

1.  Read phase: execute transaction, but don't write data to disk.
    collect updates in transaction's private workspace
2.  Validation phase: when transaction wants to commit, DBMS tests
    whether execution was correct, and aborts if needed.
3.  Write phase: transfer data from private workspace into database.

Phases 2 and 3 have to be in a non-interruptible critical section
(val-write phase).

#### Validation

typically implemented by maintaining:

-   read set RS(Tk) - attributes read by Tk
-   write set WS(Tk) - attributes written by Tk

Backward-oriented optimistic concurrency control (BOCC)

-   on commit, compare Tk against all *committed* transactions Ti
-   succeeds if
    -   Ti committed before Tk started
    -   or RS(Tk) ∩ WS(Ti) = Ø

Forward-oriented optimistic concurrency control (FOCC)

-   on commit, compare Tk against all *running* transactions Ti
-   succeeds if
    -   WS(Tk) ∩ RS(Ti) = Ø
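
with read- and write-sets as plain sets, both validation rules reduce to intersection tests (a sketch; argument names are mine):

```python
def bocc_valid(rs_tk, committed):
    """BOCC sketch: Tk may commit if every *committed* Ti either finished
    before Tk started, or has WS(Ti) disjoint from RS(Tk).
    `committed` holds (ws_ti, finished_before_tk_started) pairs."""
    return all(done or not (rs_tk & ws) for ws, done in committed)

def focc_valid(ws_tk, running_read_sets):
    """FOCC sketch: Tk may commit if WS(Tk) is disjoint from RS(Ti)
    for every still *running* Ti."""
    return all(not (ws_tk & rs) for rs in running_read_sets)
```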

#### Multiversion concurrency control

with old object versions around, read-only transactions never need to be
blocked

-   might see outdated but consistent version of data
-   like everything in query happened the moment it started

issues:

-   versioning requires space and management overhead
-   update transactions still need concurrency control

snapshot isolation:

-   each transaction sees consistent snapshot of database corresponding
    to state at moment it started
-   read-only transactions don't have to lock anything
-   transactions conflict only if they write to same object
    -   pessimistic concurrency control - only writes are locked
    -   optimistic concurrency control - only write-sets interesting
-   does not guarantee serializability
-   avoids dirty read, unrepeatable read, phantom rows
-   introduces write skew, when assertions (invariants) involve multiple
    tuples
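
the classic write-skew scenario can be sketched with plain dicts (values illustrative): two transactions read the same snapshot, each checks an invariant over *both* objects, but they write *disjoint* objects, so their write-sets don't intersect and snapshot isolation lets both commit:

```python
# Invariant: x + y >= 0. Both transactions start from the same snapshot.
db = {"x": 50, "y": 50}
snapshot1 = dict(db)  # T1 starts
snapshot2 = dict(db)  # T2 starts

# T1: withdraw 100 from x if the invariant still holds afterwards.
if snapshot1["x"] + snapshot1["y"] >= 100:
    t1_writes = {"x": snapshot1["x"] - 100}

# T2: withdraw 100 from y under the same check, on its own snapshot.
if snapshot2["x"] + snapshot2["y"] >= 100:
    t2_writes = {"y": snapshot2["y"] - 100}

# Write-sets {"x"} and {"y"} are disjoint, so both commits succeed:
db.update(t1_writes)
db.update(t2_writes)
# Each transaction alone preserved x + y >= 0; together they break it.
```

a serial execution would have aborted the second withdrawal, which is exactly the serializability guarantee snapshot isolation gives up.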