index.md - lectures.alex.balgavy.eu - Lecture notes from university.

index.md (10771B)
      1 +++
      2 title = "Relational normal forms"
      3 template = "page-math.html"
      4 +++
      5 
      6 # Relational normal forms
      7 
      8 ## Functional dependencies
      9 
     10 Functional dependencies are a generalization of keys. This theory
     11 defines when a relation is in normal form.
     12 
     13 functional dependency: if two rows agree on a value in one column, they
     14 must also agree on the other column e.g. here, functional dependency is
     15 `INAME → PHONE`, because phone number only depends on the instructor
     16 intuitively:
     17 
     18 ![FD table example](fd-table-example.png)
     19 
     20 This is read as \"`INAME` (functionally, uniquely) determines `PHONE`\".
     21 
     22 ### What is a functional dependency?
     23 
     24 -   like partial key, because it uniquely determines some attributes but
     25     not all
     26 -   a constraint
     27 -   A determinant is a \'minimal\' functional dependency.
     28 -   goal of database normalization is to turn functional dependencies
     29     into keys
     30 
     31 Keys are functional dependencies.
     32 
     33 During database design, only unquestionable conditions should be used as
     34 functional dependencies.
     35 
     36 It\'s usually bad database design if schema\'s relations violate normal
     37 form. If it\'s violated, data is stored redundantly and information
     38 about different concepts is intermixed.
     39 
     40 ### Working with functional dependencies
     41 
     42 The database designer is not interested in all functional dependencies,
     43 but only in a representative functional dependency set that implies all
     44 others.
     45 
     46 Implications of functional dependencies:
     47 
     48 -   If A→B and B→C, then A→C
     49 -   A→A always holds
     50 -   Armstrong axioms:
     51     -   If β ⊆ α, then α → β (reflexivity)
     52     -   If α → β, then α ∪ γ → β ∪ γ (augmentation)
     53     -   If α → β and β → γ, then α → γ (transitivity)
     54 
     55 Computing the cover: for given set of attributes, see which they imply
     56 through FDs. Extended cover with those, and repeat.
     57 
     58 -   Cover of A ({A}+)? FDs A→B,C; B→E
     59     -   {A}+ ⇒ {A,B,C} ⇒ {A,B,C,E}
     60 
     61 Checking whether a → β is implied by a functional dependency set:
     62 
     63 1.  computer cover α⁺ of α:
     64 2.  check if β ⊆ α⁺:
     65     -   set of functional dependencies F implies α → β iff β ⊆ α⁺\_F
     66 
     67 example:
     68 
     69 ![Example to check functional dependency](example-to-check-fd.png)
     70 
     71 ### How to determine keys
     72 
     73 Determining a minimal key (slides 64-70)
     74 
     75 -   given:
     76     -   R(A,B,C,D)
     77     -   FDs: A→C; C→B,D
     78 -   start with all attributes: {A,B,C,D}
     79 -   For every attribute, see if we can remove it, which is possible if
     80     they are implied by an FD which is still in the attribute set.
     81     -   {A,B,C,D} ⇒ {A,C} (FD2) ⇒ {A} (FD1)
     82     -   thus {A} is a minimal key
     83     -   order matters, you can end up with different keys!
     84 
     85 determining all minimal keys (slides 71-88)
     86 
     87 -   given:
     88     -   R(A,B,C,D)
     89     -   FDs: A→C; C→B,D
     90 -   Start with set of candidates: attributes that imply and are not
     91     implied (not in any right hand side)
     92     -   { {A} }
     93 -   Find cover of smallest candidate key
     94     -   {A} ⇒ {A,C} ⇒ {A,B,C,D}
     95 -   If does not contain all attributes, extend candidate with missing
     96     others and repeat.
     97 -   edge case: if there are no attributes that aren\'t in any right hand
     98     side, then candidates is empty set { {} } and you extend with every
     99     attribute like { {A}, {B}, \...}
    100 
    101 ### Determinants
    102 
    103 determinant: non-trivial, minimal functional dependency
    104 
    105 {A1, \..., An} is determinant for {B1, \..., Bm} if:
    106 
    107 -   functional dependency A1, \..., An → B1, \..., Bm holds; and
    108 -   left-hand side is minimal (if any Ai is removed, then it does not
    109     hold); and
    110 -   it is non-trivial, i.e. {B1, \..., Bm} not subset of {A1, \..., An}
    111 
    112 ### Consequences of bad database design
    113 
    114 usually if table contains an functional dependency that\'s not implied
    115 by a key, it\'s a sign of bad database design.
    116 
    117 leads to:
    118 
    119 -   redundant storage of certain facts
    120     -   wastes storage space
    121     -   hard to ensure integrity when updating, as all redundant copies
    122         need to be updated, wasting time
    123     -   requires additional constraints to guarantee integrity
    124 -   insert, update, deletion anomalies
    125     -   update: when a single value needs to be changed, multiple tuples
    126         need to be updated, taking longer and maybe getting out of sync
    127     -   insertion: when unrelated concepts are stored together in a
    128         single table
    129     -   deletion: e.g. when last course of instructor is deleted, their
    130         phone number is lost
    131 
    132 problem is that general FDs are not supported by relational databases.
    133 so you have to transform them into key constraints (database
    134 normalisation).
    135 
    136 ## Normal forms
    137 
    138 Normal form types:
    139 
    140 -   Third Normal Form (3NF): standard relational normal form used in
    141     practice
    142 -   Boyce-Codd Normal Form (BCNF):
    143     -   a bit more restrictive, easier to define, better for intuition
    144     -   BCNF requires that all functional dependencies are keys.
    145     -   ensures that key constraints automatically satisfy all FDs, so
    146         no more constraints are needed
    147     -   anomalies (update/insertion/deletion) don\'t occur
    148 
    149 Normalisation algorithms can construct good relation schemas from
    150 attributes and functional dependencies. When an ER model is well
    151 designed, resulting derived relational tables will automatically be in
    152 BCNF.
    153 
    154 First normal form (1NF):
    155 
    156 -   requires all table entries are atomic (not lists, sets, records, or
    157     relations)
    158 -   all further normal forms assume that tables are in 1NF
    159 
    160 ### Boyce-Codd Normal Form (BCNF)
    161 
    162 -   if all of its FDs are implied by its key constraints
    163 -   in symbols:
    164     -   for every FD A1,\...,An → B1,\...,Bm of R, we have
    165     -   either {B1,\...,Bm} ⊆ {A1,\...,An} (the FD is trivial)
    166     -   or {A1,\...,An} contains a key of R
    167 -   in short, if for every non-trivial functional dependency, left-hand
    168     side contains a key
    169 
    170 > ![Example of checking BCNF](example-of-checking-bcnf.png)
    171 
    172 #### Splitting relations
    173 
    174 If table R is not in BCNF, we can split it into two tables. You split
    175 based on violating FD.
    176 
    177 Table decomposition:
    178 
    179 -   if FD A1,\...,An → B1,\...,Bm violates BCNF:
    180     1.  create new relation S(A1,\...,An,B1,\...,Bm)
    181     2.  and remove B1,\...,Bm from original relation R
    182 
    183 Splitting has to be
    184 []{#Relational normal forms-Normal forms-Boyce-Codd Normal Form (BCNF)-Splitting relations-lossless}**lossless**
    185 so that you can reconstruct original relation by a join.
    186 
    187 Decomposition theorem: split is guaranteed to be lossless if
    188 intersection of attributes of new tables is a key of at least one of
    189 them.
    190 
    191 It\'s always possible to transform relation into BCNF by lossless
    192 splitting. The resulting schema can always represent all previously
    193 possible states, but it may be more general and allow states that do not
    194 exist in the old schema.
    195 
    196 With computable columns, splitting the relation is not the right
    197 solution - instead, define a view with the computed column.
    198 
    199 A good decomposition should guarantee preservation of FDs:
    200 
    201 -   and FD can refer only to attributes of a single relation
    202 -   when splitting relation into two, there might be FDs that can\'t be
    203     expressed anymore (not preserved)
    204 
    205 #### Synthesis
    206 
    207 Determining canonical (minimal) set of FDs:
    208 
    209 -   given:
    210     -   R(A,B,C,D,E)
    211     -   FDs: D→A; E→A,D; C,D→A; A,E→C; B→A,D,E
    212 -   Rewrite every FD as singular:
    213     -   D→A, E→A, E→D, C,D→A, A,E→C, B→A, B→D, B→E
    214 -   Minimise left hand side of every FD (aka is every FD minimal?)
    215     -   drop C from C,D→A because D→A
    216     -   drop A from A,E→C because E→A
    217     -   so: D→A, E→A, E→D, D→A, E→C, B→A, B→D, B→E
    218 -   Remove implied FDs (and trivial/duplicate) using lhs attributes (aka
    219     if we can determine rhs without the FD itself)
    220     -   D→A, E→D, E→C, B→E
    221 
    222 BCNF synthesis (relation R, set of FDs for R):
    223 
    224 1.  Determine canonical set of FDs
    225     -   e.g. D→A, E→D, E→C, B→E
    226 2.  Maximise rhs of FDs
    227     -   {D}+ - D = {A}
    228     -   \\({E}\_{-E}\^+\\) = {D,C,A}
    229     -   \\({B}\_{-B}\^+\\) = {E,D,C,A} ({B} is the minimal key in this
    230         case)
    231 3.  Split on violating FDs. For each FD, remove the rhs from relations
    232     and add a new relation, with the lhs of that FD being the key.
    233 
    234 ### Third normal form (3NF)
    235 
    236 -   relation is in 3NF if for every non-trivial functional dependency:
    237     -   left-hand side contains a key
    238     -   or right-hand side is attribute of minimal key
    239 
    240 retains all FDs, so more popular than BCNF. If we leave table in 3NF, we
    241 have non-key constraints - the FDs that are not implied by keys.
    242 
    243 #### Synthesis
    244 
    245 Produces lossless decomposition of relation into 3NF that preserves FDs.
    246 
    247 1.  Determine canonical set of FDs
    248     -   e.g. D→A, E→D, E→C, B→E
    249 2.  Merge FDs with same lhs and create relations from them
    250     -   R1(<u>D</u>, A)
    251     -   R2(<u>E</u>,C,D)
    252     -   R3(<u>B</u>,E)
    253 3.  Check if any of relations has key of original relation. If not,
    254     create a new relation with attributes of the minimal key.
    255     -   In this case, R(A,<u>B</u>,C,D,E); R3 has <u>B</u>, so don\'t
    256         need to do anything.
    257 4.  For all pairs of created relations: are they contained in another
    258     relation? If yes, remove.
    259 
    260 ## MVD & 4NF
    261 
    262 ### Multivalued dependencies (MVDs)
    263 
    264 Constraints that give a necessary and sufficient condition for lossless
    265 decomposition. They lead to fourth normal form (4NF).
    266 
    267 Multivalued dependency `NAME ⤅ PROG_LANG` means that set of vales in in
    268 column PROG_LANG associated with every NAME is independent of all other
    269 columns. i.e. there\'s an embedded function from NAME to sets of
    270 PROG_LANG
    271 
    272 The MVD holds if, whenever two tuples agree on NAME, one can exchange
    273 their PROG_LANG values and the resulting tuples are in the same table.
    274 
    275 MVDs always come in pairs. For relation R(A1,\...,An, B1,\...,Bm,
    276 C1,\...,Ck) these multivalued dependencies are equivalent:
    277 
    278 -   A1,\...,An ⤅ B1,\...,Bm
    279 -   A1,\...,An ⤅ C1,\...,Ck
    280 
    281 Every FD is also a MVD.
    282 
    283 ### Fourth Normal Form (4NF)
    284 
    285 A relation is in in 4NF if every MVD A1,\...,An ⤅ B1,\...,Bm is either
    286 trivial, or implied by a key.
    287 
    288 If a relation is in 4NF, it\'s also automatically in BCNF.
    289 
    290 ## Normal forms & ER design
    291 
    292 If a \'good\' ER schema is transformed into the relational model, the
    293 result will satisfy all normal forms (4NF, BCNF, 3NF). If a normal form
    294 is violated, there\'s a flaw in the input ER schema.
    295 
    296 In the ER model, the entity has to be split in case of a violation.
    297 
    298 Violations of BCNF can also be due to wrong placement of an attribute.
    299 
    300 If an attribute of a ternary relationship only depends on two of the
    301 entities, it\'s a BCNF violation.
    302 
    303 Why normalize?
    304 
    305 -   Avoid redundancy
    306 -   Store separate facts separately
    307 -   Transform general integrity constraints into keys (DBMS constraints)
    308 
    309 ## Denormalization
    310 
    311 The process of adding redundant columns to the database to improve
    312 performance.
    313 
    314 This leads to the reintroduction of (some) anomalies, but you avoid huge
    315 amounts of joins.
	lectures.alex.balgavy.eu Lecture notes from university.
	git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
	Log \| Files \| Refs \| Submodules