index.md (10771B)
1 +++ 2 title = "Relational normal forms" 3 template = "page-math.html" 4 +++ 5 6 # Relational normal forms 7 8 ## Functional dependencies 9 10 Functional dependencies are a generalization of keys. This theory 11 defines when a relation is in normal form. 12 13 functional dependency: if two rows agree on a value in one column, they 14 must also agree on the other column e.g. here, functional dependency is 15 `INAME → PHONE`, because phone number only depends on the instructor 16 intuitively: 17 18 ![FD table example](fd-table-example.png) 19 20 This is read as \"`INAME` (functionally, uniquely) determines `PHONE`\". 21 22 ### What is a functional dependency? 23 24 - like partial key, because it uniquely determines some attributes but 25 not all 26 - a constraint 27 - A determinant is a \'minimal\' functional dependency. 28 - goal of database normalization is to turn functional dependencies 29 into keys 30 31 Keys are functional dependencies. 32 33 During database design, only unquestionable conditions should be used as 34 functional dependencies. 35 36 It\'s usually bad database design if schema\'s relations violate normal 37 form. If it\'s violated, data is stored redundantly and information 38 about different concepts is intermixed. 39 40 ### Working with functional dependencies 41 42 The database designer is not interested in all functional dependencies, 43 but only in a representative functional dependency set that implies all 44 others. 45 46 Implications of functional dependencies: 47 48 - If A→B and B→C, then A→C 49 - A→A always holds 50 - Armstrong axioms: 51 - If β ⊆ α, then α → β (reflexivity) 52 - If α → β, then α ∪ γ → β ∪ γ (augmentation) 53 - If α → β and β → γ, then α → γ (transitivity) 54 55 Computing the cover: for given set of attributes, see which they imply 56 through FDs. Extended cover with those, and repeat. 57 58 - Cover of A ({A}+)? FDs A→B,C; B→E 59 - {A}+ ⇒ {A,B,C} ⇒ {A,B,C,E} 60 61 Checking whether a → β is implied by a functional dependency set: 62 63 1. computer cover α⁺ of α: 64 2. check if β ⊆ α⁺: 65 - set of functional dependencies F implies α → β iff β ⊆ α⁺\_F 66 67 example: 68 69 ![Example to check functional dependency](example-to-check-fd.png) 70 71 ### How to determine keys 72 73 Determining a minimal key (slides 64-70) 74 75 - given: 76 - R(A,B,C,D) 77 - FDs: A→C; C→B,D 78 - start with all attributes: {A,B,C,D} 79 - For every attribute, see if we can remove it, which is possible if 80 they are implied by an FD which is still in the attribute set. 81 - {A,B,C,D} ⇒ {A,C} (FD2) ⇒ {A} (FD1) 82 - thus {A} is a minimal key 83 - order matters, you can end up with different keys! 84 85 determining all minimal keys (slides 71-88) 86 87 - given: 88 - R(A,B,C,D) 89 - FDs: A→C; C→B,D 90 - Start with set of candidates: attributes that imply and are not 91 implied (not in any right hand side) 92 - { {A} } 93 - Find cover of smallest candidate key 94 - {A} ⇒ {A,C} ⇒ {A,B,C,D} 95 - If does not contain all attributes, extend candidate with missing 96 others and repeat. 97 - edge case: if there are no attributes that aren\'t in any right hand 98 side, then candidates is empty set { {} } and you extend with every 99 attribute like { {A}, {B}, \...} 100 101 ### Determinants 102 103 determinant: non-trivial, minimal functional dependency 104 105 {A1, \..., An} is determinant for {B1, \..., Bm} if: 106 107 - functional dependency A1, \..., An → B1, \..., Bm holds; and 108 - left-hand side is minimal (if any Ai is removed, then it does not 109 hold); and 110 - it is non-trivial, i.e. {B1, \..., Bm} not subset of {A1, \..., An} 111 112 ### Consequences of bad database design 113 114 usually if table contains an functional dependency that\'s not implied 115 by a key, it\'s a sign of bad database design. 116 117 leads to: 118 119 - redundant storage of certain facts 120 - wastes storage space 121 - hard to ensure integrity when updating, as all redundant copies 122 need to be updated, wasting time 123 - requires additional constraints to guarantee integrity 124 - insert, update, deletion anomalies 125 - update: when a single value needs to be changed, multiple tuples 126 need to be updated, taking longer and maybe getting out of sync 127 - insertion: when unrelated concepts are stored together in a 128 single table 129 - deletion: e.g. when last course of instructor is deleted, their 130 phone number is lost 131 132 problem is that general FDs are not supported by relational databases. 133 so you have to transform them into key constraints (database 134 normalisation). 135 136 ## Normal forms 137 138 Normal form types: 139 140 - Third Normal Form (3NF): standard relational normal form used in 141 practice 142 - Boyce-Codd Normal Form (BCNF): 143 - a bit more restrictive, easier to define, better for intuition 144 - BCNF requires that all functional dependencies are keys. 145 - ensures that key constraints automatically satisfy all FDs, so 146 no more constraints are needed 147 - anomalies (update/insertion/deletion) don\'t occur 148 149 Normalisation algorithms can construct good relation schemas from 150 attributes and functional dependencies. When an ER model is well 151 designed, resulting derived relational tables will automatically be in 152 BCNF. 153 154 First normal form (1NF): 155 156 - requires all table entries are atomic (not lists, sets, records, or 157 relations) 158 - all further normal forms assume that tables are in 1NF 159 160 ### Boyce-Codd Normal Form (BCNF) 161 162 - if all of its FDs are implied by its key constraints 163 - in symbols: 164 - for every FD A1,\...,An → B1,\...,Bm of R, we have 165 - either {B1,\...,Bm} ⊆ {A1,\...,An} (the FD is trivial) 166 - or {A1,\...,An} contains a key of R 167 - in short, if for every non-trivial functional dependency, left-hand 168 side contains a key 169 170 > ![Example of checking BCNF](example-of-checking-bcnf.png) 171 172 #### Splitting relations 173 174 If table R is not in BCNF, we can split it into two tables. You split 175 based on violating FD. 176 177 Table decomposition: 178 179 - if FD A1,\...,An → B1,\...,Bm violates BCNF: 180 1. create new relation S(A1,\...,An,B1,\...,Bm) 181 2. and remove B1,\...,Bm from original relation R 182 183 Splitting has to be 184 []{#Relational normal forms-Normal forms-Boyce-Codd Normal Form (BCNF)-Splitting relations-lossless}**lossless** 185 so that you can reconstruct original relation by a join. 186 187 Decomposition theorem: split is guaranteed to be lossless if 188 intersection of attributes of new tables is a key of at least one of 189 them. 190 191 It\'s always possible to transform relation into BCNF by lossless 192 splitting. The resulting schema can always represent all previously 193 possible states, but it may be more general and allow states that do not 194 exist in the old schema. 195 196 With computable columns, splitting the relation is not the right 197 solution - instead, define a view with the computed column. 198 199 A good decomposition should guarantee preservation of FDs: 200 201 - and FD can refer only to attributes of a single relation 202 - when splitting relation into two, there might be FDs that can\'t be 203 expressed anymore (not preserved) 204 205 #### Synthesis 206 207 Determining canonical (minimal) set of FDs: 208 209 - given: 210 - R(A,B,C,D,E) 211 - FDs: D→A; E→A,D; C,D→A; A,E→C; B→A,D,E 212 - Rewrite every FD as singular: 213 - D→A, E→A, E→D, C,D→A, A,E→C, B→A, B→D, B→E 214 - Minimise left hand side of every FD (aka is every FD minimal?) 215 - drop C from C,D→A because D→A 216 - drop A from A,E→C because E→A 217 - so: D→A, E→A, E→D, D→A, E→C, B→A, B→D, B→E 218 - Remove implied FDs (and trivial/duplicate) using lhs attributes (aka 219 if we can determine rhs without the FD itself) 220 - D→A, E→D, E→C, B→E 221 222 BCNF synthesis (relation R, set of FDs for R): 223 224 1. Determine canonical set of FDs 225 - e.g. D→A, E→D, E→C, B→E 226 2. Maximise rhs of FDs 227 - {D}+ - D = {A} 228 - \\({E}\_{-E}\^+\\) = {D,C,A} 229 - \\({B}\_{-B}\^+\\) = {E,D,C,A} ({B} is the minimal key in this 230 case) 231 3. Split on violating FDs. For each FD, remove the rhs from relations 232 and add a new relation, with the lhs of that FD being the key. 233 234 ### Third normal form (3NF) 235 236 - relation is in 3NF if for every non-trivial functional dependency: 237 - left-hand side contains a key 238 - or right-hand side is attribute of minimal key 239 240 retains all FDs, so more popular than BCNF. If we leave table in 3NF, we 241 have non-key constraints - the FDs that are not implied by keys. 242 243 #### Synthesis 244 245 Produces lossless decomposition of relation into 3NF that preserves FDs. 246 247 1. Determine canonical set of FDs 248 - e.g. D→A, E→D, E→C, B→E 249 2. Merge FDs with same lhs and create relations from them 250 - R1(<u>D</u>, A) 251 - R2(<u>E</u>,C,D) 252 - R3(<u>B</u>,E) 253 3. Check if any of relations has key of original relation. If not, 254 create a new relation with attributes of the minimal key. 255 - In this case, R(A,<u>B</u>,C,D,E); R3 has <u>B</u>, so don\'t 256 need to do anything. 257 4. For all pairs of created relations: are they contained in another 258 relation? If yes, remove. 259 260 ## MVD & 4NF 261 262 ### Multivalued dependencies (MVDs) 263 264 Constraints that give a necessary and sufficient condition for lossless 265 decomposition. They lead to fourth normal form (4NF). 266 267 Multivalued dependency `NAME ⤅ PROG_LANG` means that set of vales in in 268 column PROG_LANG associated with every NAME is independent of all other 269 columns. i.e. there\'s an embedded function from NAME to sets of 270 PROG_LANG 271 272 The MVD holds if, whenever two tuples agree on NAME, one can exchange 273 their PROG_LANG values and the resulting tuples are in the same table. 274 275 MVDs always come in pairs. For relation R(A1,\...,An, B1,\...,Bm, 276 C1,\...,Ck) these multivalued dependencies are equivalent: 277 278 - A1,\...,An ⤅ B1,\...,Bm 279 - A1,\...,An ⤅ C1,\...,Ck 280 281 Every FD is also a MVD. 282 283 ### Fourth Normal Form (4NF) 284 285 A relation is in in 4NF if every MVD A1,\...,An ⤅ B1,\...,Bm is either 286 trivial, or implied by a key. 287 288 If a relation is in 4NF, it\'s also automatically in BCNF. 289 290 ## Normal forms & ER design 291 292 If a \'good\' ER schema is transformed into the relational model, the 293 result will satisfy all normal forms (4NF, BCNF, 3NF). If a normal form 294 is violated, there\'s a flaw in the input ER schema. 295 296 In the ER model, the entity has to be split in case of a violation. 297 298 Violations of BCNF can also be due to wrong placement of an attribute. 299 300 If an attribute of a ternary relationship only depends on two of the 301 entities, it\'s a BCNF violation. 302 303 Why normalize? 304 305 - Avoid redundancy 306 - Store separate facts separately 307 - Transform general integrity constraints into keys (DBMS constraints) 308 309 ## Denormalization 310 311 The process of adding redundant columns to the database to improve 312 performance. 313 314 This leads to the reintroduction of (some) anomalies, but you avoid huge 315 amounts of joins.