Chapter 4Normalization1Data Normalization Formal process of decomposingrelations with anomalies toproduce smaller, wellstructured and stable relations Primarily a tool to validate andimprove a logical design so thatit satisfies certain constraintsthat avoid unnecessaryduplication of data21

Well-Structured Relations A relation that contains minimal data redundancyand allows users to insert, delete, and update rowswithout causing data inconsistencies Goal is to avoid (minimize) anomalies– Insertion Anomaly – adding new rows forces user tocreate duplicate data– Deletion Anomaly – deleting a row may cause loss ofother data representing completely different facts– Modification Anomaly – changing data in a row forceschanges to other rows because of duplicationGeneral rule of thumb: a table should not pertain tomore than one entity type3Example – Figure 4.2bQuestion – Is this a relation?Answer – Yes: unique rows and nomultivalued attributesQuestion – What’s the primary key?Answer – Composite: EmpID,CourseTitle42

Anomalies in this Table Insertion – can’t enter a new employee without havingthe employee take a class Deletion – if we remove employee 140, we loseinformation about the existence of a Tax Acc class Modification – giving a salary increase to employee100 forces us to update multiple recordsWhy do these anomalies exist?Because there are two themes (entity types – what are they?) in thisone relation (two themes, entity types, were combined). This resultsin duplication, and an unnecessary dependency between the entities5Figure: 4-22 Steps in NormalizationTable with MultivaluedattributesFirst normalform (1NF)Remove MultivaluedAttributesSecond normalform(2NF)Third normalform (3NF)Boyce-Codd normalform (BC-NF)Fourth normalForm (4NF)Fifth normalform (5NF)63

First Normal Form (1NF) Only atomic attributes (simple, single-value)A primary key has been identifiedEvery relation is in 1NF by definition1NF example:StudentStudentId100StuName CourseIdMike112CourseName GradeC 22224BAB7Figure: 4-22 Steps in NormalizationTable with MultivaluedattributesFirst normalform (1NF)Second normalform(2NF)Remove MultivaluedAttributesRemove PartialDependenciesThird normalform (3NF)Boyce-Codd normalform (BC-NF)Fourth normalForm (4NF)Fifth normalform (5NF)84

Functional Dependencies Functional Dependency: The value of one attribute (thedeterminant) determines the value of another attribute.– Aà B reads “Attribute B is functionally dependent on A”– Aà B means if two rows have same value of A theynecessarily have same value of B– FDs are determined by semantics: You can’t say that a FDexists just by looking at data. But can say whether it doesnot exist by looking at data.9Quick Check Id à Name?Age à Gender?Name à Id?Name, Age à Id?105

Functional Dependencies and Keys Functional Dependency: The value of one attribute (thedeterminant) determines the value of another attribute. Candidate Key– Attribute that uniquely identifies a row in a relation– Could be a combination of (non-redundant) attributes– Each non-key field is functionally dependent on everycandidate key12Figure 4-23: Representing Functional Dependencies (cont.)EmpID à EmpID, CourseTitle à 136

Practice Exercise #7, page #1931.2.3.Convert this table to a relation (named PART SUPPLIER) in 1NFDraw a relational schema for PART SUPPLIER and show thefunctional dependencies. Identify a candidate key.Identify each of the following: an insert anomaly, a delete anomaly, anda modification anomaly.14157

Second Normal Form (2NF) 1NF PLUS every non-key attribute isfully functionally dependent on theENTIRE primary key– Every non-key attribute must be defined bythe entire key, not by only part of the key– No partial functional dependencies16Functional Dependencies in StudentStudentIdStuNameCourseIdCourseNameGradeCan represent FDs with arrows as above, or StudentId à StuName, CourseId à CourseName StudentId,CourseId à Grade (and StuName,CourseName)Any partial FDs ?178

Functional Dependencies in StudentStudentIdStuNameCourseIdCourseNameGradeCan represent FDs with arrows as above, or StudentId à StuName, CourseId à CourseName StudentId,CourseId à Grade (and StuName,CourseName)Therefore, NOT in 2nd Normal Form!!182NF: Normalizing How do we convert the partial dependencies intonormal ones ? By breaking into more tables.StudentIdStuNameCourseIdCourseNameGrade Becomes (notice above arrows mean functional dependency,below they mean FK eIdCourseNameGrade199

You Try SeriesId EpisodeId SeriesTitle EpisodeTitle AiringDate List all FDs Eliminate partial FDs, if any20Figure: 4-22 Steps in NormalizationTable with MultivaluedattributesFirst normalform (1NF)Second normalform(2NF)Third normalform (3NF)Remove MultivaluedAttributesRemove PartialDependenciesRemove TransitiveDependenciesBoyce-Codd normalform (BC-NF)Fourth normalForm (4NF)Fifth normalform (5NF)2210

Third Normal Form 2NF and no transitive dependencies A transitive dependency is when a non-keyattribute depends on another non-keyattribute Note: This is called transitive, because theprimary key is a determinant for anotherattribute, which in turn is a determinant fora third attribute233NF ExampleCourseSectNum ClassroomCapacity Classroom à CapacityTRANSITIVE Any partial FDs?NO Any transitive FDs?YES ! How do we eliminate it? By breaking into its own table2511

3NF NormalizationCourseSectNum ClassroomClassroomCapacity27You Try StudentIdProgramIdStudentNameProgramName Partial FDs? Eliminate, if any. Transitive FDs? Eliminate, if any.2812

Practice Exercise #15, page #196Insertion anomaly?Deletion anomaly?Modification anomaly?1. Develop a diagram that shows the functional dependencies in theSHIPMENT relation.2. In what normal form is SHIPMENT? Why?3. Convert SHIPMENT to 3NF if necessary. Show the resultingtable(s) with the sample data presented in SHIPMENT.303113

Figure: 4-22 Steps in NormalizationTable with MultivaluedattributesFirst normalform (1NF)Second normalform(2NF)Third normalform (3NF)Boyce-Codd normalform (BC-NF)Remove MultivaluedAttributesRemove PartialDependenciesRemove TransitiveDependenciesRemove remaininganomalies resulting frommultiple candidate keysFourth normalForm (4NF)Fifth normalform (5NF)33Further Normalization Boyce-Codd Normal form (BCNF)– Slight difference with 3NF– To be in 3NF but not in BNF, needs two compositecandidate keys, with one attribute of one key dependingon one attribute of the other– Not very common J – If a table contains only one candidate key, the 3NF and theBCNF are equivalent. Fourth Normal Form (4NF)– To break it, need to have multivalued dependencies, ageneralization of functional dependencies Usually, if you’re in 3NF you’re in BCNF, 4NF, 3414

BCNF Example Assume that– For each subject, each student is taught by one Instructor– Each Instructor teaches only one subject– Each subject is taught by several InstructorsCourse, Student à InstructorCourseCS 121CS 121CS 121CS 121CourseCS 141CS 141CS 141CS 141CS 141InstructorDr. T. WatsonDr. T. WatsonDr. T. WatsonDr. P. HoldDr. P. HoldStudentLinda LeeJudith SanBill JonesBill PayneA. WhiteInstructorDr. A. JamesDr. A. JamesDr. A. JamesDr. A. JamesStudentBill PayneTony PerezJames AtkinsonLinda LeeCourseInstructor à CourseInstructorCS 101 Dr. M. JonesCS 101 Dr. M. JonesCS 101 Dr. M. JonesStudentLinda LeeTony PerezBill PayneBCNF: Decompose into (Instructor, Course) and (Student, Instructor)35BCNF Boyce-Codd normal form (BCNF)A relation is in BCNF, if and only if, everydeterminant is a candidate key. The difference between 3NF and BCNF is that fora functional dependency A à B, 3NF allows thisdependency in a relation if B is a primary-keyattribute and A is not a candidate key,whereas BCNF insists that for this dependency toremain in a relation, A must be a candidate key.3615

Figure: 4-22 Steps in NormalizationTable with MultivaluedattributesRemove MultivaluedAttributesFirst normalform (1NF)Remove PartialDependenciesSecond normalform(2NF)Remove Third normalform (3NF)Remove remaininganomalies resulting frommultiple candidate keysBoyce-Codd normalform (BC-NF)Remove MultivaluedDependenciesFourth normalForm (4NF)Fifth normalform (5NF)374NF A multi-valued dependency exists when– There are at least 3 attributes A, B, C in a relation and– For each value of A there is a well defined set of valuesfor B, and a well defined set of values for C,– But the set of values for B is independent on the set ofvalues for C 4NF 3NF with no multi-valued dependency3816

4NF Example Assume that– Each subject is taught by many Instructors– The same books are used in many subjects– Each Instructor uses a different bookCourse, Instructor à TextCourse, Text à InstructorCourseInstructorCS 121 Dr. A. JamesCS 121 Dr. P. HoldTextInt to Com ScienceComp Scien IntCourseInstructorCS 141 Dr. T. WatsonCS 141 Dr. P. HoldCS 101 Dr. M. JonesTextInt to Com ScienceComp Scien IntCOMP SCIEN4NF: Decompose into (Course, Instructor) and (Course, Text)39Textbook Example4017

The Normalization Example in the Text BookFigure 4-24 INVOICE (Pine Valley Furniture Company)41Figure 4-25 INVOICE DataTable with multivalued attributes, not in 1st normal formNote: this is NOT a relation. WHY?4218

Figure 4-26 INVOICE relation (1NF)Table with no multivalued attributes and unique rowsNote: this is relation, but not a well-structured one. WHY?43Anomalies in this Table Insertion–if new product is ordered for order 1007 ofexisting customer, customer data must be re-entered,causing duplication Deletion–if we delete the Dining Table from Order1006, we lose information concerning this item's finishand price Update–changing the price of product ID 4 requiresupdate in several records4419

Figure 4-27 Functional dependency diagram for INVOICEOrder ID è Order Date, Customer ID, Customer Name, Customer AddressCustomer ID è Customer Name, Customer AddressProduct ID è Product Description, Product Finish, Unit PriceOrder ID, Product ID è Order QuantityTherefore, NOT in 2nd Normal Form45Figure 4-28 Partial Dependencies were Removed (2NF)2NFPartial dependencies are removed, butthere are still transitive dependencies4620

Figure 4-29 Transitive Dependencies were Removed (3NF)Two RelationsRemainGetting it intoThird NormalForm3NF47You Try to 1NF RelationDraw dependency diagram showing all functional dependenciesIdentify anomaliesConvert to 3NF RelationsDevelop EER Diagram with appropriate cardinalities4821

49Logical Database DesignYou have just learnedand completed one ofthe most importantconcepts and theories,integrity constraintsand normalization,for developing aquality of database.5022

After learning one of most importantdatabase concepts and theories.WHAT’S NEXT ?51Steps of Database DevelopmentUser view-1User interview &Integrated ModelUser view-2User view-3 User view-N Conceptual Schema (Model)Logical Model(ERD or E/ERD)(Six) Relations TransformationNORMALIZATION (up to 3NF)(more relationsproduced)(more tablescreated)IMPLEMENTATION5223

101 Susan 222 Database A 140 Lorenzo 224 Graphics B Student 8 Table with Multivalued attributes First normal form (1NF) Second normal form(2NF) Boyce-Codd normal form (BC-NF) Fourth normal Form (4NF) Fifth normal form (5NF) Remove Multivalued Attributes Figure: 4-22 Steps