NORMALIZATION IN DATABASE
normal forms for database normalization.
- Normalization is the process of organizing the data in the database.
- Normalization is used to minimize the redundancy from a relation or set of relations. it is also used to eliminate the undesirable characteristics like insertion,update and deletion anomalies.
- normalization divides the larger table into the smaller table and links them using relationship.
There are four type normalization which is used commonly:
- First normal form (1NF)
- Second normal form(2NF)
- Third normal form(3NF)
- Boyce codd normal form(BCNF)
All the normalization are based on the function dependencies that apply only to single-value fact.
Later on, fourth normal form (4NF) and a fifth normal form (5NF) were proposed based on the concept of multivalued dependencies and join dependencies respectively.
First normal form (1NF)
- A relation will be 1NF if it contains an atomic value.
- It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.
Let’s an Example of 1NF
Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
Employee table:
To make the table complies with 1NF we should have the data like this:
Second normal form(2NF)
- In the 2NF, relational must be in 1NF.
- In the second normal form, all non-key attributes are fully functional dependent on the primary key
Let’s an example of second 2NF
A school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table:
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That’s why it violates the rule for 2NF.
To make the table complies with 2NF we can break it in two tables like this:
Teacher_Detail table:
Teacher_Subject:
Third normal form(3NF)
- A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
- 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
- If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y.
- X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Let’s example of Third normal form
Employee_Detail Table
Super key in the table above:
{EMP_ID}{EMP_ID, EMP_NAME} {EMP_ID,EMP_NAME,EMP_ZIP}
candidate key: {emp_id}
Non prime attribute:
In the given table, all attribute except EMP_ID are non prime. Here EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non prime attributes (EMP_STATE,EMP_CITY)transitively dependent on super key(EMP_ID).It violates the rule of third normal form.
That’s why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>table, with EMP_ZIP as a primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
Boyce Codd normal form (BCNF)
- BCNF is the advance version of 3NF. It is stricter than 3NF.
- A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
- For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let’s assume there is a company where employees work in more than one department.
EMPLOYEE table:
In the above table Functional dependencies are as follows:
EMP_ID →EMP_COUNTRY
EMP_DEPT →{DEPT_TYPE,EMP_DEPT_NO}
Candidate key:{EMP_ID,EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
TO convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_DEPT table:
EMP_DEPT_MAPPING table:
Functional dependencies:
EMP+ID →EMP_COUNTRY
EMP_DEPT →{DEPT_TYPE,EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now ,this is in BCNF because left side part of both the functional dependencies is a key.
Fourth Normal form(4NF)
- A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
- For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.
Let’s an Example of 4NF
Student
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF,we can decompose it into two tables:
Student_Course
Student_Hobby
Fifth normal form(5NF)
- A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
- 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
- 5NF is also known as Project-join normal form (PJ/NF).
Let’s an Example of 5NF
In the above table, John takes both Computer and Math class for Semester 1 but he doesn’t take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can’t leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
1 Comments
Nice
ReplyDelete