Relational Database Design: Comprehensive Guide
Relational database design is the cornerstone of effective database systems, focusing on organizing data efficiently while reducing redundancy and preserving data integrity. This article provides a thorough exploration of decomposition, normalization, functional dependencies, and keys, ensuring you have a complete understanding of relational database design principles.
Decomposition in Relational Database Design
Decomposition is the process of breaking a large relation (table) into smaller, meaningful relations to eliminate redundancy, improve consistency, and optimize performance. It is a critical aspect of normalization.
Types of Decomposition
-
Lossy Decomposition:
- A decomposition is lossy if the original table cannot be perfectly reconstructed by joining the decomposed relations.
- This happens when some data or relationships are lost during decomposition.
-
Example:
Consider the table:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
If this is decomposed into:
- Table 1: EmployeeID | ProjectID
- Table 2: ProjectID | ProjectManager
Rejoining these tables can lead to duplicate or inconsistent data, resulting in a lossy decomposition.
-
Lossless Decomposition:
- A decomposition is lossless if the original table can be perfectly reconstructed by joining the decomposed relations without losing any data or introducing inconsistencies.
- This is achieved when the decomposition preserves all functional dependencies or when key attributes are included in each decomposed relation.
Functional Dependency
A functional dependency (FD) describes a relationship between two attributes in a relation where the value of one attribute (or set of attributes) determines the value of another attribute (or set of attributes). It is a fundamental concept in relational database design and normalization.
Definition:
Let X and Y be sets of attributes in a relation R. A functional dependency X → Y means that for any two tuples (rows) in R, if the tuples agree on the values of X, they must also agree on the values of Y.
-
X: Determinant (the attribute(s) on the left side).
-
Y: Dependent (the attribute(s) on the right side).
Example:
Consider a table storing student information:
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Here, StudentID → Name, Major because the StudentID uniquely determines both Name and Major.
Properties of Functional Dependencies:
-
Reflexivity: If Y is a subset of X, then X → Y.
-
Augmentation: If X → Y, then XZ → YZ (adding attributes to both sides preserves the dependency).
-
Transitivity: If X → Y and Y → Z, then X → Z.
Keys in Relational Databases
Keys are essential for identifying records uniquely in a table and enforcing data integrity.
Types of Keys:
-
Superkey:
- A set of one or more attributes that can uniquely identify a tuple in a relation.
- Example: In a table with attributes EmployeeID and Name, {EmployeeID}, {EmployeeID, Name} are superkeys.
-
Candidate Key:
- A minimal superkey, meaning no proper subset of it is also a superkey.
- Example: If {EmployeeID} can uniquely identify a tuple, it is a candidate key.
-
Primary Key:
- A candidate key chosen by the database designer to uniquely identify tuples.
- Example: EmployeeID in an Employee table.
-
Foreign Key:
- An attribute (or set of attributes) in one table that references the primary key in another table, establishing a relationship between the tables.
- Example: DepartmentID in an Employee table referencing DepartmentID in a Department table.
-
Composite Key:
- A primary key composed of two or more attributes.
- Example: (StudentID, CourseID) in a table of student enrollments.
-
Unique Key:
- A key constraint ensuring that all values in a column (or combination of columns) are unique.
Normalization and Normal Forms
Normalization is the process of organizing attributes and relations to reduce redundancy and dependency, ensuring data integrity. This is achieved by progressively meeting the criteria of successive normal forms.
Normal Forms (Comprehensive Overview)
First Normal Form (1NF)
Definition:
A relation is said to be in First Normal Form (1NF) if it satisfies the following criteria:
-
Atomicity: All attributes (columns) must contain atomic values. This means that the values in each column are indivisible and cannot be further broken down.
-
Single-Valued Entries: Each column in a table should contain values of a single data type, and no column should have sets, lists, or arrays.
-
Uniqueness of Rows: Each row must be unique, meaning the table should have a primary key to distinguish between rows.
-
No Repeating Groups: The table should not have multiple columns for the same attribute (like Item1, Item2, etc.), nor should it have multiple values stored in a single cell.
Explanation:
-
Atomic Values: Data in each cell must be in its simplest form. For example, instead of storing multiple items in one cell, each item should occupy its own row.
-
Repeating Groups: This is where multiple columns or rows represent the same type of data, making the table non-compliant with 1NF.
-
Primary Key: A primary key ensures that each row is uniquely identifiable, which is a fundamental requirement for relational databases.
Example:
Non-Compliant Table (Not in 1NF):
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
- The Items column violates atomicity because it contains multiple values (e.g., "Pen, Notebook").
- There are repeating groups because items are stored in a single cell rather than separate rows.
Compliant Table (In 1NF):
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
- Here, the Items column is broken down into atomic values, with each item in a separate row.
- No cell contains multiple values, ensuring atomicity.
- The table has no repeating groups or arrays, making it compliant with 1NF.
Second Normal Form (2NF)
Definition:
A relation is in Second Normal Form (2NF) if:
- It is already in First Normal Form (1NF) (i.e., no multi-valued or repeating groups).
- Every non-prime attribute is fully functionally dependent on the entire primary key.
-
Non-prime attribute: An attribute that is not part of any candidate key.
-
Fully functionally dependent: A non-prime attribute must depend on the entire composite primary key and not just a part of it.
Explanation:
- A partial dependency occurs when a non-prime attribute depends on only a part of a composite primary key, rather than the whole key.
-
2NF eliminates partial dependencies by decomposing the relation into smaller relations, ensuring that non-prime attributes are dependent only on the entire primary key or another candidate key.
This step reduces redundancy caused by partial dependencies and organizes the data better.
Example:
Non-Compliant Table (Not in 2NF):
Consider a table storing student-course information:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Composite Primary Key: (StudentID, CourseID).
-
Partial Dependency:
-
Instructor and Department depend only on CourseID and not on the entire primary key (StudentID, CourseID).
This violates 2NF because non-prime attributes (Instructor and Department) are partially dependent on the composite key.
Compliant Tables (In 2NF):
To remove the partial dependency, decompose the table into two relations:
-
Student-Course Table:
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Course-Details Table:
OrderID | Items
-------------------
1 | Pen, Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
Copy after login
Third Normal Form (3NF)
Definition:
A relation is in Third Normal Form (3NF) if:
- It is in Second Normal Form (2NF) (i.e., no partial dependencies).
-
No transitive dependency exists, which means:
- No non-prime attribute is dependent on another non-prime attribute.
- A non-prime attribute should depend only on a candidate key, not through another non-prime attribute.
-
Non-prime attribute: An attribute that is not part of any candidate key.
-
Transitive dependency: A dependency where a non-prime attribute depends indirectly on a candidate key through another non-prime attribute.
Explanation:
In 3NF, we eliminate transitive dependencies to reduce redundancy and improve data consistency.
-
Transitive Dependency Example: If A → B and B → C, then A → C is a transitive dependency. This means C indirectly depends on A through B.
- Such dependencies introduce redundancy, as changes to B could lead to anomalies when updating C.
Example:
Non-Compliant Table (Not in 3NF):
OrderID | Item
---------------
1 | Pen
1 | Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
Candidate Key: StudentID uniquely identifies each row.
-
Issue: The HOD attribute depends on Department, not directly on StudentID.
-
StudentID → Department (Direct dependency).
-
Department → HOD (Transitive dependency).
- So, StudentID → HOD is a transitive dependency.
This structure leads to redundancy: if the HOD for the CS department changes, multiple rows need updating.
Compliant Tables (In 3NF):
To resolve the transitive dependency, decompose the table into two relations:
-
Student-Department Table:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Department-HOD Table:
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Boyce-Codd Normal Form (BCNF)
Definition:
A relation is in Boyce-Codd Normal Form (BCNF) if:
- It is in Third Normal Form (3NF) (i.e., no partial or transitive dependencies exist).
-
Every determinant is a candidate key.
-
Determinant: An attribute (or a set of attributes) on which another attribute is functionally dependent.
-
Candidate Key: A minimal set of attributes that can uniquely identify each tuple in a relation.
Key Difference Between 3NF and BCNF:
- While 3NF allows some dependencies where a non-prime attribute is functionally dependent on a candidate key, BCNF eliminates any such anomalies by ensuring that every determinant is a candidate key.
Explanation:
BCNF is stricter than 3NF and addresses situations where a relation may satisfy 3NF but still have redundancy caused by dependencies that violate BCNF.
When BCNF is Needed:
- BCNF is necessary when a non-candidate key attribute determines part of a candidate key, leading to redundancy and anomalies.
Example:
Non-Compliant Table (Not in BCNF):
OrderID | Items
-------------------
1 | Pen, Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
Copy after login
Functional Dependencies:
-
CourseID → Instructor
-
Instructor → Room
Candidate Key: CourseID
Issue:
- The determinant Instructor is not a candidate key but determines the Room.
- This violates BCNF, as not all determinants are candidate keys.
Compliant Tables (In BCNF):
To achieve BCNF, decompose the table into two relations:
-
Course-Instructor Table:
OrderID | Item
---------------
1 | Pen
1 | Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
-
Instructor-Room Table:
StudentID | CourseID | Instructor | Department
----------------------------------------------
S1 | C1 | Dr. Smith | CS
S2 | C2 | Dr. Jones | EE
S1 | C2 | Dr. Jones | EE
Copy after login
Fourth Normal Form (4NF)
Definition:
A relation is in Fourth Normal Form (4NF) if:
- It is in Boyce-Codd Normal Form (BCNF) (i.e., no partial, transitive, or other anomalies).
- It does not have any multi-valued dependencies.
-
Multi-valued Dependency (MVD): A multi-valued dependency exists when one attribute in a table determines multiple independent sets of attributes. In other words, if a relation contains two or more independent multi-valued attributes that are not related to each other, it violates 4NF.
Explanation:
In 4NF, the primary goal is to eliminate multi-valued dependencies, which occur when a record contains two or more independent attributes that are not directly related but appear together due to their dependence on the same key.
- These types of dependencies lead to redundancy because multiple copies of the same information are repeated in rows.
- By decomposing the relation to remove MVDs, we eliminate redundancy and improve consistency in the database.
Key Concept:
- In 4NF, a relation should not have two or more multi-valued attributes that depend on a candidate key. Each multi-valued dependency must be eliminated by decomposing the table appropriately.
Example:
Non-Compliant Table (Not in 4NF):
Consider a table that stores information about students, the courses they take, and the clubs they are involved in:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Candidate Key: StudentID
Multi-Valued Dependencies:
- A StudentID can determine both a set of Courses and a set of Clubs, but these sets are independent of each other.
-
StudentID → {Courses} (Multi-valued dependency between StudentID and Courses)
-
StudentID → {Clubs} (Multi-valued dependency between StudentID and Clubs)
The table violates 4NF because StudentID determines both the courses and the clubs independently. This causes redundancy, as the same StudentID is repeated multiple times with different combinations of courses and clubs.
Compliant Tables (In 4NF):
To make the table comply with 4NF, we must eliminate the multi-valued dependencies by decomposing it into two tables:
-
Student-Course Table:
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Student-Club Table:
OrderID | Items
-------------------
1 | Pen, Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
Copy after login
Now, the two multi-valued dependencies are handled separately:
- The Student-Course Table stores the relationship between students and the courses they take.
- The Student-Club Table stores the relationship between students and the clubs they are involved in.
Fifth Normal Form (5NF)
Definition:
A relation is in Fifth Normal Form (5NF), also known as Projection-Join Normal Form (PJNF), if:
- It is in Fourth Normal Form (4NF) (i.e., no multi-valued dependencies exist).
- It cannot be further decomposed without losing information, meaning that the relation does not contain any join dependency or lossless join decomposition.
-
Join Dependency (JD): A join dependency occurs when a relation can be decomposed into two or more relations, but when they are joined back together, no information is lost. In other words, a join dependency exists when a relation can be divided into sub-relations, but the original relation can be reconstructed without losing any data.
Explanation:
5NF deals with join dependencies, and it ensures that the data is decomposed in such a way that all information can be reconstructed from its decomposed parts without any loss of data. A relation in 5NF is designed in such a way that all of its non-trivial join dependencies are implied by its candidate keys.
-
Lossless Join Decomposition: When a relation is decomposed into smaller relations and then rejoined, the original relation can be fully reconstructed without any data loss. A relation is in 5NF if it cannot be further decomposed without causing a loss of information.
-
Non-trivial Join Dependency: A join dependency is non-trivial if the join dependency is not trivially satisfied (i.e., not all attributes from the relation are present in the join dependency).
In simpler terms, 5NF is concerned with ensuring that there is no redundancy caused by improper decompositions. It guarantees that when a relation is decomposed and later joined back, all of the original data is still available without any loss or ambiguity.
Example:
Non-Compliant Table (Not in 5NF):
Consider a table that stores information about which suppliers supply which parts for different projects:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Candidate Key: (Supplier, Part, Project)
Join Dependency:
The relation above has a join dependency because it can be decomposed into smaller relations without losing information. For example, the table can be decomposed into three sub-relations:
-
Supplier-Part Table:
EmployeeID | ProjectID | ProjectManager
---------------------------------------
E1 | P1 | M1
E2 | P1 | M1
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Supplier-Project Table:
StudentID | Name | Major
----------------------------
S1 | Alice | CS
S2 | Bob | EE
S3 | Alice | CS
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
Copy after login
-
Part-Project Table:
OrderID | Items
-------------------
1 | Pen, Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
Copy after login
By decomposing the table into these smaller relations, we can still recreate the original table by performing a natural join on these three smaller relations.
However, because this decomposition is possible, it violates 5NF. The reason it violates 5NF is because the information about which supplier supplies which part for a given project is redundantly stored across multiple rows. We are storing the same facts multiple times, which is unnecessary and could lead to inconsistencies.
Compliant Table (In 5NF):
To achieve 5NF, we decompose the table so that the relation cannot be decomposed further without losing information:
-
Supplier-Part-Project Table:
OrderID | Item
---------------
1 | Pen
1 | Notebook
2 | Pencil
Copy after login
Copy after login
Copy after login
In this form, the relation is now in 5NF because it cannot be decomposed further without losing data. This table represents the same information as the original but in a more normalized form where each attribute is fully dependent on the candidate key, and no redundancy exists due to improper decomposition.
Key Concepts in Relational Design
-
Multi-Valued Dependency: When one attribute determines multiple independent values.
-
Join Dependency: Ensures no spurious tuples are created during joins.
-
Dependency Preservation: Ensures all functional dependencies are preserved after decomposition.
This comprehensive guide equips you to master relational database design, ensuring efficient, consistent, and anomaly-free database systems.
The above is the detailed content of Relational Database Design: DBMS. For more information, please follow other related articles on the PHP Chinese website!