Boyce-Codd Normal Form
Categories:
Understanding Boyce-Codd Normal Form (BCNF) in Relational Databases
Explore Boyce-Codd Normal Form (BCNF), a stricter version of 3NF, to design robust and anomaly-free relational databases. Learn its rules, benefits, and how to achieve it.
Database normalization is a process used to organize a relational database to minimize data redundancy and improve data integrity. Boyce-Codd Normal Form (BCNF) is one of the higher forms of normalization, considered stricter than Third Normal Form (3NF). While 3NF addresses most common data anomalies, BCNF goes a step further to handle specific types of anomalies that can still exist in a 3NF relation, particularly when a table has multiple candidate keys that overlap.
What is Boyce-Codd Normal Form (BCNF)?
A relation is in BCNF if and only if for every non-trivial functional dependency (FD) X -> Y, X is a superkey. This means that if an attribute (or set of attributes) determines another attribute, then the determining attribute(s) must be a superkey of the relation. A superkey is any set of attributes that uniquely identifies a tuple (row) in a relation. A candidate key is a minimal superkey.
The primary difference between 3NF and BCNF lies in how they handle functional dependencies where the determinant is not a superkey. 3NF allows for non-key attributes to be dependent on other non-key attributes (transitive dependency) and for non-key attributes to be dependent on part of a candidate key (partial dependency). BCNF eliminates these more subtle dependencies, ensuring that every determinant in a functional dependency is a superkey.
erDiagram STUDENT ||--o{ COURSE_ENROLLMENT : "enrolls in" COURSE_ENROLLMENT ||--|| COURSE : "is part of" COURSE ||--o{ INSTRUCTOR : "taught by" STUDENT { VARCHAR student_id PK VARCHAR student_name VARCHAR major } COURSE { VARCHAR course_id PK VARCHAR course_name VARCHAR instructor_id FK } INSTRUCTOR { VARCHAR instructor_id PK VARCHAR instructor_name VARCHAR department } COURSE_ENROLLMENT { VARCHAR student_id PK,FK VARCHAR course_id PK,FK VARCHAR grade }
Example ER Diagram for a Student-Course-Instructor System
Rules for BCNF
To be in BCNF, a relation must satisfy the following conditions:
- It must be in 3NF: This is a prerequisite. All conditions for 1NF, 2NF, and 3NF must already be met.
- For every non-trivial functional dependency X -> Y, X must be a superkey: This is the core rule. If X determines Y, then X must uniquely identify every row in the table. This implies that there are no non-key attributes determining other non-key attributes, nor are there non-key attributes determining part of a candidate key, nor are there parts of a candidate key determining other parts of a candidate key (unless that part is itself a superkey).
BCNF is particularly important when a table has multiple overlapping candidate keys. In such cases, 3NF might not be sufficient to eliminate all redundancy and update anomalies.
Example: Achieving BCNF
Consider a Student_Course_Instructor
table with the following attributes and functional dependencies:
Student_Course_Instructor (StudentID, CourseName, InstructorName)
Functional Dependencies:
StudentID, CourseName -> InstructorName
(A student taking a specific course is taught by a specific instructor)InstructorName -> CourseName
(An instructor teaches only one course)
Let's analyze this table:
- Candidate Key:
(StudentID, CourseName)
is a candidate key because it uniquely identifies an instructor. - Problem: The dependency
InstructorName -> CourseName
violates BCNF becauseInstructorName
is not a superkey. It's a non-key attribute that determines another non-key attribute (CourseName
). This also meansCourseName
is transitively dependent onStudentID
viaInstructorName
if we considerStudentID
as part of a key for a broader student-centric view.
To achieve BCNF, we decompose the table:
Student_Course (StudentID, CourseName)
- Candidate Key:
(StudentID, CourseName)
- All FDs:
(StudentID, CourseName) -> {}
(trivial) - This table is in BCNF.
- Candidate Key:
Instructor_Course (InstructorName, CourseName)
- Candidate Key:
InstructorName
(sinceInstructorName
determinesCourseName
) - All FDs:
InstructorName -> CourseName
- This table is in BCNF.
- Candidate Key:
CREATE TABLE Student_Course_Instructor (
StudentID VARCHAR(10) NOT NULL,
CourseName VARCHAR(50) NOT NULL,
InstructorName VARCHAR(50) NOT NULL,
PRIMARY KEY (StudentID, CourseName)
);
-- This table violates BCNF due to: InstructorName -> CourseName
-- Decomposition to BCNF:
CREATE TABLE Student_Course (
StudentID VARCHAR(10) NOT NULL,
CourseName VARCHAR(50) NOT NULL,
PRIMARY KEY (StudentID, CourseName)
);
CREATE TABLE Instructor_Course (
InstructorName VARCHAR(50) NOT NULL,
CourseName VARCHAR(50) NOT NULL,
PRIMARY KEY (InstructorName)
);
SQL schema demonstrating BCNF decomposition