Which column for foreign key: id or any other column and why?
Foreign Key Best Practices: 'id' vs. Other Columns

Explore the best practices for choosing foreign key columns in relational databases, focusing on why using the primary key 'id' is generally preferred over other columns, and the implications for data integrity and performance.
When designing a relational database, one of the fundamental decisions involves how to establish relationships between tables. Foreign keys are crucial for this, linking records in one table to records in another. A common question arises: should a foreign key always reference the primary key (typically an id
column), or can it reference any unique column? This article delves into the reasons why referencing the primary key, especially an auto-incrementing id
, is almost always the superior choice, ensuring data integrity, performance, and maintainability.
The Role of Primary Keys and Foreign Keys
A primary key uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values. The most common primary key is an auto-incrementing integer column named id
. A foreign key, on the other hand, is a column or set of columns in one table that refers to the primary key (or a unique key) in another table. Its purpose is to enforce referential integrity, ensuring that relationships between tables remain consistent. For example, if you have a users
table and an orders
table, the orders
table would likely have a user_id
foreign key referencing the id
primary key in the users
table.
erDiagram USERS ||--o{ ORDERS : places USERS { INT id PK VARCHAR name VARCHAR email } ORDERS { INT id PK INT user_id FK DATE order_date DECIMAL total_amount }
Entity-Relationship Diagram showing USERS and ORDERS tables with a foreign key relationship.
Why 'id' (Primary Key) is the Preferred Foreign Key Target
Referencing the primary key (id
) for foreign key constraints offers several significant advantages:
Uniqueness and Immutability: Primary keys are guaranteed to be unique and, ideally, immutable. This ensures that the foreign key always points to a specific, unchanging record. If you were to reference a non-primary key column that might change (e.g., a
username
oremail
), updating that column would require cascading updates to all referencing foreign keys, which can be complex, error-prone, and performance-intensive.Performance: Database systems are highly optimized for primary key lookups. When a foreign key references a primary key, the database can efficiently join tables and enforce constraints. Primary keys are almost always indexed, making lookups extremely fast. Referencing a non-indexed column would lead to slower queries and constraint checks.
Simplicity and Consistency: Using
id
as the foreign key target simplifies schema design and makes it consistent across your database. Developers can easily understand relationships without needing to remember which specific unique column in a parent table is being referenced.Data Integrity: The primary key's non-null and unique constraints are the strongest guarantees of data integrity. By referencing it, you leverage these guarantees directly for your foreign key relationships.
Storage Efficiency: Auto-incrementing integer
id
columns are typically small and efficient to store and index, which translates to better performance and less disk usage compared to larger string-based unique keys.
id
as your primary key unless there's a compelling, well-justified reason not to. This simplifies foreign key relationships and optimizes performance.When Other Columns Might Be Considered (and Why to Be Cautious)
While generally discouraged, a foreign key can technically reference any column or set of columns in the parent table that has a UNIQUE
constraint. For example, if a users
table has a username
column with a UNIQUE
constraint, another table could theoretically use username
as a foreign key. However, this approach comes with significant drawbacks:
- Mutability Issues: If the
username
of a user changes, all records in tables referencing thatusername
would need to be updated. This requiresON UPDATE CASCADE
rules, which can be tricky to manage and can lead to performance bottlenecks during updates. - Performance Overhead: While a unique index helps, integer-based primary keys are often more efficient for joins and lookups than string-based unique keys.
- Semantic Meaning vs. Identifier: Primary keys are ideal as surrogate identifiers – they have no inherent meaning beyond identifying a record. Columns like
username
oremail
have semantic meaning and might change, making them poor choices for stable identifiers in relationships.
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- Preferred: Auto-incrementing integer primary key
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id), -- Correct: References the primary key 'id'
title VARCHAR(255) NOT NULL,
content TEXT
);
-- Example of referencing a unique non-PK column (generally discouraged)
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_email VARCHAR(100) REFERENCES users(email), -- Possible, but problematic if email changes
bio TEXT
);
SQL examples demonstrating foreign key creation, contrasting referencing 'id' vs. a unique 'email' column.
username
, email
, product_code
) as foreign key targets, even if they are unique. Their potential for change introduces complexity and risk to your database's referential integrity.Conclusion
The best practice for foreign key relationships is to always reference the primary key of the parent table, which is almost universally an auto-incrementing integer id
column. This approach guarantees uniqueness, immutability, optimal performance, and simplifies database design and maintenance. While databases allow foreign keys to reference any unique column, deviating from the primary key standard introduces unnecessary risks and complexities that can negatively impact data integrity and application performance in the long run.