Efficient Data Transfer: Mastering INSERT INTO ... SELECT * ... in PostgreSQL

Learn how to use the INSERT INTO ... SELECT * ... statement in PostgreSQL for efficient data copying, migration, and transformation between tables, including considerations for dblink and schema differences.
The INSERT INTO ... SELECT * ... statement is a powerful and frequently used SQL construct in PostgreSQL for moving data between tables. Whether you're migrating data, creating backups, populating new tables, or performing complex transformations, this command provides a robust and efficient way to achieve your goals. This article will delve into its various applications, best practices, and considerations, including how to handle schema mismatches and leverage dblink for cross-database operations.
Basic Data Copying: The Foundation
At its core, INSERT INTO ... SELECT * ... allows you to insert rows into a target table based on the results of a SELECT query from another table. This is incredibly useful for creating copies of data or populating a new table with existing data. The simplest form copies all columns and rows from a source table to a target table with an identical schema.
INSERT INTO target_table
SELECT * FROM source_table;
Basic INSERT INTO ... SELECT * ... for identical schemas
Note that target_table must have the same number of columns, in the same order, with data types compatible with those of source_table; column names do not need to match, because SELECT * pairs columns by position. If the schemas differ, you'll need to specify the columns explicitly.
Handling Schema Differences and Column Selection
In real-world scenarios, it's rare for tables to have perfectly identical schemas. You might need to select specific columns, reorder them, or even transform data during the insertion process. PostgreSQL's INSERT INTO ... SELECT syntax accommodates this by allowing you to explicitly list the target columns and the corresponding source columns in your SELECT statement.
INSERT INTO target_table (column1, column2, column3)
SELECT source_col_a, source_col_b, source_col_c
FROM source_table
WHERE some_condition = 'value';
Inserting specific columns with a WHERE clause
This approach provides much greater flexibility. You can also use functions, expressions, and CASE statements within your SELECT clause to transform data before it's inserted into the target table. This is particularly powerful for data cleansing or normalization tasks.
flowchart TD
    A[Source Table] --> B{SELECT Columns & Filter}
    B --> C["Transform Data (Optional)"]
    C --> D["Target Table (INSERT INTO)"]
    D --> E[Data Inserted]
Data flow for INSERT INTO ... SELECT with transformations
Leveraging dblink for Cross-Database Operations
Sometimes, the data you need to insert resides in a different PostgreSQL database, potentially even on a different server. PostgreSQL's dblink extension allows you to execute queries on remote PostgreSQL databases as if they were local. This makes INSERT INTO ... SELECT even more versatile for cross-database data migration or synchronization.
Before using dblink, you must enable the extension in your PostgreSQL database. This typically involves running CREATE EXTENSION dblink; as a superuser. Also, ensure proper network connectivity and authentication are configured between the databases.
-- Enable dblink (if not already enabled)
CREATE EXTENSION dblink;
-- Insert data from a remote table into a local table
INSERT INTO local_table (id, name, value)
SELECT id, name, value
FROM dblink('host=remote_host user=remote_user password=remote_password dbname=remote_db',
'SELECT id, name, value FROM remote_table')
AS remote_data (id INT, name VARCHAR, value TEXT);
Using dblink with INSERT INTO ... SELECT for cross-database operations
When using dblink, the SELECT query is executed on the remote database, and its results are then returned to the local database for insertion. It's crucial to define the column list and their data types in the AS remote_data (...) clause to match the expected output from the remote SELECT query. This ensures that dblink correctly interprets the incoming data.
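If you run several statements against the same remote database, dblink also supports named connections via dblink_connect, which avoids repeating the connection string on every call. A minimal sketch, with the same placeholder connection details as above:
-- Open a named connection once (placeholder credentials)
SELECT dblink_connect('remote_conn',
    'host=remote_host user=remote_user password=remote_password dbname=remote_db');
-- Reuse the connection by name instead of repeating the connection string
INSERT INTO local_table (id, name, value)
SELECT id, name, value
FROM dblink('remote_conn', 'SELECT id, name, value FROM remote_table')
    AS remote_data (id INT, name VARCHAR, value TEXT);
-- Close the connection when finished
SELECT dblink_disconnect('remote_conn');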
Performance Considerations and Best Practices
While INSERT INTO ... SELECT is generally efficient, large data volumes can impact performance. Here are some best practices to consider:
- Indexes: Ensure appropriate indexes exist on the WHERE clause columns of your SELECT statement to speed up data retrieval.
- Transactions: Wrap your INSERT statement in a transaction. This allows you to ROLLBACK if something goes wrong and ensures atomicity.
- Batching: For extremely large datasets, consider breaking the INSERT into smaller batches, especially if you're concerned about transaction log size or long-running locks (see the sketch after this list).
- TRUNCATE vs. DELETE: If you're clearing a table before inserting new data, TRUNCATE TABLE is generally faster than DELETE FROM as it doesn't log individual row deletions.
- CREATE TABLE AS: If you're creating a new table and populating it, CREATE TABLE new_table AS SELECT * FROM old_table; is often more efficient than CREATE TABLE new_table; INSERT INTO new_table SELECT * FROM old_table; as it avoids two separate operations and can optimize table creation based on the selected data.
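Here is a minimal batching sketch, assuming an indexed bigint id column, hypothetical table names, and PostgreSQL 11 or later (COMMIT inside a DO block requires it):
-- Copy rows in batches of 10,000 by primary key range, committing each
-- batch separately so locks and WAL growth stay bounded.
DO $$
DECLARE
    batch_size   CONSTANT int := 10000;
    last_id      bigint := 0;
    max_batch_id bigint;
BEGIN
    LOOP
        -- Find the upper id bound of the next batch
        SELECT max(id) INTO max_batch_id
        FROM (SELECT id FROM source_table
              WHERE id > last_id
              ORDER BY id
              LIMIT batch_size) AS batch;
        EXIT WHEN max_batch_id IS NULL;  -- no rows left to copy
        INSERT INTO target_table
        SELECT * FROM source_table
        WHERE id > last_id AND id <= max_batch_id;
        last_id := max_batch_id;
        COMMIT;  -- each batch is its own transaction (PostgreSQL 11+)
    END LOOP;
END $$;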
-- Example of using CREATE TABLE AS
CREATE TABLE new_archive_table AS
SELECT id, name, created_at
FROM original_data
WHERE created_at < '2023-01-01';
-- Example of a transaction for safety
BEGIN;
INSERT INTO processed_data (col1, col2)
SELECT raw_col1, raw_col2_transformed
FROM raw_staging_data
WHERE status = 'processed';
-- If all goes well:
COMMIT;
-- If something goes wrong:
-- ROLLBACK;
Examples of CREATE TABLE AS and transactional INSERT