PostgreSQL: INSERT INTO ... (SELECT * ...)


Efficient Data Transfer: Mastering INSERT INTO ... SELECT * ... in PostgreSQL


Learn how to use the INSERT INTO ... SELECT * ... statement in PostgreSQL for efficient data copying, migration, and transformation between tables, including considerations for dblink and schema differences.

The INSERT INTO ... SELECT * ... statement is a powerful and frequently used SQL construct in PostgreSQL for moving data between tables. Whether you're migrating data, creating backups, populating new tables, or performing complex transformations, this command provides a robust and efficient way to achieve your goals. This article will delve into its various applications, best practices, and considerations, including how to handle schema mismatches and leverage dblink for cross-database operations.

Basic Data Copying: The Foundation

At its core, INSERT INTO ... SELECT * ... allows you to insert rows into a target table based on the results of a SELECT query from another table. This is incredibly useful for creating copies of data or populating a new table with existing data. The simplest form involves copying all columns and rows from a source table to a target table with an identical schema.

INSERT INTO target_table
SELECT * FROM source_table;

Basic INSERT INTO ... SELECT * ... for identical schemas
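
If the target table doesn't exist yet, you can first create an empty table that clones the source's structure; the positional SELECT * copy then relies on both tables having the same columns in the same order. A minimal sketch using PostgreSQL's LIKE clause (table names are illustrative):

-- Create an empty structural copy, including defaults, constraints, and indexes
CREATE TABLE target_table (LIKE source_table INCLUDING ALL);

-- Now the positional copy works as shown above
INSERT INTO target_table
SELECT * FROM source_table;

Creating a structural clone with CREATE TABLE ... (LIKE ...) before copying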

Handling Schema Differences and Column Selection

In real-world scenarios, it's rare for tables to have perfectly identical schemas. You might need to select specific columns, reorder them, or even transform data during the insertion process. PostgreSQL's INSERT INTO ... SELECT syntax accommodates this by allowing you to explicitly list the target columns and the corresponding source columns in your SELECT statement.

INSERT INTO target_table (column1, column2, column3)
SELECT source_col_a, source_col_b, source_col_c
FROM source_table
WHERE some_condition = 'value';

Inserting specific columns with a WHERE clause

This approach provides much greater flexibility. You can also use functions, expressions, and CASE statements within your SELECT clause to transform data before it's inserted into the target table. This is particularly powerful for data cleansing or normalization tasks.

flowchart TD
    A[Source Table] --> B{"SELECT Columns & Filter"}
    B --> C["Transform Data (Optional)"]
    C --> D["Target Table (INSERT INTO)"]
    D --> E[Data Inserted]

Data flow for INSERT INTO ... SELECT with transformations
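
As a concrete illustration of the transformation step, the following sketch normalizes values with a function call and derives a new column with a CASE expression during insertion; the tables, columns, and categorization rules are hypothetical:

-- Clean and categorize rows while copying them
INSERT INTO customers_clean (id, email, tier)
SELECT id,
       lower(trim(email)),                 -- normalize email addresses
       CASE
           WHEN total_spent >= 1000 THEN 'gold'
           WHEN total_spent >= 100  THEN 'silver'
           ELSE 'bronze'
       END                                 -- derive a tier from spend
FROM customers_raw
WHERE email IS NOT NULL;

Transforming data with a function and a CASE expression during insertion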

Cross-Database Transfers with dblink

Sometimes, the data you need to insert resides in a different PostgreSQL database, potentially even on a different server. PostgreSQL's dblink extension allows you to execute queries on remote PostgreSQL databases as if they were local. This makes INSERT INTO ... SELECT even more versatile for cross-database data migration or synchronization.

-- Enable dblink (if not already enabled)
CREATE EXTENSION IF NOT EXISTS dblink;

-- Insert data from a remote table into a local table
INSERT INTO local_table (id, name, value)
SELECT id, name, value
FROM dblink('host=remote_host user=remote_user password=remote_password dbname=remote_db',
            'SELECT id, name, value FROM remote_table')
AS remote_data (id INT, name VARCHAR, value TEXT);

Using dblink with INSERT INTO ... SELECT for cross-database operations

When using dblink, the SELECT query is executed on the remote database, and its results are returned to the local database for insertion. It's crucial that the column names and data types you declare in the AS remote_data (...) clause match the output of the remote SELECT query; this is how dblink interprets the incoming row data.
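
Because the quoted query runs on the remote server, you can push filtering to the remote side so that only the needed rows cross the network. A minimal sketch building on the previous example (connection details, names, and the updated_at column are placeholders; note the doubled single quotes inside the query string):

-- Filter on the remote side; only matching rows are transferred
INSERT INTO local_table (id, name, value)
SELECT id, name, value
FROM dblink('host=remote_host user=remote_user password=remote_password dbname=remote_db',
            'SELECT id, name, value FROM remote_table WHERE updated_at >= now() - interval ''1 day''')
AS remote_data (id INT, name VARCHAR, value TEXT);

Pushing a filter into the remote query with dblink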

Performance Considerations and Best Practices

While INSERT INTO ... SELECT is generally efficient, large data volumes can impact performance. Here are some best practices to consider:

  • Indexes: Ensure appropriate indexes exist on the WHERE clause columns of your SELECT statement to speed up data retrieval.
  • Transactions: A single INSERT ... SELECT statement is already atomic in PostgreSQL, but wrapping it in an explicit transaction lets you group it with related statements and ROLLBACK the whole unit if anything fails.
  • Batching: For extremely large datasets, consider breaking the INSERT into smaller batches, especially if you're concerned about WAL volume or long-running locks (see the batching sketch after the examples below).
  • TRUNCATE vs. DELETE: If you're clearing a table before inserting new data, TRUNCATE TABLE is generally faster than DELETE FROM because it removes all rows without scanning them or writing per-row WAL.
  • CREATE TABLE AS: If you're creating a new table and populating it in one step, CREATE TABLE new_table AS SELECT * FROM old_table; is often more efficient than a separate CREATE TABLE followed by INSERT INTO ... SELECT, because it creates and populates the table in a single operation.

-- Example of using CREATE TABLE AS
CREATE TABLE new_archive_table AS
SELECT id, name, created_at
FROM original_data
WHERE created_at < '2023-01-01';

-- Example of a transaction for safety
BEGIN;
INSERT INTO processed_data (col1, col2)
SELECT raw_col1, raw_col2_transformed
FROM raw_staging_data
WHERE status = 'processed';
-- If all goes well:
COMMIT;
-- If something goes wrong:
-- ROLLBACK;

Examples of CREATE TABLE AS and transactional INSERT
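
For the batching advice above, one common pattern is to copy keyed ranges in a loop, committing after each range. This is a minimal sketch assuming an indexed integer id column and hypothetical table names; transaction control inside a DO block requires PostgreSQL 11+ and must be run outside an explicit transaction:

DO $$
DECLARE
    last_id    BIGINT := 0;
    batch_size INT    := 10000;
    max_id     BIGINT;
BEGIN
    SELECT COALESCE(MAX(id), 0) INTO max_id FROM source_table;
    WHILE last_id < max_id LOOP
        -- Copy one id range per iteration
        INSERT INTO target_table
        SELECT * FROM source_table
        WHERE id > last_id AND id <= last_id + batch_size;
        last_id := last_id + batch_size;
        COMMIT;  -- PostgreSQL 11+: commits each batch and releases its locks
    END LOOP;
END $$;

Batching a large copy into smaller committed ranges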