Create a pivot table with PostgreSQL
Creating Pivot Tables in PostgreSQL
Learn how to transform rows into columns using SQL aggregate functions and the CROSSTAB
function in PostgreSQL to create powerful pivot tables for data analysis.
Pivot tables are a fundamental tool in data analysis, allowing you to summarize and reorganize data to gain new insights. In PostgreSQL, while there isn't a direct PIVOT
keyword like in some other SQL dialects, you can achieve the same powerful functionality using a combination of aggregate functions with CASE
statements or, for more complex scenarios, the CROSSTAB
function provided by the tablefunc
extension. This article will guide you through both approaches, demonstrating how to effectively create pivot tables in PostgreSQL.
Understanding the Need for Pivoting
Imagine you have sales data where each row represents a transaction, including the product, region, and sales amount. If you want to see total sales for each product across different regions, with regions as columns, a pivot table is exactly what you need. It transforms the unique values from one column (e.g., 'Region') into new columns, and then aggregates values (e.g., 'Sales Amount') for the intersections.
flowchart LR A[Raw Data] --> B{"Identify Pivot Column"} B --> C{"Identify Value Column"} C --> D{"Identify Grouping Column"} D --> E[Apply Aggregation & Transformation] --> F[Pivot Table Result]
Conceptual flow for creating a pivot table
Method 1: Using CASE
Statements with Aggregate Functions
This is the most common and flexible way to create pivot tables in PostgreSQL without relying on extensions. It involves using aggregate functions (like SUM
, COUNT
, AVG
) in conjunction with CASE
expressions to conditionally sum or count values based on the column you want to pivot. This method is excellent for a fixed number of columns you want to pivot.
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
product VARCHAR(50) NOT NULL,
region VARCHAR(50) NOT NULL,
sale_amount DECIMAL(10, 2) NOT NULL
);
INSERT INTO sales (product, region, sale_amount) VALUES
('Laptop', 'East', 1200.00),
('Mouse', 'East', 25.00),
('Keyboard', 'East', 75.00),
('Laptop', 'West', 1100.00),
('Mouse', 'West', 30.00),
('Keyboard', 'West', 80.00),
('Laptop', 'North', 1300.00),
('Mouse', 'North', 20.00),
('Keyboard', 'North', 70.00),
('Monitor', 'East', 300.00),
('Monitor', 'West', 320.00),
('Monitor', 'North', 310.00);
Sample sales
table and data for demonstration
SELECT
product,
SUM(CASE WHEN region = 'East' THEN sale_amount ELSE 0 END) AS "East Sales",
SUM(CASE WHEN region = 'West' THEN sale_amount ELSE 0 END) AS "West Sales",
SUM(CASE WHEN region = 'North' THEN sale_amount ELSE 0 END) AS "North Sales",
SUM(sale_amount) AS "Total Sales"
FROM
sales
GROUP BY
product
ORDER BY
product;
Pivoting sales data by region using CASE
statements
In this example, product
is our grouping column, region
is the column we want to pivot, and sale_amount
is the value we are aggregating. Each CASE
statement creates a new column for a specific region, summing the sale_amount
only when the region
matches. If it doesn't match, it contributes 0
to the sum, effectively isolating sales per region.
CASE
statement approach is ideal when the number of distinct values in your pivot column (e.g., regions) is known and relatively small. For a dynamic number of columns, you would typically need to generate the SQL query dynamically in your application layer.Method 2: Using the CROSSTAB
Function (tablefunc extension)
For more advanced pivoting, especially when dealing with a dynamic number of columns or when you need a more concise syntax, PostgreSQL offers the CROSSTAB
function as part of the tablefunc
extension. This extension needs to be enabled in your database first.
CREATE EXTENSION IF NOT EXISTS tablefunc;
Enabling the tablefunc
extension
The CROSSTAB
function comes in two main forms. The most common one takes two arguments: a "source query" and a "category query".
- Source Query: This query should return three columns:
row_name
,category
, andvalue
.row_name
will be your grouping column,category
will be the column whose distinct values become new columns, andvalue
is the aggregated value. - Category Query: This query should return the exact set of distinct categories (pivot column values) that you expect to see as columns in your output. This is crucial for
CROSSTAB
to know what columns to create and in what order.
SELECT * FROM crosstab(
'SELECT product, region, SUM(sale_amount) FROM sales GROUP BY product, region ORDER BY 1,2',
'SELECT DISTINCT region FROM sales ORDER BY 1'
) AS ct (product VARCHAR(50), "East" DECIMAL(10,2), "North" DECIMAL(10,2), "West" DECIMAL(10,2));
Pivoting sales data using CROSSTAB
Let's break down the CROSSTAB
query:
- Source Query:
SELECT product, region, SUM(sale_amount) FROM sales GROUP BY product, region ORDER BY 1,2
product
: The row identifier.region
: The column whose values will become new columns.SUM(sale_amount)
: The aggregated value for eachproduct
/region
combination.
- Category Query:
SELECT DISTINCT region FROM sales ORDER BY 1
- This explicitly tells
CROSSTAB
to expect 'East', 'North', and 'West' as the pivot columns, in that order.
- This explicitly tells
- Alias (
AS ct (...)
): You must provide an alias for thecrosstab
function and explicitly define the column names and their data types for the pivoted output. The first column is always therow_name
from your source query, followed by the columns for each category specified in your category query.
AS ct (...)
clause must exactly match the order and types of the columns returned by your crosstab
function. If a category from the category query is not present in the source query for a given row, its corresponding pivoted column will contain NULL
.Dynamic Pivoting with CROSSTAB
(Advanced)
While CROSSTAB
requires you to explicitly list the output columns, you can combine it with dynamic SQL (e.g., using PL/pgSQL or your application's programming language) to generate the crosstab
call and its column definitions on the fly. This is useful when the categories you want to pivot are not known beforehand or change frequently.
DO $$
DECLARE
_sql TEXT;
_cols TEXT;
BEGIN
-- Generate the column list dynamically
SELECT string_agg(quote_ident(region) || ' DECIMAL(10,2)', ', ' ORDER BY region)
INTO _cols
FROM (SELECT DISTINCT region FROM sales) AS regions;
-- Construct the full crosstab query
_sql := format(
'SELECT product, %s FROM crosstab(
''SELECT product, region, SUM(sale_amount) FROM sales GROUP BY product, region ORDER BY 1,2'',
''SELECT DISTINCT region FROM sales ORDER BY 1''
) AS ct (product VARCHAR(50), %s);',
_cols, _cols
);
-- Execute the dynamic query
EXECUTE _sql;
END $$;
Dynamic CROSSTAB
query using PL/pgSQL
This PL/pgSQL block first constructs the column definitions (_cols
) by querying the distinct regions. Then, it uses format()
to inject these dynamic column definitions into the crosstab
query string, which is finally executed. This allows for truly dynamic pivot tables where the number and names of the pivoted columns are determined at runtime.