SAS PROC Transpose Data
Categories:
Mastering Data Reshaping with SAS PROC TRANSPOSE

Learn how to effectively use SAS PROC TRANSPOSE to convert data from wide to long format and vice versa, a crucial skill for data analysis and reporting.
Data often comes in various formats, and sometimes, the structure isn't ideal for the analysis or reporting you need to perform. SAS's PROC TRANSPOSE
is a powerful procedure designed to reshape your data, converting rows into columns or columns into rows. This process, known as transposing, is fundamental for many statistical analyses, data visualizations, and database interactions. This article will guide you through the core functionalities of PROC TRANSPOSE
, illustrating its use with practical examples and best practices.
Understanding Wide vs. Long Data Formats
Before diving into PROC TRANSPOSE
, it's essential to understand the concepts of 'wide' and 'long' data formats. These terms describe how repeated measurements or categories are stored within a dataset.
Wide Format: In a wide format, each row represents a single observational unit, and repeated measurements or categories are stored in separate columns. For example, if you have monthly sales data for different products, a wide format might have columns like Product
, Jan_Sales
, Feb_Sales
, etc.
Long Format: In a long format, each row represents a single observation of a variable. Repeated measurements or categories are stored in a single column, with another column indicating the type of measurement or category. Using the sales example, a long format would have columns like Product
, Month
, Sales
.
flowchart LR A[Original Data (Wide)] --> B{PROC TRANSPOSE} B --> C[Transposed Data (Long)] C --> D{PROC TRANSPOSE (Reverse)} D --> E[Original Data (Wide)] subgraph Wide Format W1["ID | Var1 | Var2 | Var3"] W2["1 | 10 | 20 | 30"] W3["2 | 15 | 25 | 35"] end subgraph Long Format L1["ID | _NAME_ | _LABEL_ | COL1"] L2["1 | Var1 | Var1 | 10"] L3["1 | Var2 | Var2 | 20"] L4["1 | Var3 | Var3 | 30"] L5["2 | Var1 | Var1 | 15"] L6["2 | Var2 | Var2 | 25"] L7["2 | Var3 | Var3 | 35"] end W1 -- Example --> W2 W2 -- Example --> W3 L1 -- Example --> L2 L2 -- Example --> L3 L3 -- Example --> L4 L4 -- Example --> L5 L5 -- Example --> L6 L6 -- Example --> L7
Conceptual flow of data transposition between wide and long formats.
Basic Transposition: Columns to Rows
The most common use case for PROC TRANSPOSE
is converting data from a wide format to a long format. This is often necessary when your analysis requires all measurements of a certain type to be in a single variable, or when you need to group by those measurements.
The basic syntax involves specifying the input dataset, the output dataset, and the variables you want to transpose using the VAR
statement. If you have an identifier variable that should remain constant across the transposed rows, use the ID
statement. If you want to create a new variable that indicates the original column name, use the OUT
statement with the NAME=
option.
/* Sample Wide Data */
DATA WideSales;
INPUT Product $ Jan_Sales Feb_Sales Mar_Sales;
DATALINES;
A 100 120 110
B 200 210 230
C 50 60 55
;
RUN;
/* Transpose Wide to Long */
PROC TRANSPOSE DATA=WideSales OUT=LongSales (RENAME=(COL1=Sales));
BY Product; /* Keep 'Product' as an identifier */
VAR Jan_Sales Feb_Sales Mar_Sales; /* Variables to transpose */
RENAME _NAME_=Month; /* Rename the default _NAME_ variable to Month */
RUN;
PROC PRINT DATA=LongSales;
RUN;
Transposing a wide sales dataset into a long format.
BY
statement is crucial when you have identifier variables that define groups for which the transposition should occur independently. Without BY
, PROC TRANSPOSE
treats the entire dataset as one group.Advanced Transposition: Rows to Columns
While less common, PROC TRANSPOSE
can also convert data from a long format back to a wide format. This is useful when you need to create new variables based on distinct values of an existing variable, or when preparing data for specific reporting layouts.
To achieve this, you typically use the ID
statement to specify the variable whose values will become the new column names. The VAR
statement specifies the variable whose values will populate these new columns. The BY
statement is used to group observations that should form a single row in the output dataset.
/* Sample Long Data (from previous example) */
DATA LongSales;
INPUT Product $ Month $ Sales;
DATALINES;
A Jan 100
A Feb 120
A Mar 110
B Jan 200
B Feb 210
B Mar 230
C Jan 50
C Feb 60
C Mar 55
;
RUN;
/* Transpose Long to Wide */
PROC TRANSPOSE DATA=LongSales OUT=WideSales_Reversed;
BY Product; /* Group by Product to create one row per product */
ID Month; /* Values of 'Month' become new column names */
VAR Sales; /* Values of 'Sales' populate the new columns */
RUN;
PROC PRINT DATA=WideSales_Reversed;
RUN;
Transposing a long sales dataset back into a wide format.
BY
group, the combination of BY
and ID
variables uniquely identifies a single value in the VAR
variable. If not, PROC TRANSPOSE
will use the last encountered value, potentially leading to data loss or incorrect results.