PROC TRANSPOSE - Part 02: PREFIX=, Character Variables, and Reversing a Transpose
Overview
Part 01 covered the core mechanics of PROC TRANSPOSE: reshaping data between wide and long formats using BY, ID, VAR, and NAME= options.
This lesson covers three additional practical topics: controlling the names of transposed output columns using PREFIX=, transposing character (text) variables, and reversing a transpose to recover the original structure.
Controlling Output Column Names with PREFIX=
When you use an ID statement, the values of the ID variable become the names of the output columns. If those values are numeric or begin with a digit, SAS cannot use them directly as variable names.
The PREFIX= option adds a text prefix to each column name generated by the ID statement, ensuring valid SAS variable names even when ID values are numeric.
PREFIX= is also useful when you want output columns named consistently — for example, visit1, visit2, visit3 instead of 1, 2, 3.
SAS Log
Without PREFIX=, the ID values 1, 2, 3 would generate column names col1, col2, col3 (SAS applies a default prefix when ID values are not valid names on their own).
With PREFIX=visit, the output columns are named visit1, visit2, visit3 — clear, self-documenting, and immediately recognisable.
The DROP=_name_ removes the automatic _NAME_ column that PROC TRANSPOSE adds (it records which source variable was transposed — not needed here since we have only one VAR variable).
Check WIDE_PREFIXED — it should have columns subject, visit1, visit2, visit3 with one row per subject.
Dataset View
Transposing Character Variables
PROC TRANSPOSE works with character variables as well as numeric ones.
When the VAR variable is character, the output columns are also character. SAS sets the length of the output character columns to the longest value found across all observations being transposed.
The _NAME_ variable in the output records the name of the source variable that was transposed — when transposing a single character VAR, this is the same for every row, so it is commonly dropped.
Transposing character variables is useful for reshaping coded text values, categories, or labels from long to wide format.
SAS Log
Each AE term (character) becomes a separate column — ae1, ae2, ae3 — with one row per subject.
Subject 002 only has two AEs, so ae3 will be missing (.) for that row — PROC TRANSPOSE fills with missing when the ID value does not exist for a given BY group.
Check AE_WIDE to confirm the structure and note the missing value for subject 002's ae3 column.
This pattern is commonly used to display one row per subject with all events spread across columns — typical in data listings and summary tables.
Dataset View
Reversing a Transpose - Wide Back to Long
A reverse transpose takes wide-format data (many columns per row) and converts it back to long format (one row per column value).
This is the exact inverse of the wide-from-long operation shown above.
You use PROC TRANSPOSE without an ID statement (since there are no ID values to assign to the long rows — PROC TRANSPOSE generates its own _NAME_ variable instead), and you list the columns to collapse in the VAR statement.
After transposing, _NAME_ contains the original column name (e.g., visit1, visit2), and COL1 (or the name you assign via a rename) contains the value.
SAS Log
NAME=visit_label renames the automatic _NAME_ column to something meaningful — it will contain the strings visit1, visit2, visit3.
The output column for the values is named COL1 by default. The DATA step renames it to result for clarity.
SUBSTR(visit_label, 6) extracts the numeric portion from the column name (everything after the "visit" prefix), and INPUT(..., best.) converts it to a numeric visit number.
Check LONG_CLEAN — it should be identical in structure to LONG_DATA: one row per subject per visit with subject, visit, and result.
Compare observation counts: LONG_DATA has 6 rows; LONG_CLEAN should also have 6 rows. If WIDE_PREFIXED had missing visit values, the reverse transpose will include those missing-value rows — filter them out with WHERE result is not missing if needed.
Dataset View
Using LABEL= to Annotate the _NAME_ Column
When the output of a reverse transpose is used in a report, the _NAME_ column (or its renamed equivalent) often needs a meaningful label for display purposes.
PROC TRANSPOSE supports a LABEL= option that specifies the label to attach to the _NAME_ output variable, making printed output easier to read.
SAS Log
LABEL=visit_label_text assigns that string as the label of the visit_col variable in the output dataset.
This label is displayed in PROC PRINT output when you use the LABEL option, and in PROC REPORT column headings.
For most data manipulation purposes, LABEL= is cosmetic, but it is good practice to label variables for downstream reporting.
Key Points
PREFIX= prepends a text string to all column names generated by the ID statement — essential when ID values are numeric and therefore not valid SAS variable names on their own.
Character variables transpose the same way as numeric ones — output columns inherit a character type and a length sufficient for the longest value.
A reverse transpose (wide to long) uses VAR to list the columns to collapse and NAME= to capture the column names; the values land in COL1 by default and should be renamed for clarity.
Missing values in wide columns produce missing rows in the long output — filter with WHERE if you want to exclude them.
LABEL= on the PROC TRANSPOSE statement labels the _NAME_ output column, useful for polished reporting output.