Introduction to SAS - Part 06: Variable Labels, Dataset Labels, and Display Formats
Overview
SAS stores variable names in a compact, code-friendly form — typically short uppercase names such as SUBJ, TRTP, or AVAL.
These short names are efficient for programming but are not always easy to read in reports and output.
SAS allows you to attach a label to any variable or dataset. The label is a longer, descriptive piece of text that many procedures display in output in place of, or alongside, the raw name.
Similarly, a format tells SAS how to display the value of a variable — for example showing 1 as Male and 2 as Female, or displaying a date number as 01JAN2024.
Labels and formats are stored as metadata inside the dataset. They travel with the data and are applied automatically whenever the dataset is used in a PROC step that supports them.
This lesson covers how to assign variable labels, dataset labels, and display formats, and how to confirm they have been stored correctly.
Variable Labels
A variable label is a text string of up to 256 characters that describes what a variable contains.
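Labels are assigned with the LABEL statement, either in a DATA step or in a PROC step that supports it, such as PROC DATASETS.
A label assigned in a DATA step or with PROC DATASETS is stored permanently in the dataset descriptor. Procedures such as PROC MEANS, PROC FREQ, and PROC REPORT then display it automatically, and PROC PRINT uses it as a column heading when the LABEL option is specified. A LABEL statement inside a reporting procedure such as PROC PRINT affects only that step's output.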
The variable name itself does not change — the label is purely a display attribute.
The LABEL statement lists each variable name followed by an equals sign and the label text in quotes.
Multiple variables can be labelled in a single LABEL statement — each assignment is separated by a space or new line.
Labels are saved into the dataset as part of its descriptor. You do not need to re-assign them every time you use the dataset.
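A minimal sketch of the LABEL statement in a DATA step follows. The dataset WORK.SUBJECTS, its variables, its values, and its label texts are hypothetical examples chosen for illustration.

```sas
/* Hypothetical example data: names, values, and labels are illustrative only */
data subjects;
    input subj $ age sex $ trtp $;
    /* One LABEL statement can label several variables */
    label subj = "Subject Identifier"
          age  = "Age (years)"
          sex  = "Sex"
          trtp = "Planned Treatment";
    datalines;
001 34 m Placebo
002 41 f Active
003 29 f Active
;
run;

/* PROC PRINT uses labels as column headings only with the LABEL option */
proc print data=subjects label;
run;
```

Without the LABEL option, PROC PRINT would show SUBJ, AGE, SEX, and TRTP as the column headings instead.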
Confirming Labels with PROC CONTENTS OUT=
PROC CONTENTS describes the structure of a dataset — variable names, types, lengths, formats, and labels.
The OUT= option saves that descriptor information to a SAS dataset instead of only printing it to the output tab.
Saving to a dataset lets us inspect the label and format columns directly in a data view, which is more convenient than reading the output tab.
Key columns in the OUT= dataset: NAME (variable name), LABEL (assigned label text), FORMAT (assigned format name), FORMATL and FORMATD (format width and number of decimal places), TYPE (1=numeric, 2=character), and LENGTH.
NOPRINT tells SAS not to write to the output tab — the descriptor information goes only to the OUT= dataset.
Inspect CONTENTS_SUBJECTS in the data view and locate the LABEL column. Confirm that each variable shows the label text assigned in the DATA step.
The NAME column holds the variable name and the LABEL column holds the descriptive text. A blank LABEL means no label has been assigned for that variable.
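The PROC CONTENTS step described above might look like the sketch below, assuming a WORK.SUBJECTS dataset already exists. The PROC PRINT step is an optional extra that lists only the key descriptor columns.

```sas
/* Save the descriptor of SUBJECTS to a dataset; NOPRINT suppresses the printed report */
proc contents data=subjects out=contents_subjects noprint;
run;

/* Optional: list only the descriptor columns discussed in this lesson */
proc print data=contents_subjects noobs;
    var name type length label format;
run;
```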
Dataset Labels
As well as labelling individual variables, you can attach a label to the dataset itself.
A dataset label is assigned using the LABEL= dataset option in the DATA statement.
The dataset label is stored in the descriptor. It appears in PROC CONTENTS output and in PROC DATASETS library listings when the DETAILS option is used, making it easy to identify a file's purpose when browsing a library with many datasets.
The LABEL= option appears in parentheses immediately after the dataset name — this is a dataset option, distinct from the variable LABEL statement.
In the CONTENTS_SUBJECTS2 data view, look for the MEMLABEL column, which holds the dataset-level label text.
Variable-level labels are in the LABEL column as before; the dataset label is stored separately in MEMLABEL.
Dataset labels are especially useful in permanent libraries where many datasets are stored and readers need to understand each file's purpose at a glance.
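A sketch of assigning a dataset label, assuming WORK.SUBJECTS already exists. The dataset name SUBJECTS2 follows the lesson, but the label text itself is a hypothetical example.

```sas
/* LABEL= is a dataset option, so it sits in parentheses after the dataset name */
data subjects2(label="Subject-level demographics (example)");
    set subjects;
run;

proc contents data=subjects2 out=contents_subjects2 noprint;
run;

/* MEMLABEL holds the dataset label; LABEL holds each variable's label */
proc print data=contents_subjects2 noobs;
    var memname memlabel name label;
run;
```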
Assigning Labels with PROC DATASETS
You do not always need to re-run a DATA step to add or change labels. PROC DATASETS can modify the descriptor of an existing dataset without reading or rewriting the data.
This is much faster for large datasets because no observations are processed — only the metadata is updated.
Use the MODIFY statement followed by a LABEL statement inside PROC DATASETS to assign or change variable labels.
MODIFY subjects tells PROC DATASETS which dataset to update — no observations are read or written, only the descriptor changes.
The LABEL statement inside PROC DATASETS works exactly as in a DATA step — variable name equals label text.
Inspect CONTENTS_SUBJECTS3 and confirm that the LABEL column for AGE now reads Age at Baseline (years) rather than the original text.
This approach is the preferred way to correct a label on a large permanent dataset without the overhead of a full DATA step pass.
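A minimal PROC DATASETS sketch for relabelling AGE, assuming WORK.SUBJECTS exists and contains a variable named AGE. NOLIST only suppresses the library directory listing; RUN ends the MODIFY group and QUIT ends the procedure.

```sas
/* Update the label of AGE in WORK.SUBJECTS without reading or rewriting observations */
proc datasets library=work nolist;
    modify subjects;
        label age = "Age at Baseline (years)";
    run;
quit;

/* Capture the updated descriptor for inspection */
proc contents data=subjects out=contents_subjects3 noprint;
run;
```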
Display Formats
A format tells SAS how to display a variable's value in output. The stored value is unchanged — only the displayed representation changes.
SAS has many built-in formats. Common examples include DATE9. for displaying date values as 01JAN2024, DOLLAR12.2 for currency, and $UPCASE. for uppercase character display.
Formats are assigned using the FORMAT statement inside a DATA step, or via PROC DATASETS without rewriting data.
Once assigned, the format name is stored in the dataset descriptor. We can confirm it using PROC CONTENTS OUT= and inspecting the FORMAT column.
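To see that a format changes only the display, the sketch below attaches the three built-in formats named above and keeps an unformatted copy of the date. The dataset FORMAT_DEMO and the variables VISITDT, COST, SITE, and RAW_DT are hypothetical. PROC PRINT should show VISITDT as 01JAN2024, COST as $1,234.50, and SITE as LONDON, while RAW_DT prints the underlying day count, 23376.

```sas
/* Illustrative values: the stored values never change, only their display */
data format_demo;
    visitdt = '01JAN2024'd;   /* stored as the number of days since 01JAN1960 */
    cost    = 1234.5;
    site    = "london";
    raw_dt  = visitdt;        /* same value, deliberately left without a format */
    format visitdt date9. cost dollar12.2 site $upcase.;
run;

proc print data=format_demo;
run;
```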
The FORMAT statement assigns the DATE9. format to VISITDT. Without this format the date would display as a raw integer — the number of days since 01JAN1960.
The DATE9. informat in the INPUT statement (written :DATE9., where the colon modifier tells list input to read the value with that informat) tells SAS how to read the raw text into a date value. The FORMAT statement controls how that value is displayed.
Inspect CONTENTS_VISITS and locate the FORMAT column. Next to VISITDT it should show DATE, with the width 9 stored separately in the FORMATL column; together they confirm that DATE9. was stored in the descriptor.
In the VISITS data view, VISITDT displays as a formatted date (for example 01JAN2024) rather than a day count, because the format is attached to the variable.
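A sketch of the VISITS step described above. Only VISITDT, the DATE9. informat, and the DATE9. format come from the text; the SUBJ and VISIT variables and the three visit records are hypothetical.

```sas
/* Hypothetical visit data; :DATE9. reads text such as 15MAR2024 as a SAS date */
data visits;
    input subj $ visit $ visitdt :date9.;
    format visitdt date9.;
    datalines;
001 BASELINE 15MAR2024
001 WEEK4 12APR2024
002 BASELINE 18MAR2024
;
run;

proc contents data=visits out=contents_visits noprint;
run;

/* FORMAT holds the format name (DATE); FORMATL holds its width (9) */
proc print data=contents_visits noobs;
    var name format formatl formatd;
run;
```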
Assigning Formats with PROC DATASETS
Just like labels, formats can be added or changed on an existing dataset using PROC DATASETS without rewriting the data.
Use the FORMAT statement inside a MODIFY block to assign a format.
The FORMAT statement inside MODIFY works the same way as in a DATA step — variable name followed by the format name.
Inspect CONTENTS_SUBJECTS4 and check that the FORMAT column for SEX now shows $UPCASE. The format was attached without rewriting the data in a DATA step.
To remove a format entirely, write FORMAT sex; with no format name in the MODIFY block. The FORMAT column for SEX will then be blank in PROC CONTENTS output.
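A sketch of both operations, assuming WORK.SUBJECTS exists with a character variable named SEX. The second PROC DATASETS call illustrates removal and would be run only when the format should be cleared again.

```sas
/* Attach $UPCASE. to SEX in place; no observations are rewritten */
proc datasets library=work nolist;
    modify subjects;
        format sex $upcase.;
    run;
quit;

proc contents data=subjects out=contents_subjects4 noprint;
run;

/* Later, to clear the stored format: FORMAT with no format name */
proc datasets library=work nolist;
    modify subjects;
        format sex;
    run;
quit;
```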
The ATTRIB Statement
The ATTRIB statement is a convenient way to assign a label, format, informat, and length for a variable all in one place.
Using ATTRIB keeps variable attribute definitions together and avoids having separate LABEL, FORMAT, and LENGTH statements scattered through the DATA step.
Each variable gets its own ATTRIB clause and any attributes not specified are left unchanged or take their default values.
Each ATTRIB clause names the variable followed by one or more attribute keywords: LENGTH=, LABEL=, FORMAT=, INFORMAT=.
All attributes for a variable are grouped on one line making the data structure easy to read and maintain.
The ATTRIB statement is especially common in clinical programming where datasets have many variables each requiring a label, length, and sometimes a format.
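Inspect CONTENTS_SUMMARY: the LABEL column should show the assigned text for each variable. For AVAL, the FORMATL and FORMATD columns should show 8 and 1, which together represent the 8.1 format (the FORMAT column holds only a format name, so it does not show 8.1 itself). All other variables should have no format information, since none was specified.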
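A sketch of an ATTRIB statement, assuming a hypothetical SUMMARY dataset with SUBJ, PARAM, and AVAL; the lengths, labels, and data values are illustrative. ATTRIB is placed before INPUT so that the lengths are set before SAS first encounters the variables.

```sas
/* Hypothetical analysis data: one ATTRIB clause per variable */
data summary;
    attrib subj  length=$8  label="Subject Identifier"
           param length=$20 label="Parameter"
           aval  length=8   label="Analysis Value" format=8.1;
    input subj $ param $ aval;
    datalines;
001 WEIGHT 72.34
002 WEIGHT 80.1
;
run;

proc contents data=summary out=contents_summary noprint;
run;

/* The 8.1 format on AVAL is recorded through FORMATL and FORMATD */
proc print data=contents_summary noobs;
    var name length label format formatl formatd;
run;
```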
Key Points
Variable labels are descriptive text attached to a variable using the LABEL statement. They are stored in the dataset descriptor and displayed by output procedures; PROC PRINT uses them as column headings when the LABEL option is specified.
Dataset labels are assigned using the LABEL= dataset option and identify the dataset's purpose in library listings.
Formats control how values are displayed — the stored value is never changed, only the displayed representation.
PROC CONTENTS with OUT= saves the dataset descriptor to a dataset, allowing the NAME, LABEL, FORMAT, TYPE, and LENGTH columns to be inspected directly without relying on the output tab.
Both labels and formats can be assigned in a DATA step or updated on an existing dataset using PROC DATASETS without rewriting any observations.
The ATTRIB statement assigns label, format, informat, and length for a variable in a single clause — useful for keeping variable definitions organised in one place.