Writing to multiple output datasets in a single DATA step
Why write to multiple datasets in one pass
The previous DATA step lesson covered explicit OUTPUT statements, which control when an observation is written to the output dataset
SAS DATA steps can name more than one output dataset in the DATA statement and use conditional OUTPUT statements to route each observation to the appropriate destination
This allows a single pass through the input data to produce several separate output datasets based on conditions
Writing to multiple outputs in one step is more efficient than reading the same data multiple times with separate subsetting steps
It also keeps classification consistent, because every observation is evaluated once against the same set of conditions
Create sample data
We create a dataset of clinical adverse events with varying severity grades
The goal is to route each event to a severity-specific dataset in one DATA step
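The original data listing is not reproduced in this text, so the following is only an illustrative sketch. The variable names `subject_id` and `ae_term`, and all of the values, are assumptions chosen to match the counts described in this lesson (10 rows; 4 Mild, 3 Moderate, 3 Severe; severe events only for subjects 002, 003, and 005).

```sas
/* Illustrative sample data: names and values are assumed, not from the original lesson */
data adverse_events;
    length subject_id $3 ae_term $12 severity $8;
    input subject_id $ ae_term $ severity $;
    datalines;
001 Headache Mild
001 Nausea Moderate
002 Fatigue Mild
002 Rash Severe
003 Dizziness Moderate
003 Vomiting Severe
004 Headache Mild
004 Cough Mild
005 Nausea Moderate
005 Dyspnea Severe
;
run;
```

Declaring the lengths before the INPUT statement keeps `severity` wide enough to hold "Moderate" without truncation.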
SAS Log
Confirm that `adverse_events` has 10 rows with three severity levels: Mild (4), Moderate (3), and Severe (3)
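One way to make this check, assuming the severity level is stored in a character variable named `severity`, is a one-way frequency table:

```sas
/* Count the observations at each severity level */
proc freq data=adverse_events;
    tables severity / nocum;
run;
```

The frequencies in the table should add up to the total row count.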
Naming multiple output datasets in the DATA statement
List all output dataset names on the DATA statement
To write to multiple datasets, list them all on the DATA statement separated by spaces
When multiple datasets are listed and the step contains no OUTPUT statement, the implicit output at the end of each iteration writes the observation to every named dataset
To route observations selectively, you must use explicit OUTPUT statements that name the specific target dataset
Once any explicit OUTPUT statement appears in a DATA step, the implicit output is suppressed for the whole step, so you must explicitly OUTPUT every observation you want written
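A minimal sketch of this routing follows. It assumes `adverse_events` has a character variable `severity` holding exactly the values 'Mild', 'Moderate', and 'Severe'; the comparison is case-sensitive.

```sas
/* Name all three targets on the DATA statement, then route with explicit OUTPUT */
data ae_mild ae_moderate ae_severe;
    set adverse_events;
    if severity = 'Mild' then output ae_mild;
    else if severity = 'Moderate' then output ae_moderate;
    else if severity = 'Severe' then output ae_severe;
    /* A row matching none of the branches is written to no dataset */
run;
```

An OUTPUT statement with no dataset name would instead write the observation to all three datasets.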
SAS Log
Three output datasets are created in one pass of `adverse_events`
Each observation goes to exactly one output dataset based on its severity value
Verify: `ae_mild` should have 4 rows, `ae_moderate` 3 rows, `ae_severe` 3 rows
Because explicit OUTPUT statements are used, no observation is written implicitly, so a row whose severity matches none of the IF-THEN branches is written to no dataset
Route observations to more than one output dataset simultaneously
An observation can be written to more than one dataset by including multiple OUTPUT statements for it
In this example, we create a dataset of all events, plus a separate dataset containing only the events from subjects who had at least one severe event
An observation for a subject with a severe event appears in both `ae_all` and `ae_severe_subjects`
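The original code for this example is not shown in the text. One possible way to get this result in a single DATA step is a double DO UNTIL loop (often called a DoW loop) over BY groups. The sketch below assumes a subject identifier named `subject_id`, a character variable `severity`, and a hypothetical helper variable `has_severe`.

```sas
/* BY-group processing requires the data to be sorted by subject */
proc sort data=adverse_events out=ae_sorted;
    by subject_id;
run;

data ae_all ae_severe_subjects;
    has_severe = 0;

    /* First loop: scan one subject's rows and note any severe event */
    do until (last.subject_id);
        set ae_sorted;
        by subject_id;
        if severity = 'Severe' then has_severe = 1;
    end;

    /* Second loop: reread the same subject's rows and route them */
    do until (last.subject_id);
        set ae_sorted;
        by subject_id;
        output ae_all;                                  /* every row */
        if has_severe then output ae_severe_subjects;   /* flagged subjects only */
    end;

    drop has_severe;
run;
```

Each SET statement keeps its own position in `ae_sorted`, so the second loop revisits exactly the rows the first loop just scanned. This means the step reads the source twice, although it is still a single DATA step.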
SAS Log
`ae_all` receives every row — all 10 adverse events
`ae_severe_subjects` receives every event from subjects 002, 003, and 005 because those subjects had at least one severe event, so their mild and moderate events are included as well
This pattern is useful when you need every record for a subject who meets a specific criterion in at least one observation
Split data into train and test sets by row number
Another practical use of multiple output datasets is splitting a dataset into two portions based on a row counter
This is a common requirement in model building, where you need a training set and a separate test set
We use the automatic `_N_` variable, which holds the current iteration number of the DATA step
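A minimal sketch of a positional split, assuming the source dataset is `adverse_events` and the row count is captured in a variable named `total`:

```sas
/* Split by position: the first half of the rows to one dataset, the rest to the other */
data ae_first_half ae_second_half;
    set adverse_events nobs=total;   /* total is set at compile time and is not written out */
    if _N_ <= total / 2 then output ae_first_half;
    else output ae_second_half;
run;
```

`_N_` matches the row number here only because the step reads one dataset with a single SET statement and every iteration reads exactly one row. If the row count is odd, the extra row goes to `ae_second_half`.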
SAS Log
The `nobs=total` option on the SET statement (a SET statement option, not a dataset option) assigns the number of observations in the input dataset to `total` at compile time, before the DATA step reads its first observation
`_N_` is 1 for the first iteration, 2 for the second, and so on
Observations 1 through 5 go to `ae_first_half` and observations 6 through 10 go to `ae_second_half`
Combine multiple output routing with additional variable creation
Multiple-output DATA steps can also create or modify variables just as any other DATA step can
In this example, we add a numeric severity score and a flag variable before routing each observation
All output datasets receive the newly created variables because they are assigned before the OUTPUT statements
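The sketch below shows one way this could look. The mapping of Mild, Moderate, and Severe to scores 1, 2, and 3 is an assumption inferred from the row counts described in this lesson, as is the character variable name `severity`.

```sas
/* Derive the new variables first, then route on the derived flag */
data ae_low ae_high;
    set adverse_events;

    select (severity);
        when ('Mild')     severity_score = 1;
        when ('Moderate') severity_score = 2;
        when ('Severe')   severity_score = 3;
        otherwise         severity_score = .;   /* unexpected value */
    end;

    /* A comparison evaluates to 1 (true) or 0 (false) */
    high_flag = (severity_score >= 2);

    if high_flag then output ae_high;
    else output ae_low;
run;
```

In this sketch a missing `severity_score` yields `high_flag = 0`, so a row with an unexpected severity value would land in `ae_low`. Route such rows elsewhere if that is not the behavior you want.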
SAS Log
Both `ae_low` and `ae_high` contain the `severity_score` and `high_flag` variables created during the step
`ae_low` has 4 rows (grade 1 events) and `ae_high` has 6 rows (grade 2 and 3 events)
Confirm that `high_flag` is 0 in `ae_low` and 1 in `ae_high`
Key points to remember
List multiple dataset names on the DATA statement to create more than one output dataset in a single step
Once you use any explicit OUTPUT statement in a DATA step, SAS suppresses the implicit output at the end of each iteration, so every observation you want written needs an explicit OUTPUT statement
An observation can be written to multiple datasets by including more than one OUTPUT statement for it
The `nobs=` option on the SET statement provides the total observation count at compile time, before the first observation is read, so it can be combined with `_N_` for row-position-based routing
Variables created or modified before the OUTPUT statements are included in all target datasets that receive that observation
One-pass multiple-output DATA steps are more efficient than reading the same source dataset multiple times in separate subsetting steps