This lesson covers the IN= dataset option which is commonly used alongside the SET statement during appending
For an introduction to appending concepts and the SET statement, refer to lesson SAS_APPENDING_L101
What is the IN= dataset option?
The IN= dataset option creates a temporary numeric variable that indicates whether the current observation originated from that particular dataset
The variable is set to 1 when the current observation comes from the associated dataset and 0 otherwise
It is specified in parentheses after the dataset name on the SET statement
The IN= variable exists only within the data step - it is not automatically written to the output dataset
It must be used within the data step to derive permanent variables or to filter observations
Syntax of IN= dataset option
SAS Log
Create some example datasets
Recreating the same example datasets used in SAS_APPENDING_L101 for consistency
SAS Log
Using IN= to identify the source dataset of each observation
A common use of IN= is to create a variable that permanently records which input dataset each observation came from
This is useful when you need to distinguish the source of observations in the combined dataset
Create a source flag when combining males and females datasets
in_males will be 1 for all observations from the males dataset and 0 for observations from females
in_females will be 1 for all observations from the females dataset and 0 for observations from males
We use these temporary variables to create a permanent SOURCE variable in the output dataset
SAS Log
Dataset View
Create a source flag when combining sedan, sports, and othercars datasets
We apply the same IN= technique to three datasets this time, using single-letter indicator variables a, b, and c for brevity
An IF / ELSE IF chain assigns the SOURCE value based on which indicator is set to 1 for the current observation
Only one of a, b, or c will equal 1 at any time since each observation comes from exactly one input dataset, so ELSE IF correctly routes each observation to a single source label
Inspect allcars in the data view and confirm that every observation has a non-missing SOURCE value and that the observation counts per source match the counts in sedan, sports, and othercars respectively
SAS Log
Dataset View
Using IN= to filter observations after appending
IN= can also be used to selectively keep or exclude observations from specific input datasets
This allows you to combine datasets and apply dataset-specific conditions in a single data step
Combine preteen and teen datasets but keep only observations from teen
Both datasets are read by the SET statement but only teen observations are written to the output
This is equivalent to simply reading teen alone but demonstrates how IN= can filter during a multi-dataset SET
SAS Log
Dataset View
Combine males and females but keep only males aged 13 or older
IN= conditions can be combined with other conditions using AND to apply dataset-specific filters
SAS Log
Dataset View
Clinical SAS application of IN= dataset option
A very common clinical SAS scenario is combining prior medications and concomitant medications into a single dataset while retaining information about the source
The IN= option makes it straightforward to flag the source of each record
Simulating a prior medications and concomitant medications combine