This lesson covers the behaviour of the SET statement when appending datasets that have different variables or different variable attributes (length and type).
For an introduction to appending datasets with identical attributes using the SET statement, refer to lesson SAS_APPENDING_L101.
What happens when appending datasets with different variables?
When the input datasets on the SET statement contain different variables, SAS includes every variable from every input dataset in the output dataset.
For observations from a dataset that does not contain a particular variable, SAS assigns a missing value to that variable.
This behaviour is governed by the Program Data Vector (PDV). SAS initialises the PDV with all variables found across all input datasets, so any variable absent from the dataset currently being read holds a missing value.
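One way to watch the PDV is a DATA _NULL_ step that writes its contents to the log with PUT _ALL_. The sketch below is an illustration and is not code from the lesson. It reads the first two observations of two different KEEP= subsets of sashelp.class.

```sas
/* Read two observations from each subset and dump the PDV to the log */
data _null_;
  set sashelp.class(keep=name age obs=2)   /* has AGE, no SEX */
      sashelp.class(keep=name sex obs=2);  /* has SEX, no AGE */
  put _all_;  /* the PDV holds NAME, AGE and SEX on every iteration */
run;
```

In the log, the first two lines show SEX as blank. The last two show AGE as a period, which is how SAS displays a missing numeric value.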
Appending datasets with different variables
Create example datasets with different variables
We create two datasets from sashelp.class, each keeping a different subset of variables.
class_age keeps NAME and AGE only, while class_sex keeps NAME and SEX only.
Both datasets share the NAME variable but differ in their second variable. This is exactly the scenario we want to demonstrate for appending datasets with different variables.
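The sketch below builds the two datasets with KEEP= data set options. The lesson's own code is not shown here, so the exact statements are an assumption.

```sas
/* class_age: NAME and AGE only */
data class_age;
  set sashelp.class(keep=name age);
run;

/* class_sex: NAME and SEX only */
data class_sex;
  set sashelp.class(keep=name sex);
run;
```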
class_age contains NAME and AGE only.
class_sex contains NAME and SEX only.
When appended, the output dataset will contain NAME, AGE and SEX.
Observations from class_age will have missing values for SEX.
Observations from class_sex will have missing values for AGE.
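The append itself is a single SET statement that lists both datasets. In this sketch the output name class_all is hypothetical. The two setup steps recreate the KEEP= subsets described above so that the code runs on its own.

```sas
/* Setup: the two input datasets */
data class_age; set sashelp.class(keep=name age); run;
data class_sex; set sashelp.class(keep=name sex); run;

/* Append: the PDV holds NAME, AGE and SEX for every observation */
data class_all;
  set class_age class_sex;
run;

/* Rows 1-19 have a blank SEX; rows 20-38 have a missing AGE */
proc print data=class_all;
run;
```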
Variable length differences when appending using SET
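When the same variable exists in multiple input datasets with different lengths, SAS uses the length defined in the PDV.
The PDV takes the length from the first dataset listed on the SET statement that defines that variable.
If a later dataset defines a longer length for the same variable, any of its values longer than the PDV length are truncated to the length established by the first dataset.
SAS writes a warning to the log when a later dataset defines a longer length than the PDV, because this can cause truncation. The warning is based on the variable lengths, so it appears even if no stored value is long enough to be cut.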
Appending when the first dataset has a shorter variable length
short_name has NAME with length 5.
long_name has NAME with the default character length of 8.
Because short_name is listed first on the SET statement, the PDV sets the length of NAME to 5.
Observations from long_name therefore have any NAME value longer than 5 characters truncated to 5 characters.
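The names short_name and long_name come from the text, but the data values below are hypothetical. short_name declares NAME as $5. long_name reads NAME with list input, which gives a character variable the default length of 8. The output name combined is also hypothetical.

```sas
/* short_name: NAME explicitly declared with length 5 */
data short_name;
  length name $5;
  input name $;
  datalines;
Alice
Carol
;
run;

/* long_name: list input gives NAME the default length of 8 */
data long_name;
  input name $;
  datalines;
Barbara
Jeffrey
;
run;

/* short_name is listed first, so NAME has length 5 in the PDV */
data combined;
  set short_name long_name;
run;
/* The log warns that multiple lengths were specified for NAME, */
/* and Barbara and Jeffrey are stored as Barba and Jeffr        */
```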
Appending when the first dataset has a longer variable length
When the first dataset has the longer length, no truncation occurs, because the PDV length is long enough to hold every value from both datasets.
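To show the effect of reversing the order on the SET statement, this sketch recreates the same hypothetical short_name and long_name data so that it runs on its own. The output name combined_long_first is hypothetical.

```sas
data short_name;
  length name $5;
  input name $;
  datalines;
Alice
Carol
;
run;

data long_name;
  input name $;
  datalines;
Barbara
Jeffrey
;
run;

/* long_name is listed first, so NAME gets length 8 in the PDV */
data combined_long_first;
  set long_name short_name;
run;
/* All values are kept whole: Barbara, Jeffrey, Alice, Carol */
```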
Controlling variable length using a LENGTH statement before SET
The recommended approach to avoid truncation is to define the variable length explicitly with a LENGTH statement placed before the SET statement.
The LENGTH statement creates the variable in the PDV with the specified length before SAS reads any input dataset.
This overrides the length that would otherwise be taken from the first dataset, so choose a length at least as long as the longest length among the inputs.
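This sketch applies the fix to the same hypothetical short_name and long_name data. The length $8 is chosen because it is the longest length among the inputs. The output name combined_fixed is hypothetical.

```sas
data short_name;
  length name $5;
  input name $;
  datalines;
Alice
Carol
;
run;

data long_name;
  input name $;
  datalines;
Barbara
Jeffrey
;
run;

/* LENGTH comes before SET, so it defines NAME in the PDV first */
data combined_fixed;
  length name $8;
  set short_name long_name;
run;
/* short_name is still listed first, but no value is truncated */
```

Placing the LENGTH statement after SET would not work. By then, NAME would already have been defined with length 5 from short_name.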
Variable type conflicts when appending using SET
When the same variable exists in multiple input datasets with different types (numeric vs character), SAS writes an error to the log and stops processing the DATA step.
Unlike length differences, type conflicts cannot be resolved with a LENGTH statement.
The variable must be renamed or converted in one of the input datasets before appending.
Resolving a type conflict using RENAME and conversion
Creating two datasets where AGE is numeric in one and character in the other
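The two datasets in this sketch have the hypothetical names class_num and class_char. class_char rebuilds AGE as a character variable using PUT. The final step is expected to fail, because AGE cannot be both numeric and character in the PDV.

```sas
/* class_num: AGE is numeric, as in sashelp.class */
data class_num;
  set sashelp.class(keep=name age);
run;

/* class_char: AGE rebuilt as a character variable */
data class_char;
  set sashelp.class(keep=name age rename=(age=age_n));
  length age $3;
  age = strip(put(age_n, 3.));
  drop age_n;
run;

/* Expected to fail: the log reports that AGE is defined */
/* as both character and numeric                         */
data class_all;
  set class_num class_char;
run;
```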
To resolve the conflict, rename the character AGE and convert it back to numeric before appending.
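The sketch below fixes the conflict in three steps, using the hypothetical class_num and class_char datasets. It rebuilds both datasets first so that it runs on its own. RENAME= moves the character AGE aside as AGE_C. INPUT with the 8. informat then converts AGE_C into a new numeric AGE, and the converted dataset appends cleanly. The names class_char_num and class_all are also hypothetical.

```sas
/* Setup: AGE numeric in class_num, character in class_char */
data class_num;
  set sashelp.class(keep=name age);
run;

data class_char;
  set sashelp.class(keep=name age rename=(age=age_n));
  length age $3;
  age = strip(put(age_n, 3.));
  drop age_n;
run;

/* Convert: rename the character AGE, then read it back as numeric */
data class_char_num;
  set class_char(rename=(age=age_c));
  age = input(age_c, 8.);
  drop age_c;
run;

/* Append: AGE is now numeric in both inputs */
data class_all;
  set class_num class_char_num;
run;
```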