Common scenarios where merging (joining rows into a single observation) is required
Presenting the height and weight of students in a single observation when height and weight are stored in two different datasets
Presenting the height, weight and sex of students in a single observation when height, weight and sex are stored in three different datasets
Presenting a subject's treatment start date alongside the adverse event start date to check whether the event started after the treatment start date
What does the 'MERGE statement' of the SAS data step do?
By definition, the merge statement joins observations (rows) from two or more datasets into a single observation
What is 'match merging' in SAS?
Combining rows from two or more datasets into a single observation based on the values of one or more common variables is called match merging
When performing a match merge, we need to specify, on the by statement, the common variables on which the observations are to be matched
Creating some sample datasets
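A minimal sketch of what such sample datasets might look like. The student names and values below are made up for illustration; only the dataset names (height, weight, sex, age) and the name variable come from the examples in these notes.

```sas
/* Hypothetical sample data: all names and values are illustrative only */
data height;
    input name $ height;
    datalines;
Alice 150
Bob 160
Carol 155
;
run;

data weight;
    input name $ weight;
    datalines;
Carol 48
Alice 45
Bob 55
;
run;

data sex;
    input name $ sex $;
    datalines;
Bob M
Carol F
Alice F
;
run;

data age;
    input name $ age;
    datalines;
Alice 12
Carol 13
Bob 12
;
run;
```

The rows are deliberately entered in a different order in each dataset, so that the need to sort by name before merging is visible.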
One to One Match Merging
Create a dataset named heightweight by merging the height and weight datasets based on the values of the name variable
When performing a match merge, the input datasets must first be sorted (or indexed) by the common variables
We need to specify the name of the output dataset on the data statement
We need to specify the names of the input datasets, separated by a space, on the merge statement
As we are performing a match merge, we need to specify, on the by statement, the common variables whose values determine which observations are joined
In this example, as we need to join the height and weight information of each student,
we specify the variable holding the names of the students, name, as the by variable
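The steps above can be sketched as follows. This is an illustrative example rather than the original program: the two small input datasets are re-created here with made-up values so that the block runs on its own.

```sas
/* Hypothetical input data: values are illustrative only */
data height;
    input name $ height;
    datalines;
Alice 150
Bob 160
Carol 155
;
run;

data weight;
    input name $ weight;
    datalines;
Carol 48
Alice 45
Bob 55
;
run;

/* Sort both inputs by the common variable before the match merge */
proc sort data=height; by name; run;
proc sort data=weight; by name; run;

/* Output dataset on the data statement, inputs on the merge statement,
   common variable on the by statement */
data heightweight;
    merge height weight;
    by name;
run;

proc print data=heightweight; run;
```

With this data, each student appears once in each input, so heightweight should hold one observation per student carrying both height and weight.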
Create a dataset named hws by merging the height, weight and sex datasets based on the values of the name variable
Presort the input datasets based on the common variables using proc sort
Perform the merge of the 3 input datasets based on the common variable, name
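A sketch of the three-dataset case, under the same assumptions as before: the input values are made up, and only the dataset names and the name variable are taken from the notes.

```sas
/* Hypothetical input data: values are illustrative only */
data height;
    input name $ height;
    datalines;
Alice 150
Bob 160
;
run;

data weight;
    input name $ weight;
    datalines;
Bob 55
Alice 45
;
run;

data sex;
    input name $ sex $;
    datalines;
Bob M
Alice F
;
run;

/* Presort every input dataset by the common variable */
proc sort data=height; by name; run;
proc sort data=weight; by name; run;
proc sort data=sex;    by name; run;

/* List all three inputs, separated by spaces, on the merge statement */
data hws;
    merge height weight sex;
    by name;
run;

proc print data=hws; run;
```

The only change from the two-dataset merge is the extra dataset name on the merge statement and the extra proc sort step.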
Create a dataset named ahws by merging the age, height, weight and sex datasets based on the values of the name variable
Presort the input datasets based on the common variables using proc sort
Perform the merge of the 4 input datasets based on the common variable, name
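The four-dataset merge follows the same pattern. The sketch below again uses made-up values so that it runs on its own; only the dataset names and the name variable come from the notes.

```sas
/* Hypothetical input data: values are illustrative only */
data age;
    input name $ age;
    datalines;
Bob 12
Alice 12
;
run;

data height;
    input name $ height;
    datalines;
Alice 150
Bob 160
;
run;

data weight;
    input name $ weight;
    datalines;
Bob 55
Alice 45
;
run;

data sex;
    input name $ sex $;
    datalines;
Alice F
Bob M
;
run;

/* Presort every input dataset by the common variable */
proc sort data=age;    by name; run;
proc sort data=height; by name; run;
proc sort data=weight; by name; run;
proc sort data=sex;    by name; run;

/* Variables enter the output in the order the datasets are listed,
   so age is listed first here to match the name ahws */
data ahws;
    merge age height weight sex;
    by name;
run;

proc print data=ahws; run;
```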