Announcement Icon Online training class for Clinical SAS programming starting soon. Click here for details.

Merging datasets- select unmatched records


SAS code

data SEX;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1.;
label ;
format ;
datalines4;
Alfred|M
Henry|M
Mary|F
;;;;
run;

data AHW;
infile datalines dlm='|' dsd missover;
input Name : $8. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|14|69|112.5
Henry|14|63.5|102.5
James|12|57.3|83
;;;;
run;


proc sort data=sex;
    by name;
run;

proc sort data=ahw;
    by name;
run;

data unmatched;
    merge sex(in=a) ahw(in=b keep=name);
    by name;
    if a and not b;
run;


data unmatched;
    merge sex(in=a keep=name) ahw(in=b);
    by name;
    if b and not a;
run;

SAS code description

These SAS code snippets demonstrate the process of sorting datasets and performing merges to identify unmatched observations based on the "name" variable. By comparing the sorted datasets and using conditional statements, the code extracts observations that are present in one dataset but do not have a corresponding match in the other dataset.

These SAS code snippets involve sorting and merging datasets to identify unmatched observations based on the "name" variable.

Sorting the Data: In both snippets, the datasets "sex" and "ahw" are sorted separately using the proc sort procedure. Sorting is performed based on the "name" variable to ensure that the datasets are in the same order before merging.

Finding Unmatched Observations (sex): In the first snippet, the "unmatched" dataset is created by merging the sorted "sex" and "ahw" datasets using the merge statement and the by statement with the "name" variable. The in=a option is used with the "sex" dataset to indicate the presence of a matching observation. The if a and not b condition keeps only the observations from the "sex" dataset that do not have a match in the "ahw" dataset. The resulting "unmatched" dataset includes unmatched observations from the "sex" dataset, retaining only the "name" variable.

Finding Unmatched Observations (ahw): In the second snippet, the "unmatched" dataset is created by merging the sorted "sex" and "ahw" datasets using the merge statement and the by statement with the "name" variable. The in=b option is used with the "ahw" dataset to indicate the presence of a matching observation. The if b and not a condition keeps only the observations from the "ahw" dataset that do not have a match in the "sex" dataset. The resulting "unmatched" dataset includes unmatched observations from the "ahw" dataset, retaining only the "name" variable.

R code


library(tidyverse)

sex<-tribble(
~name,~sex,
"Alfred","M",
"Henry","M",
"Mary","F",
)

ahw<-tribble(
~name,~age,~height,~weight,
"Alfred",14,69,112.5,
"Henry",14,63.5,102.5,
"James",12,57.3,83,
)

unmatched<-anti_join(sex,ahw,by=c("name"))


unmatched<-anti_join(ahw,sex,by=c("name"))

R code description

These R Tidyverse code snippets demonstrate how to identify unmatched observations between two data frames, "sex" and "ahw," using the anti_join function from the dplyr package.

Finding Unmatched Observations (sex): The first code snippet uses the anti_join function to find unmatched observations in the "sex" data frame compared to the "ahw" data frame. The by argument specifies the variable(s) to use for matching, in this case, the "name" variable. The resulting "unmatched" data frame includes observations from the "sex" data frame that do not have a matching "name" in the "ahw" data frame.

Finding Unmatched Observations (ahw): The second code snippet reverses the order of the data frames in the anti_join function. It identifies unmatched observations in the "ahw" data frame compared to the "sex" data frame. Again, the by argument specifies the "name" variable for matching. The resulting "unmatched" data frame includes observations from the "ahw" data frame that do not have a matching "name" in the "sex" data frame.