Announcement Icon Online training class for Clinical SAS programming starting soon. Click here for details.

Merging datasets


SAS code

data SEX;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1.;
label ;
format ;
datalines4;
Alfred|M
Henry|M
Mary|F
;;;;
run;

data AHW;
infile datalines dlm='|' dsd missover;
input Name : $8. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|14|69|112.5
Henry|14|63.5|102.5
James|12|57.3|83
;;;;
run;

*------------------------------------------------------------------------------;
*full join;
*------------------------------------------------------------------------------;

data full;
    merge sex ahw;
    by name;
run;


*------------------------------------------------------------------------------;
*left join;
*------------------------------------------------------------------------------;

data left;
    merge sex(in=a) ahw;
    by name;
    if a ;
run;


*------------------------------------------------------------------------------;
*right join;
*------------------------------------------------------------------------------;

data right;
    merge sex ahw(in=b);
    by name;
    if b ;
run;

*------------------------------------------------------------------------------;
*inner join;
*------------------------------------------------------------------------------;

data inner;
    merge sex(in=a) ahw(in=b);
    by name;
    if a and b;
run;

SAS code description

These SAS code snippets demonstrate different types of joins performed on the "sex" and "ahw" datasets. Each join combines the datasets based on a common variable, "name," and creates a new dataset with the merged data.

Full Join: The "full" dataset is created by merging the "sex" and "ahw" datasets using the merge statement and the by statement with the "name" variable. This results in a full join, where all observations from both datasets are included in the merged dataset.

Left Join: The "left" dataset is created by merging the "sex" dataset (specified as "sex(in=a)") with the "ahw" dataset using the merge statement and the by statement with the "name" variable. The if a statement filters the resulting merged dataset to include only the observations from the left dataset, ensuring a left join.

Right Join: The "right" dataset is created by merging the "sex" dataset with the "ahw" dataset (specified as "ahw(in=b)") using the merge statement and the by statement with the "name" variable. The if b statement filters the resulting merged dataset to include only the observations from the right dataset, ensuring a right join.

Inner Join: The "inner" dataset is created by merging the "sex" dataset (specified as "sex(in=a)") with the "ahw" dataset (specified as "ahw(in=b)") using the merge statement and the by statement with the "name" variable. The if a and b statement filters the resulting merged dataset to include only the observations present in both datasets, resulting in an inner join.

R code


library(tidyverse)

sex<-tribble(
~Name,~Sex,
"Alfred","M",
"Henry","M",
"Mary","F",
)

ahw<-tribble(
~Name,~Age,~Height,~Weight,
"Alfred",14,69,112.5,
"Henry",14,63.5,102.5,
"James",12,57.3,83,
)

full<-full_join(sex,ahw,by="Name")

left<-left_join(sex,ahw,by="Name")

right<-right_join(sex,ahw,by="Name")

inner<-inner_join(sex,ahw,by="Name")

R code description

These R Tidyverse code snippets demonstrate different types of joins performed on the "sex" and "ahw" data frames using the dplyr package functions. Each join combines the data frames based on a common variable, "Name," and creates a new data frame with the merged data.

Full Join: The full_join function is used to perform a full join on the "sex" and "ahw" data frames based on the "Name" variable. The resulting data frame, "full," contains all observations from both data frames, matching observations where possible and filling in missing values with NA.

Left Join: The left_join function is used to perform a left join on the "sex" and "ahw" data frames based on the "Name" variable. The resulting data frame, "left," includes all observations from the "sex" data frame and matching observations from the "ahw" data frame. Observations that do not have a match in the "ahw" data frame will have missing values for the "ahw" variables.

Right Join: The right_join function is used to perform a right join on the "sex" and "ahw" data frames based on the "Name" variable. The resulting data frame, "right," includes all observations from the "ahw" data frame and matching observations from the "sex" data frame. Observations that do not have a match in the "sex" data frame will have missing values for the "sex" variables.

Inner Join: The inner_join function is used to perform an inner join on the "sex" and "ahw" data frames based on the "Name" variable. The resulting data frame, "inner," includes only the observations that have a matching "Name" in both data frames.