When working with data, we frequently encounter data such that different pieces of information of same entities are collected in separate datasets.
We need programming features to merge or join data horizontally from two or more datasets.
Let us assume that we have height and weight of a few students collected in two separate datasets as shown below.
Name |
Height |
Janet |
62.5 |
Mary |
66.5 |
Ronald |
67 |
William |
66.5 |
Name |
Weight |
Janet |
112.5 |
Mary |
112 |
Ronald |
133 |
William |
112 |
Let us assume that we want to merge or join the above two pieces of data such that the height and weight are present in a single dataset side by side as shown below.
Name |
Height |
Weight |
Janet |
62.5 |
112.5 |
Mary |
66.5 |
112 |
Ronald |
67 |
133 |
William |
66.5 |
112 |
There are multiple ways of achieving this result in both SAS and R. Below is one basic approach in SAS and R.
data HEIGHT;
infile datalines dlm='|' dsd missover;
input Name : $8. Height : best32.;
label ;
format ;
datalines;
Janet|62.5
Mary|66.5
Ronald|67
William|66.5
;
run;
data WEIGHT;
infile datalines dlm='|' dsd missover;
input Name : $8. Weight : best32.;
label ;
format ;
datalines;
Janet|112.5
Mary|112
Ronald|133
William|112
;
run;
proc sort data=height ;
by name;
run;
proc sort data=weight ;
by name;
run;
data both;
merge height weight;
by name;
run;
library(tidyverse)
library(dplyr)
height<-tribble(
~Name,~Height,
"Janet",62.5,
"Mary",66.5,
"Ronald",67,
"William",66.5,
)
weight<-tribble(
~Name,~Weight,
"Janet",112.5,
"Mary",112,
"Ronald",133,
"William",112,
)
both<-full_join(height,weight,by="Name")