Merge/full join two datasets in SAS and R tidyverse


This post is part of the 'SASnR | Merging/joining data' series.

When working with data, we frequently find that different pieces of information about the same entities are collected in separate datasets.

To bring them together, we need a way to merge or join data horizontally from two or more datasets.


Let us assume that we have height and weight of a few students collected in two separate datasets as shown below.

Name      Height
Janet     62.5
Mary      66.5
Ronald    67
William   66.5

Name      Weight
Janet     112.5
Mary      112
Ronald    133
William   112


Let us assume that we want to merge or join the above two pieces of data such that the height and weight are present in a single dataset side by side as shown below.

Name      Height   Weight
Janet     62.5     112.5
Mary      66.5     112
Ronald    67       133
William   66.5     112

There are multiple ways of achieving this result in both SAS and R. Below is one basic approach in SAS and R.


SAS code

/* Create the HEIGHT dataset from in-stream, pipe-delimited data */
data HEIGHT;
   infile datalines dlm='|' dsd missover;
   input Name : $8. Height : best32.;
   datalines;
Janet|62.5
Mary|66.5
Ronald|67
William|66.5
;
run;

/* Create the WEIGHT dataset the same way */
data WEIGHT;
   infile datalines dlm='|' dsd missover;
   input Name : $8. Weight : best32.;
   datalines;
Janet|112.5
Mary|112
Ronald|133
William|112
;
run;

/* Sort both datasets by the linking variable before merging */
proc sort data=height;
   by name;
run;

proc sort data=weight;
   by name;
run;

/* Merge the two datasets by name */
data both;
   merge height weight;
   by name;
run;

 
  • Sample data for this task is created from in-stream data using the infile, input and datalines statements
  • The height dataset contains the name and height of the students
  • The weight dataset contains the name and weight of the students
  • The height and weight of a student can be identified by the value in the name variable, so name is the linking variable between these two datasets
  • Both datasets are sorted by name using proc sort, because merging with a by statement requires the input datasets to be sorted (or indexed) by the linking variables
  • To merge/join data horizontally from two or more datasets, we specify the names of the datasets on the merge statement, separated by spaces
  • We specify the names of the linking variables on the by statement

R tidyverse code

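library(tidyverse)  # loads the core tidyverse packages, including dplyr and tibble

# Create the height dataset row by row
height <- tribble(
  ~Name,     ~Height,
  "Janet",   62.5,
  "Mary",    66.5,
  "Ronald",  67,
  "William", 66.5
)

# Create the weight dataset row by row
weight <- tribble(
  ~Name,     ~Weight,
  "Janet",   112.5,
  "Mary",    112,
  "Ronald",  133,
  "William", 112
)

# Full join on the linking variable Name
both <- full_join(height, weight, by = "Name")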
  • Sample data for this task is created using the tribble function from the tibble package of tidyverse
  • The height and weight datasets are joined using the full_join function from the dplyr package of tidyverse
  • The names of the input datasets are specified as the first two arguments
  • The linking variables are specified on the by= parameter
  • Notice that the name of the linking variable is specified within quotes. R is case-sensitive, so it must match the column name exactly ("Name", not "name")
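
In the example above, every student appears in both datasets, so the join type does not change the result. As a minimal sketch of what makes a full join "full", the code below uses made-up data in which Ronald has no weight record and Alice (a hypothetical student with a hypothetical weight, not part of the original data) has no height record. It assumes the dplyr and tibble packages are installed.

```r
library(dplyr)   # provides full_join()
library(tibble)  # provides tribble()

# Hypothetical data: Ronald is missing from weight, Alice is missing from height
height <- tribble(
  ~Name,    ~Height,
  "Janet",  62.5,
  "Mary",   66.5,
  "Ronald", 67
)

weight <- tribble(
  ~Name,   ~Weight,
  "Janet", 112.5,
  "Mary",  112,
  "Alice", 84
)

# A full join keeps every Name found in either dataset
both <- full_join(height, weight, by = "Name")
print(both)
```

The result has four rows. Ronald's Weight and Alice's Height are NA because the other dataset has no matching row for them. The SAS data step merge shown earlier behaves in a comparable way: without further subsetting, it keeps observations from both datasets and leaves the unmatched variables missing.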

 




