Announcement Icon Online training class for Clinical R programming batch starts on Monday, 24Mar2025 17Mar2025. Click here for details.

Merging datasets - cartesian product


SAS code

data DUMMY01;
infile datalines dlm='|' dsd missover;
input grouplabel : $100. statistic : $100. group : best32. intord : best32.;
label ;
format ;
datalines4;
Sex, n(%)|Male|2|1
Sex, n(%)|Female|2|2
Sex, n(%)|Missing|2|99
;;;;
run;

data DUMMY_TREATMENTS;
infile datalines dlm='|' dsd missover;
input treatment : best32.;
label ;
format ;
datalines4;
1
2
3
4
;;;;
run;


proc sql;
    
create table dummy02 as
        
select *
        
from dummy01,dummy_treatments;
quit;

SAS code description

This SAS code snippet demonstrates how to perform a Cartesian product using the proc sql procedure. The resulting table, "dummy02," will have a number of rows equal to the product of the number of rows in "dummy01" and "dummy_treatments" tables.

The SQL query inside the proc sql block selects all columns (indicated by *) from the "dummy01" table and the "dummy_treatments" table. The Cartesian product of these two tables is created, resulting in a new table named "dummy02" that contains all possible combinations of rows from the two tables.

The quit statement is used to end the proc sql block and execute the query.

R code

library(tidyverse)

dummy01<-tribble(
~grouplabel,~statistic,~group,~intord,
"Sex, n(%)","Male",2,1,
"Sex, n(%)","Female",2,2,
"Sex, n(%)","Missing",2,99,
)

dummy_treatments<-tribble(
~treatment,
1,
2,
3,
4,
)

dummy02<-cross_join(dummy01,dummy_treatments)

R code description

This R Tidyverse code snippet demonstrates how to create a new data frame named "dummy02" by performing a cross join between two existing data frames, "dummy01" and "dummy_treatments," using the cross_join function from the dplyr package.

The cross_join function takes two data frames as arguments and returns a new data frame that combines every row from the first data frame with every row from the second data frame. The resulting "dummy02" data frame will contain all possible combinations of rows between "dummy01" and "dummy_treatments".