Restructure/transpose long data to wide data


This post is part of 'SASnR | Transpose/Restructure data' series

When working with data, we frequently want to restructure the "long" data to "wide" data.

We need programming features to handle this.

Let us assume that we have some lab data collected in the structure of "one record per subject per lab test" as shown below.


usubjid

lbtestcd

lbstresn

1001

HGB

13

1001

ALT

30

1001

AST

23

1002

HGB

12

1002

ALT

28

1002

AST

15

 

And, let us say that we want to restructure this data such that there is "one record per subject" and the the test results of each lab test are to be presented as separate columns as shown below.


usubjid

HGB

ALT

AST

1001

13

30

23

1002

12

28

15


There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

SAS code


data long;
    infile cards dlm='|' truncover;
    input usubjid$ lbtestcd$ lbstresn;
    format lbstresn best6.;
cards;
1001|HGB|13
1001|ALT|30
1001|AST|23
1002|HGB|12
1002|ALT|28
1002|AST|15
;
run;
 
proc sort data=long ;
   by usubjid;
run;
 
proc transpose data=long   out=wide  ;
   by usubjid;
   var lbstresn;
   id lbtestcd;
run;

  • Sample data is created using instream data ("infile/input/cards") approach
  • We can restructure the data using proc transpose
  • The variables whose values need to remain as rows should be specified on by statement
  • The variables whose values need to transposed should be specified on the var statement
  • The variables whose values should become the names of the newly created columns should be specified on the id statement 

R tidyverse code


 

#Load required libraries
library(tidyverse)
library(haven)
 
long<-tribble(
  ~usubjid,~lbtestcd,~lbstresn,
  "1001","HGB",13,
  "1001","ALT",30,
  "1001","AST",23,
  "1002","HGB",12,
  "1002","ALT",28,
  "1002","AST",15,
)
 
wide<-pivot_wider(long,id_cols=usubjid,values_from=lbstresn,names_from = lbtestcd)

 

  • tribble function is used to create sample input data
  • pivot_wider function can be used to restructure the long data to wide data in R tidyverse
  • The variables whose values need to remain as rows have to be specified on id_cols parameter
  • The variables whose values need to be transposed have to be specified on values_from parameter
  • The variables whose values need to become the names of the newly created columns have to be specified on the names_from parameter




Post categories
SASnR
SDTM