Restructure/transpose wide data to long data


This post is part of 'SASnR | Transpose/Restructure data' series

When working with data, we frequently want to restructure the "wide" data to "long" data.

We need programming features to handle this.

Let us assume that we have some lab data collected in the structure of "one record per subject" and the the test results of each lab test are to be presented as separate columns as shown below. .


usubjid

HGB

ALT

AST

1001

13

30

23

1002

12

28

15

And, let us say that we want to restructure this data such that there is "one record per subject per lab test" as shown below.


usubjid

lbtestcd

lbstresn

1001

HGB

13

1001

ALT

30

1001

AST

23

1002

HGB

12

1002

ALT

28

1002

AST

15


There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

SAS code


data WIDE;
infile datalines dlm='|' dsd missover;
input usubjid : $4. HGB : BEST6. ALT : BEST6. AST : BEST6.;
format ;
datalines;
1001|13|30|23
1002|12|28|15
;
run;
 
proc sort data=wide;
    by usubjid;
run;
 
proc transpose data=wide   out=long(rename=(col1=lbstresn)) name=lbtestcd ;
   by usubjid;
   var HGB ALT AST;
run;
  • Sample data is created using instream data ("infile/input/cards") approach
  • We can restructure the data using proc transpose
  • The variables whose values need to remain as rows should be specified on by statement
  • The variables whose values need to transposed should be specified on the var statement
  • When the data is transposed, the names of the columns being transposed are stored in a separate column and the name of this column is by default _name_. A specific name of choice can be used by using the name= option on the proc transpose statement
  • The default name of the column which stores the transposed results is 'COL1'. This can be renamed to a suitable name using rename= dataset option. 

R tidyverse code


#Load required libraries
library(tidyverse)
library(haven)
  
wide<-tribble(
  ~usubjid,~_NAME_,~HGB,~ALT,~AST,
  "1001","lbstresn",13,30,23,
  "1002","lbstresn",12,28,15,
)
 
long<-pivot_longer(wide,cols=c(HGB,ALT,AST),names_to = "lbtestcd",values_to = "lbstresn")
  • tribble function is used to create sample input data
  • pivot_longer function can be used to restructure the wide data to long data in R tidyverse
  • The variables whose values need to be transposed have to be specified on cols= parameter
  • The name of the variable that stores the transposed variable names can be specified using names_to= parameter
  • The name of the variable that stores the transposed variable values can be specified using values_to= parameter




Post categories
SASnR
SDTM