How to drop unwanted variables/columns in SAS and R tidyverse?


This post is part of the 'SASnR | Subset variables (columns)' series.

When working with data, we frequently need to work with only a selected set of variables. For this, we need programming features to drop variables or columns which are not needed for analysis.

There are multiple ways of dropping the variables/columns which are not required in both SAS and R tidyverse. Below is one basic approach in both SAS and R.

Let us assume that we have a dataset named "class" with five variables: Name, Sex, Age, Height, and Weight.

Name     Sex  Age  Height  Weight
Alfred   M    14   69      112.5
Alice    F    13   56.5    84
Barbara  F    13   65.3    98
Carol    F    14   62.8    102.5
Henry    M    14   63.5    102.5
James    M    12   57.3    83

 

Let us assume that we do not need Height and Weight for analysis. 

 


 

We can create a new dataset (a tibble/data frame in R) without these variables using the code below.

SAS code


data class;
    set sashelp.class;
    drop height weight;
run;

Notes:

  • The data statement specifies the name of the newly created dataset
  • The set statement specifies the name of the input dataset
  • The drop statement specifies the names of the variables to be dropped from the output dataset
  • Note that SAS is not case sensitive for variable/column names

R tidyverse code

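library(tidyverse)  # loads dplyr, which provides select()
library(haven)      # provides read_sas()
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")
# Read the SAS dataset into a tibble
class <- haven::read_sas("class.sas7bdat")
# Drop Height and Weight; all other columns are kept
class_selvars <- select(class, -Height, -Weight)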

Notes:

  • The tidyverse is loaded into the R session using the library function
  • As haven is not a core tidyverse package, it has to be loaded explicitly
  • The setwd function sets the working directory; here it is set to the directory containing the SAS dataset
  • The read_sas function of haven reads the SAS dataset into the R session as a tibble
  • The select verb of dplyr (part of the tidyverse) is used to drop the variables which are not required for analysis
  • Notice that each variable/column to be dropped is prefixed with a minus sign
  • The first argument of the select verb (function) is the input tibble, followed by the variables to be dropped, each prefixed with a minus sign and separated by commas
  • Note that R is case sensitive for variable/column names, so we need to use the same text case as in the dataset
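
To try the select step without the SAS file, the sketch below builds a small stand-in tibble from the six rows shown earlier. The inline data and the name class_selvars2 are assumptions for illustration and are not part of the original example. It also shows one alternative spelling, negating a vector of column names, which should give the same result as listing each name with its own minus sign.

```r
library(dplyr)
library(tibble)

# Stand-in for the class dataset (assumption: built inline instead of read_sas)
class <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83
)

# Drop Height and Weight by prefixing each name with a minus sign
class_selvars <- select(class, -Height, -Weight)

# Alternative: negate a vector of the column names to drop
class_selvars2 <- select(class, -c(Height, Weight))

# List the columns that remain
names(class_selvars)
```

Both calls leave the input tibble untouched and return a new tibble, so the original class object still has all five columns afterwards.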

 

The example class dataset (SAS dataset) can be downloaded from here.




