How to select/subset required rows in SAS and R tidyverse


This post is part of 'SASnR | Subset observations (rows)' series

When working with data, we frequently require only a subset of observations/rows for a specific analysis. 

So, we need programming features to select the specific subset of rows meeting a filter condition.

There are multiple ways of achieving this result in both SAS and R. Below is one basic approach in SAS and R.


Let us assume that we have the following input data with 19 observations and five variables capturing some basic information about the students of a class.

Name

Sex

Age

Height

Weight

Alfred

M

14

69

112.5

Alice

F

13

56.5

84

Barbara

F

13

65.3

98

Carol

F

14

62.8

102.5

Henry

M

14

63.5

102.5

James

M

12

57.3

83

Jane

F

12

59.8

84.5

Janet

F

15

62.5

112.5

Jeffrey

M

13

62.5

84

John

M

12

59

99.5

Joyce

F

11

51.3

50.5

Judy

F

14

64.3

90

Louise

F

12

56.3

77

Mary

F

15

66.5

112

Philip

M

16

72

150

Robert

M

12

64.8

128

Ronald

M

15

67

133

Thomas

M

11

57.5

85

William

M

15

66.5

112


Let us assume that we only data for male students for a particular analysis.

Name

Sex

Age

Height

Weight

Alfred

M

14

69

112.5

Alice

F

13

56.5

84

Barbara

F

13

65.3

98

Carol

F

14

62.8

102.5

Henry

M

14

63.5

102.5

James

M

12

57.3

83

Jane

F

12

59.8

84.5

Janet

F

15

62.5

112.5

Jeffrey

M

13

62.5

84

John

M

12

59

99.5

Joyce

F

11

51.3

50.5

Judy

F

14

64.3

90

Louise

F

12

56.3

77

Mary

F

15

66.5

112

Philip

M

16

72

150

Robert

M

12

64.8

128

Ronald

M

15

67

133

Thomas

M

11

57.5

85

William

M

15

66.5

112

We can create a subset of male students (Sex="M") using the below code.


SAS code

data class;
    set sashelp.class;
    where sex="M";
run;
  • SAS data step is used to create a new dataset
  • data statement is used to specify the name of the output dataset to be created
  • set statement is used to specify the name of the input dataset
  • where statement is used to specify the filter condition (sex="M", in this case). Only the records which meet the filter condition will be read from input dataset when creating this new dataset

R tidyverse code

library(tidyverse)
library(haven)
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")
class<-haven::read_sas("class.sas7bdat")
males<-filter(class,Sex=="M")
  • tidyverse library is loaded into the current R session using library function
  • as haven is not a core tidyverse package, it has be explicitly loaded 
  • setwd function is used to set the working directory, working directory is set to the same directory containing the SAS dataset
  • read_sas function of haven is used to read the sas dataset into R session as a tibble
  • filter verb(function) is used to subset only the required rows (male student rows) and the associated result is stored in a new dataset/tibble name males
  • the first argument of filter function is the name of the input dataset
  • the second argument is the filter condition
  • note that a double equals sign is used as part of the filter condition (Sex=="M"). In R, to check whether two objects are equal (equality) we need to use a double equals sign ==.

 

Example class dataset (sas dataset) can be downloaded from here.

 





Post categories
SASnR
SDTM