How to select/subset required rows in SAS and R tidyverse


This post is part of the 'SASnR | Subset observations (rows)' series.

When working with data, we frequently need only a subset of observations (rows) for a specific analysis.

So, we need a way to select only the rows that meet a filter condition.

There are multiple ways of achieving this in both SAS and R. Below is one basic approach in each.


Let us assume that we have the following input data with 19 observations and five variables capturing some basic information about the students of a class.

Name      Sex   Age   Height   Weight
Alfred    M     14    69       112.5
Alice     F     13    56.5     84
Barbara   F     13    65.3     98
Carol     F     14    62.8     102.5
Henry     M     14    63.5     102.5
James     M     12    57.3     83
Jane      F     12    59.8     84.5
Janet     F     15    62.5     112.5
Jeffrey   M     13    62.5     84
John      M     12    59       99.5
Joyce     F     11    51.3     50.5
Judy      F     14    64.3     90
Louise    F     12    56.3     77
Mary      F     15    66.5     112
Philip    M     16    72       150
Robert    M     12    64.8     128
Ronald    M     15    67       133
Thomas    M     11    57.5     85
William   M     15    66.5     112


Let us assume that we need data only for male students for a particular analysis.

Name      Sex   Age   Height   Weight
Alfred    M     14    69       112.5
Henry     M     14    63.5     102.5
James     M     12    57.3     83
Jeffrey   M     13    62.5     84
John      M     12    59       99.5
Philip    M     16    72       150
Robert    M     12    64.8     128
Ronald    M     15    67       133
Thomas    M     11    57.5     85
William   M     15    66.5     112

We can create a subset of male students (Sex="M") using the code below.


SAS code

data class;
    set sashelp.class;
    where sex="M";
run;
  • The SAS data step is used to create a new dataset
  • The data statement specifies the name of the output dataset to be created (class, in this case)
  • The set statement specifies the name of the input dataset (sashelp.class, in this case)
  • The where statement specifies the filter condition (sex="M", in this case). Only the records that meet the filter condition are read from the input dataset when creating the new dataset

R tidyverse code

library(tidyverse)
library(haven)
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")
class <- haven::read_sas("class.sas7bdat")
males <- filter(class, Sex == "M")
  • The tidyverse library is loaded into the current R session using the library function
  • As haven is not a core tidyverse package, it has to be loaded explicitly
  • The setwd function is used to set the working directory; here it is set to the directory containing the SAS dataset
  • The read_sas function of haven is used to read the SAS dataset into the R session as a tibble
  • The filter verb (function) is used to keep only the required rows (male student rows), and the result is stored in a new dataset/tibble named males
  • The first argument of the filter function is the name of the input dataset
  • The second argument is the filter condition
  • Note that a double equals sign is used in the filter condition (Sex=="M"). In R, equality is tested with a double equals sign ==; a single equals sign = is used for assignment and for naming function arguments
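
For readers who do not have the SAS dataset file at hand, the same filter can be tried on an inline copy of the data. The sketch below is only an illustration under a few assumptions: the tibble is typed in by hand with tribble instead of being read with read_sas, and the names class_df and males_piped are made up for this example (they do not appear in the code above). It also shows the pipe form of the same filter, which gives the same result.

```r
library(dplyr)   # filter() and the %>% pipe
library(tibble)  # tribble() builds a tibble row by row

# Hand-typed copy of the 19-row class data (stand-in for class.sas7bdat).
# class_df is an illustrative name, not one used in the original post.
class_df <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83,
  "Jane",    "F",  12,   59.8,    84.5,
  "Janet",   "F",  15,   62.5,    112.5,
  "Jeffrey", "M",  13,   62.5,    84,
  "John",    "M",  12,   59,      99.5,
  "Joyce",   "F",  11,   51.3,    50.5,
  "Judy",    "F",  14,   64.3,    90,
  "Louise",  "F",  12,   56.3,    77,
  "Mary",    "F",  15,   66.5,    112,
  "Philip",  "M",  16,   72,      150,
  "Robert",  "M",  12,   64.8,    128,
  "Ronald",  "M",  15,   67,      133,
  "Thomas",  "M",  11,   57.5,    85,
  "William", "M",  15,   66.5,    112
)

# Keep only the rows where Sex is equal to "M" (note the double equals sign)
males <- filter(class_df, Sex == "M")

# The same filter written with the pipe: class_df is passed as the first argument
males_piped <- class_df %>% filter(Sex == "M")

nrow(males)  # 10 male students out of 19
```

The character comparison is case-sensitive, so Sex == "m" would return zero rows for this data.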

 

Example class dataset (sas dataset) can be downloaded from here.

 




