Obtain frequencies/counts based on two variables - two-way frequencies in SAS and R tidyverse


This post is part of the 'SASnR | Obtain frequencies' series.

When working with data, we frequently need to count the number of instances of something within a combination of values stored in two different variables.

For example:

  • Number of male and female students within each age group
  • Number of male and female patients within each treatment group

So, we need programming features that can give us the "counts"/"frequencies". When we are interested in the "number of" something based on the combinations of values present in two variables, we call the result two-way frequencies.

Let us assume that we have the data for 19 students of a class.


Name      Sex   Age   Height   Weight
Alfred    M     14    69       112.5
Alice     F     13    56.5     84
Barbara   F     13    65.3     98
Carol     F     14    62.8     102.5
Henry     M     14    63.5     102.5
James     M     12    57.3     83
Jane      F     12    59.8     84.5
Janet     F     15    62.5     112.5
Jeffrey   M     13    62.5     84
John      M     12    59       99.5
Joyce     F     11    51.3     50.5
Judy      F     14    64.3     90
Louise    F     12    56.3     77
Mary      F     15    66.5     112
Philip    M     16    72       150
Robert    M     12    64.8     128
Ronald    M     15    67       133
Thomas    M     11    57.5     85
William   M     15    66.5     112
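For readers who want to follow along in R without access to the SAS dataset, the table above can be typed in by hand. This is only a sketch: the object name class_tbl is an illustrative choice and is not used elsewhere in this post.

```r
library(tibble)

# Hand-typed copy of the 19-row table above.
# The name class_tbl is hypothetical, chosen only for this example.
class_tbl <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83,
  "Jane",    "F",  12,   59.8,    84.5,
  "Janet",   "F",  15,   62.5,    112.5,
  "Jeffrey", "M",  13,   62.5,    84,
  "John",    "M",  12,   59,      99.5,
  "Joyce",   "F",  11,   51.3,    50.5,
  "Judy",    "F",  14,   64.3,    90,
  "Louise",  "F",  12,   56.3,    77,
  "Mary",    "F",  15,   66.5,    112,
  "Philip",  "M",  16,   72,      150,
  "Robert",  "M",  12,   64.8,    128,
  "Ronald",  "M",  15,   67,      133,
  "Thomas",  "M",  11,   57.5,    85,
  "William", "M",  15,   66.5,    112
)

# Quick checks: number of students and one-way counts of Sex
nrow(class_tbl)
table(class_tbl$Sex)
```

Any code that expects the class dataset can be pointed at a hand-built table like this one, since the column names match.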


Let us assume that we are interested in creating a summary containing the number of male and female students in each age group, as shown below.


Sex   Age   Count
F     11    1
F     12    2
F     13    2
F     14    2
F     15    2
M     11    1
M     12    3
M     13    1
M     14    2
M     15    2
M     16    1

There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

SAS code


/* Two-way frequencies of sex by age, stored in the dataset counts01 */
proc freq data=sashelp.class;
    tables sex*age / out=counts01;
run;

  • The FREQ procedure (proc freq) is used to obtain the frequencies
  • The data= option on the proc freq statement specifies the name of the input dataset
  • The tables statement specifies the variables whose values are used to group the data
  • An asterisk between the variable names requests a two-way table, so the data are grouped by each unique combination of values of the two variables
  • The out= option on the tables statement specifies the name of the output dataset in which the frequencies are stored (counts01 here)
  • The output dataset contains one row for each unique combination of values of the variables on the tables statement that occurs in the data
  • A variable named "count" holds the number of times each combination occurs in the data, and a variable named "percent" holds its percentage of the total

R tidyverse code


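library(tidyverse)
library(haven)

# Set the working directory to the folder that contains class.sas7bdat
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")

# Read the SAS dataset into R
class <- haven::read_sas("class.sas7bdat")

# Group the rows by each combination of Sex and Age
groups <- group_by(class, Sex, Age)

# Count the rows in each group
counts01 <- summarize(groups, count = n())
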
  • The tidyverse and haven packages are loaded using the library function
  • The working directory is set to the folder containing the input dataset using the setwd function
  • The SAS dataset named "class" is read into R using the read_sas function of the haven package
  • The group_by function creates groups based on the unique combinations of values of the Sex and Age variables
  • The summarize function obtains the frequency of each group using n(), which returns the number of rows in the current group
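As one of the other possible approaches, the grouping and counting steps can be combined with the count function of dplyr. The sketch below is self-contained, so instead of reading the SAS file it types in the Sex and Age values from the student table; the names students and counts02 are illustrative assumptions, not part of the original example.

```r
library(dplyr)
library(tibble)

# Sex and Age of the 19 students, in the same order as the table in this post.
# "students" is a hypothetical stand-in for the class dataset read with read_sas.
students <- tibble(
  Sex = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M",
          "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
          11, 14, 12, 15, 16, 12, 15, 11, 15)
)

# count() groups by Sex and Age, counts the rows in each group,
# and returns an ungrouped tibble; name= sets the name of the count column
counts02 <- count(students, Sex, Age, name = "count")

print(counts02)

# The group counts should add up to the number of students
sum(counts02$count) == nrow(students)
```

The result has one row per Sex and Age combination found in the data, which is the same shape as the summary table shown earlier. Combinations that never occur, such as a 16-year-old female student, do not get a row.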

 

Example class dataset (sas dataset) can be downloaded from here.




