Obtain frequencies/counts based on two variables: two-way frequencies in SAS and R tidyverse


This post is part of the 'SASnR | Obtain frequencies' series.

When working with data, we often need to count how many records fall into each combination of the values of two variables.

For example:

  • Number of male and female students within each age group
  • Number of male and female patients within each treatment group

So we need programming features that return these counts, also called frequencies. When we count the records for each combination of the values of two variables, the result is called a two-way frequency table.

Assume that we have the following data for the 19 students of a class (the same data as the SASHELP.CLASS dataset used in the SAS code below).


Name      Sex  Age  Height  Weight
Alfred    M    14   69      112.5
Alice     F    13   56.5    84
Barbara   F    13   65.3    98
Carol     F    14   62.8    102.5
Henry     M    14   63.5    102.5
James     M    12   57.3    83
Jane      F    12   59.8    84.5
Janet     F    15   62.5    112.5
Jeffrey   M    13   62.5    84
John      M    12   59      99.5
Joyce     F    11   51.3    50.5
Judy      F    14   64.3    90
Louise    F    12   56.3    77
Mary      F    15   66.5    112
Philip    M    16   72      150
Robert    M    12   64.8    128
Ronald    M    15   67      133
Thomas    M    11   57.5    85
William   M    15   66.5    112
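If the class.sas7bdat file is not at hand, the 19 rows above can be typed straight into R. The sketch below builds them as a tibble with the same column names as the table; it assumes the tibble package is installed, and the name class_df is an illustrative choice, not part of the original post.

```r
library(tibble)

# The 19 students from the table above, one vector per column
# (class_df is an illustrative name, not part of the original post)
class_df <- tibble(
  Name   = c("Alfred", "Alice", "Barbara", "Carol", "Henry", "James",
             "Jane", "Janet", "Jeffrey", "John", "Joyce", "Judy",
             "Louise", "Mary", "Philip", "Robert", "Ronald", "Thomas",
             "William"),
  Sex    = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M",
             "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age    = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
             11, 14, 12, 15, 16, 12, 15, 11, 15),
  Height = c(69, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59,
             51.3, 64.3, 56.3, 66.5, 72, 64.8, 67, 57.5, 66.5),
  Weight = c(112.5, 84, 98, 102.5, 102.5, 83, 84.5, 112.5, 84, 99.5,
             50.5, 90, 77, 112, 150, 128, 133, 85, 112)
)

class_df
```

Because the column names match, grouping code written for the dataset read from the file can be pointed at class_df instead.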


Suppose we want a summary containing the number of male and female students in each age group, as shown below.


Sex  Age  Count
F    11   1
F    12   2
F    13   2
F    14   2
F    15   2
M    11   1
M    12   3
M    13   1
M    14   2
M    15   2
M    16   1
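The same counts are often laid out as a grid with one row per sex and one column per age, which is how PROC FREQ prints a sex*age crosstabulation. A minimal R sketch of that reshaping, assuming dplyr, tidyr and tibble are installed and typing in only the Sex and Age columns from the data table (students, long_counts and wide_counts are illustrative names):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sex and Age of the 19 students, in the row order of the data table
students <- tibble(
  Sex = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M",
          "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
          11, 14, 12, 15, 16, 12, 15, 11, 15)
)

# Long layout: one row per Sex/Age combination present in the data
long_counts <- summarize(group_by(students, Sex, Age),
                         count = n(), .groups = "drop")

# Wide layout: one row per Sex, one column per Age;
# combinations absent from the data are filled with 0
wide_counts <- pivot_wider(long_counts,
                           names_from = Age,
                           values_from = count,
                           values_fill = 0L)
wide_counts
```

Combinations that never occur, such as 16-year-old females, appear as 0 in the grid only because of values_fill; in the long layout they are simply absent.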

There are multiple ways of obtaining this result in both SAS and R. Below is one such approach.

SAS code


/* Two-way frequencies of SEX by AGE, saved to the dataset COUNTS01 */
proc freq data=sashelp.class;
    tables sex*age / out=counts01;
run;
  • The FREQ procedure (PROC FREQ) is used to obtain the frequencies.
  • The DATA= option on the PROC FREQ statement specifies the input dataset.
  • The TABLES statement lists the variables whose values define the groups.
  • An asterisk between the two variable names (sex*age) requests a two-way table, i.e. counts for each unique combination of the values of the two variables.
  • The OUT= option on the TABLES statement names the output dataset in which the frequencies are stored; here, counts01.
  • The output dataset contains one row for each combination of values that occurs in the data (add the SPARSE option to also keep combinations with a zero count).
  • The variable COUNT holds the number of observations in each combination, and the variable PERCENT holds that number as a percentage of all observations.
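PROC FREQ's OUT= dataset carries a PERCENT column next to COUNT. If the same column is wanted on the R side, one way is to divide each count by the total after summarizing. The sketch below assumes dplyr and tibble are installed and types in only the Sex and Age columns from the data table; students and counts_pct are illustrative names.

```r
library(dplyr)
library(tibble)

# Sex and Age of the 19 students, in the row order of the data table
students <- tibble(
  Sex = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M",
          "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
          11, 14, 12, 15, 16, 12, 15, 11, 15)
)

# Count each Sex/Age combination, then express it as a share of all rows.
# .groups = "drop" ungroups first, so sum(count) is the grand total (19).
counts_pct <- students %>%
  group_by(Sex, Age) %>%
  summarize(count = n(), .groups = "drop") %>%
  mutate(percent = 100 * count / sum(count))

counts_pct
```

Dropping the grouping before mutate() matters: if the result were still grouped by Sex, sum(count) would be the total within each sex and the percentages would be row percentages instead.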

R tidyverse code


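library(tidyverse)  # provides dplyr's group_by(), summarize() and n()
library(haven)      # provides read_sas()

# Folder containing class.sas7bdat
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")

# Read the SAS dataset into a tibble
class <- haven::read_sas("class.sas7bdat")

# Group the rows by each unique combination of Sex and Age
groups <- group_by(class, Sex, Age)

# One row per Sex/Age combination; count holds the number of rows in each
counts01 <- summarize(groups, count = n(), .groups = "drop")
  • The tidyverse and haven packages are loaded with the library() function.
  • The working directory is set to the folder containing the input dataset with setwd().
  • The SAS dataset class.sas7bdat is read into R with the read_sas() function of the haven package.
  • group_by() groups the rows by the unique combinations of the Sex and Age values.
  • summarize() with n() counts the rows in each group, returning one row per Sex/Age combination with the count in a column named count. The argument .groups = "drop" returns an ungrouped result; without it, dplyr keeps the result grouped by Sex and prints a message saying so.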
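dplyr also offers count(), which bundles group_by(), summarize(n()) and ungrouping into one call. The sketch below uses it on the Sex and Age columns typed in from the data table, then applies tidyr's complete() to add combinations that never occur with a count of 0, much like PROC FREQ's SPARSE option. It assumes dplyr, tidyr and tibble are installed; students and counts_all are illustrative names.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sex and Age of the 19 students, in the row order of the data table
students <- tibble(
  Sex = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M",
          "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12,
          11, 14, 12, 15, 16, 12, 15, 11, 15)
)

# One call: group by Sex and Age, count rows, return an ungrouped tibble
counts01 <- count(students, Sex, Age, name = "count")

# Add the Sex/Age combinations that never occur, with count set to 0
counts_all <- complete(counts01, Sex, Age, fill = list(count = 0L))
counts_all
```

counts01 here has the same 11 rows as the summary table above; counts_all adds a twelfth row for females aged 16 with a count of 0.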

 

The example class dataset (a SAS dataset) can be downloaded from here.




