Sort/order observations based on the values in a single variable in SAS and R tidyverse


This post is part of the 'SASnR | Sort (order) observations' series.

When working with data, we frequently need to rearrange the observations in a dataset based on the values of one or more variables.

To do this, we need programming features that sort (arrange) the observations.

Let us assume that we have the data below, which represents the number of students in a class in each age group.


Age    Count
11     2
12     5
13     3
14     4
15     4
16     1


Let us assume that we need to arrange the observations in this data so that the age group with the lowest count appears at the top, followed by the age group with the next lowest count, and so on, as shown below.

Age    Count
16     1
11     2
13     3
14     4
15     4
12     5

There are multiple ways of achieving this result in both SAS and R. Below is one approach to reordering the observations in a dataset.

SAS code


proc sort data=counts01 out=counts02;
    by count;
run;

  • the input dataset counts01 is created by fetching one-way frequencies on the age variable (see the example on one-way frequencies here)
  • the sort procedure (proc sort) is used to sort the observations
  • the data= option on the proc sort statement specifies the name of the input dataset
  • the out= option specifies the name of the output dataset in which the sorted observations are stored
  • the by statement specifies the variable whose values determine the order of the observations
  • by default, the observations are sorted in ascending order of the values of the specified variable

R tidyverse code


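library(tidyverse)
library(haven)
setwd(dir = "D:/SAS/Home/dev/clinical_sas_samples/mycsg/SAS/SASnR/")

# import the "class" SAS dataset
class <- haven::read_sas("class.sas7bdat")

# count the number of rows in each age group
groups <- group_by(class, Age)
counts01 <- summarize(groups, count = n())

# sort the counts in ascending order of count
counts02 <- arrange(counts01, count)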

  • the tidyverse and haven packages are loaded into the current R session using the library function
  • the working directory is set to the folder containing the input SAS dataset using the setwd function
  • the read_sas function of the haven package is used to import the "class" SAS dataset into R
  • the group_by function is used to group the records based on the values of the Age variable
  • the summarize function is used to count the number of rows in each age group
  • the arrange function is used to reorder the observations
  • the first argument of the arrange function is the name of the input dataset
  • the second argument of the arrange function is the name of the variable whose values determine the order of the observations; by default, the order is ascending
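
To try the arrange step without the SAS file, the age/count table from this post can be typed in by hand. The sketch below is only an illustration under that assumption: the data frame is a hypothetical stand-in for counts01, and counts03 is a name introduced here to show the descending order obtained with dplyr's desc() helper.

```r
library(dplyr)

# Hypothetical stand-in for counts01: the age/count table from this post,
# typed in by hand so that no SAS file is needed
counts01 <- data.frame(
  Age   = c(11, 12, 13, 14, 15, 16),
  count = c(2, 5, 3, 4, 4, 1)
)

# Ascending sort (the default): lowest count first
counts02 <- arrange(counts01, count)

# Descending sort: wrap the variable in desc(), highest count first
counts03 <- arrange(counts01, desc(count))

print(counts02)
print(counts03)
```

In SAS, the corresponding change would be to place the descending keyword before the variable name on the by statement of proc sort.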

 

The example class dataset (a SAS dataset) can be downloaded from here.




