
First dot concept


SAS code

data CLASS;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
Judy|F|14|64.3|90
Louise|F|12|56.3|77
Mary|F|15|66.5|112
Philip|M|16|72|150
Robert|M|12|64.8|128
Ronald|M|15|67|133
Thomas|M|11|57.5|85
William|M|15|66.5|112
;;;;
run;

*------------------------------------------------------------------------------;
*counter;
*------------------------------------------------------------------------------;

proc sort data=class;
    by sex height weight name;
run;

data counter;
    set class;
    by sex height weight name;
    if first.sex then counter=1;
    else counter+1;
run;

*------------------------------------------------------------------------------;
*subset of lowest height;
*------------------------------------------------------------------------------;

proc sort data=class;
    by sex height;
run;

data lowestheight;
    set class;
    by sex height;
    if first.sex;
    keep sex height name;
run;

 

SAS code description

These SAS code snippets use BY-group processing and the automatic first. variable for two tasks: numbering the observations within each value of "sex," and keeping only the observation with the lowest height for each value of "sex."

In the first code snippet:

The proc sort procedure is used to sort the "class" dataset by the variables "sex," "height," "weight," and "name."
The sorted dataset is then used in the subsequent data step.
The data step defines a new dataset named "counter" by using the set statement to read in the "class" dataset.
The by statement lists the same variables as the sort, and it makes SAS create the temporary first. and last. variables for each of them.
The if first.sex condition is true on the first observation of each unique value of "sex."
On that observation the "counter" variable is reset to 1. On every other observation the sum statement counter+1 adds 1, and it implicitly retains the value from one observation to the next.
After executing the first code snippet, the "counter" dataset will contain a counter variable that restarts at 1 for each value of "sex" and increments within it, following the sort order of "height," "weight," and "name." With the data above it runs from 1 to 9 for F and from 1 to 10 for M.

In the second code snippet:

The proc sort procedure is used to sort the "class" dataset by the variables "sex" and "height."
The sorted dataset is then used in the subsequent data step.
The data step defines a new dataset named "lowestheight" by using the set statement to read in the sorted "class" dataset.
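The by statement lists the sort variables, which makes first.sex available in the data step.
The subsetting if statement, if first.sex;, keeps an observation only when it is the first one for its value of "sex."
The keep statement does not select observations; it limits the variables written to the "lowestheight" dataset to "sex," "height," and "name."
Because the data are sorted by ascending "height" within "sex," the first observation in each group is the one with the lowest height. After executing the second code snippet, the "lowestheight" dataset will therefore contain one observation per value of "sex" (Joyce for F and James for M in the data above). If several observations tied for the lowest height, only the first of them in the sorted order would be kept.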

R code



library(tidyverse)
class<-tribble(
~Name,~Sex,~Age,~Height,~Weight,
"Alfred","M",14,69,112.5,
"Alice","F",13,56.5,84,
"Barbara","F",13,65.3,98,
"Carol","F",14,62.8,102.5,
"Henry","M",14,63.5,102.5,
"James","M",12,57.3,83,
"Jane","F",12,59.8,84.5,
"Janet","F",15,62.5,112.5,
"Jeffrey","M",13,62.5,84,
"John","M",12,59,99.5,
"Joyce","F",11,51.3,50.5,
"Judy","F",14,64.3,90,
"Louise","F",12,56.3,77,
"Mary","F",15,66.5,112,
"Philip","M",16,72,150,
"Robert","M",12,64.8,128,
"Ronald","M",15,67,133,
"Thomas","M",11,57.5,85,
"William","M",15,66.5,112,
)


counter<-class %>%   
  arrange(Sex,Height,Weight,Name) %>%   
  group_by(Sex) %>%   
  mutate(counter=row_number())


lowestheight<-class %>%   
  arrange(Sex,Height) %>%   
  group_by(Sex) %>%   
  slice(1) %>%   
  select(Name,Sex,Height)

R code description

These R Tidyverse code snippets reproduce the two SAS tasks: adding a counter that numbers the observations within each value of "Sex," and obtaining the observation with the lowest height for each value of "Sex."

In the first code snippet:

The arrange function is used to sort the "class" dataframe in ascending order by the variables "Sex," "Height," "Weight," and "Name."
The group_by function is used to group the observations by the variable "Sex."
The mutate function is used to create a new variable named "counter" using the row_number function, which assigns a sequential number to each observation within each group.
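After executing the first code snippet, the new "counter" dataframe (the original "class" dataframe is left unchanged) will contain a "counter" variable holding the sequential numbers that represent the order of observations within each unique value of "Sex." The result is still grouped by "Sex."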
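The SAS first. and last. flags have no built-in counterpart in dplyr, but they can be derived from row_number() and n() inside a grouped mutate. The following is a minimal sketch under that assumption, using a handful of rows from the class data for brevity; the names class_small, flags, first_sex and last_sex are illustrative and do not come from the original code.

```r
library(dplyr)
library(tibble)

# A few rows copied from the class data above (subset chosen only for brevity)
class_small <- tribble(
  ~Name,    ~Sex, ~Height,
  "Alfred", "M",  69,
  "Alice",  "F",  56.5,
  "Joyce",  "F",  51.3,
  "Thomas", "M",  57.5,
  "Philip", "M",  72
)

flags <- class_small %>%
  arrange(Sex, Height) %>%
  group_by(Sex) %>%
  mutate(
    counter   = row_number(),        # restarts at 1 for each Sex
    first_sex = row_number() == 1,   # rough analogue of SAS first.sex
    last_sex  = row_number() == n()  # rough analogue of SAS last.sex
  ) %>%
  ungroup()                          # drop the grouping once it is no longer needed

print(flags)
```

Filtering on first_sex (for example filter(flags, first_sex)) would then play the role of the subsetting if first.sex; statement in the SAS code.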

In the second code snippet:

The arrange function is used to sort the "class" dataframe in ascending order by the variables "Sex" and "Height."
The group_by function is used to group the observations by the variable "Sex."
The slice function is used to select the first observation within each group, which corresponds to the observation with the lowest height.
The select function is used to choose specific variables ("Name," "Sex," and "Height") to include in the resulting dataframe.
After executing the second code snippet, the "lowestheight" dataframe will contain the subset of observations with the lowest height for each unique value of "Sex."
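As an alternative to sorting and then taking the first row, dplyr's slice_min() can pick the lowest value per group directly. This is a sketch of that alternative rather than the method used above; it again uses a few rows from the class data, and class_small and lowest_small are illustrative names. Setting with_ties = FALSE returns a single row per group, which mirrors the first.sex behaviour; the default with_ties = TRUE would return every row tied for the minimum.

```r
library(dplyr)
library(tibble)

# A few rows copied from the class data above (subset chosen only for brevity)
class_small <- tribble(
  ~Name,    ~Sex, ~Height,
  "Alfred", "M",  69,
  "James",  "M",  57.3,
  "Joyce",  "F",  51.3,
  "Louise", "F",  56.3,
  "Thomas", "M",  57.5
)

lowest_small <- class_small %>%
  group_by(Sex) %>%
  slice_min(Height, n = 1, with_ties = FALSE) %>%  # one row per Sex, smallest Height
  ungroup() %>%
  select(Name, Sex, Height)

print(lowest_small)
```

No explicit arrange() is needed here, because slice_min() orders by Height within each group itself.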