
First dot and last dot concept


SAS code


data CLASS;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
Judy|F|14|64.3|90
Louise|F|12|56.3|77
Mary|F|15|66.5|112
Philip|M|16|72|150
Robert|M|12|64.8|128
Ronald|M|15|67|133
Thomas|M|11|57.5|85
William|M|15|66.5|112
;;;;
run;

proc sort data=class;
   by age;
run;

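data only_one_in_group;
    set class;
    by age;
    /* keep a row only if it is both the first and the last row of its
       age group, i.e. the group contains exactly one observation */
    if first.age and last.age;
run;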

SAS code description

This SAS code snippet demonstrates how to keep only the observations that are alone in their group, that is, groups containing exactly one observation. The groups are defined by a specific variable, in this case the "age" variable.

First, the PROC SORT step is used to sort the "class" dataset in ascending order by the "age" variable.

Then, in the DATA step:

The SET statement is used to read the sorted "class" dataset.
The BY statement tells SAS to process the data in groups of "age". It also creates two temporary variables, FIRST.AGE and LAST.AGE, which are 1 on the first and last observation of each "age" group respectively, and 0 otherwise.
The IF FIRST.AGE AND LAST.AGE condition is a subsetting IF. It is true only when an observation is both the first and the last in its group, which can happen only when the group has a single observation.
The resulting dataset, named "only_one_in_group," will contain only the observations whose "age" value occurs exactly once. With the data above, that is a single observation: Philip, age 16.
This SAS code snippet does not pick one observation per group; groups with two or more observations are dropped entirely. It is useful when you need to identify values of a grouping variable that are not shared by any other observation.

R code


library(tidyverse)

class<-tribble(
~name,~sex,~age,~height,~weight,
"Alfred","M",14,69,112.5,
"Alice","F",13,56.5,84,
"Barbara","F",13,65.3,98,
"Carol","F",14,62.8,102.5,
"Henry","M",14,63.5,102.5,
"James","M",12,57.3,83,
"Jane","F",12,59.8,84.5,
"Janet","F",15,62.5,112.5,
"Jeffrey","M",13,62.5,84,
"John","M",12,59,99.5,
"Joyce","F",11,51.3,50.5,
"Judy","F",14,64.3,90,
"Louise","F",12,56.3,77,
"Mary","F",15,66.5,112,
"Philip","M",16,72,150,
"Robert","M",12,64.8,128,
"Ronald","M",15,67,133,
"Thomas","M",11,57.5,85,
"William","M",15,66.5,112,
)

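only_one_in_group<-class %>%
  group_by(age) %>%       # one group per distinct age
  mutate(nrows=n()) %>%   # number of rows in the current group
  filter(nrows==1)        # keep rows that are alone in their group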

R code description

This R Tidyverse code snippet demonstrates how to keep only the observations that are alone in their group, that is, groups containing exactly one observation. The groups are defined by a specific variable, in this case the "age" variable.

Using the pipe operator %>%, the following operations are performed:

The group_by function is used to group the "class" data frame by the "age" variable.
The mutate function is applied to create a new variable named "nrows" that represents the number of observations in each group.
The filter function is used to keep only the observations where "nrows" is equal to 1, indicating that it is the only observation in its respective group.
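After executing this code snippet, the resulting data frame "only_one_in_group" will contain only the observations whose "age" value occurs exactly once (here, Philip, age 16). Unlike the SAS result, it also carries the helper column "nrows" and is still grouped by "age".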
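The counting approach above gives the same result as the SAS program, but it does not show the FIRST.AGE and LAST.AGE flags themselves. A minimal sketch of how those flags could be imitated in R follows. The data frame "toy" and the column names "first_age" and "last_age" are made up for illustration and are not part of the original code.

```r
library(dplyr)

# Small hypothetical data set (not the full class data) to keep things readable
toy <- tibble::tribble(
  ~name,    ~age,
  "Joyce",  11,
  "Thomas", 11,
  "Philip", 16
)

flagged <- toy %>%
  arrange(age) %>%                    # plays the role of PROC SORT ... BY age
  group_by(age) %>%                   # plays the role of BY age in the DATA step
  mutate(
    first_age = row_number() == 1,    # analogue of FIRST.age
    last_age  = row_number() == n()   # analogue of LAST.age
  ) %>%
  ungroup()

# Same condition as "if first.age and last.age;" in the SAS DATA step
only_one <- flagged %>% filter(first_age & last_age)
```

In this sketch Joyce is first but not last in the age 11 group, Thomas is last but not first, and only Philip is both, so "only_one" keeps Philip alone.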