mycsg SASnR (SAS and R)

SAS code

data CLASS;
    infile datalines dlm='|' dsd missover;
    input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
    label ;
    format ;
    datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
;;;;
run;

 
*------------------------------------------------------------------------------;
*Keep;
*------------------------------------------------------------------------------;
 
data subset;
    set class;
    keep Name Sex Age;
run;

 
*------------------------------------------------------------------------------;
*drop;
*------------------------------------------------------------------------------;
 
data subset;
    set class;
    drop height weight;
run;

 
*------------------------------------------------------------------------------;
*drop - pattern;
*------------------------------------------------------------------------------;
data subset;
    set class;
    drop h: w:;
run;

SAS code description

These SAS code snippets show three ways to create a subset of an existing dataset by variable: keeping named variables, dropping named variables, and dropping variables by name prefix. Each data step reads the "class" dataset and writes a dataset named "subset"; because all three steps use the same output name, each one overwrites the result of the previous step.

Step 1: Keep Variables

The first subsetting step creates a new dataset with the keep statement, which lists the variables to retain. The resulting dataset, named "subset," contains only the variables "Name," "Sex," and "Age" from the original "class" dataset.

Step 2: Drop Variables

The second step creates a new dataset with the drop statement, which lists the variables to exclude. Here the resulting "subset" dataset omits "Height" and "Weight" and retains every other variable from the original "class" dataset.

Step 3: Drop Variables by Pattern

The third step drops variables by name prefix. A name followed by a colon, as in h: or w:, refers to every variable whose name begins with that prefix, so the statement drop h: w: excludes all variables starting with "h" or "w" from the resulting "subset" dataset. SAS variable names are not case-sensitive, so the lowercase prefixes match "Height" and "Weight."

In all three cases the original "class" dataset is left unchanged; only the output dataset "subset" differs in which variables it contains.

R code

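library(tidyverse)

# Build the example data frame row by row
class <- tribble(
  ~Name, ~Sex, ~Age, ~Height, ~Weight,
  "Alfred", "M", 14, 69, 112.5,
  "Alice", "F", 13, 56.5, 84,
  "Barbara", "F", 13, 65.3, 98,
  "Carol", "F", 14, 62.8, 102.5,
  "Henry", "M", 14, 63.5, 102.5,
  "James", "M", 12, 57.3, 83
)

# Keep: select variables by name
subset <- select(class, Name, Sex, Age)

# Drop: exclude variables by name
subset <- select(class, -Height, -Weight)

# Drop by pattern: exclude variables whose names start with "H" or "W"
subset <- select(class, -starts_with(c("H", "W")))

# Drop by pattern: exclude variables whose names end with "t"
subset <- select(class, -ends_with("t"))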

R code description

The R code above uses the select function, from the dplyr package loaded with the tidyverse, to create subsets of a data frame by selecting or excluding variables. Each call assigns its result to "subset," so each one overwrites the previous result.

The first call creates a new data frame named "subset" by selecting the variables "Name," "Sex," and "Age" from the existing data frame "class."

The second call creates "subset" by exclusion instead: prefixing a variable name with a minus sign tells select to drop it, so "Height" and "Weight" are removed and every other variable in "class" is retained.
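As a quick check of the first two approaches, the sketch below rebuilds a two-row version of "class" and compares the column names that each call leaves behind. The object names kept and dropped are illustrative and do not appear in the original code.

```r
library(dplyr)

# Two-row version of the example data, enough to compare column names
class <- tibble::tribble(
  ~Name, ~Sex, ~Age, ~Height, ~Weight,
  "Alfred", "M", 14, 69, 112.5,
  "Alice", "F", 13, 56.5, 84
)

kept <- select(class, Name, Sex, Age)        # select by name
dropped <- select(class, -Height, -Weight)   # exclude by name

# Both results keep the same columns in the same order
identical(names(kept), names(dropped))
```

Which form to use is mostly a matter of which list is shorter: the variables to keep or the variables to drop.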

The third call creates "subset" by excluding variables based on a pattern. The starts_with helper, negated with a minus sign inside select, removes every variable whose name starts with "H" or "W." By default starts_with ignores case when matching.
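The case handling of starts_with can be seen in a small sketch. It assumes the ignore.case argument of the tidyselect helper, which defaults to TRUE; the object names default_match and strict_match are illustrative only.

```r
library(dplyr)

class <- tibble::tribble(
  ~Name, ~Sex, ~Age, ~Height, ~Weight,
  "Alfred", "M", 14, 69, 112.5,
  "Alice", "F", 13, 56.5, 84
)

# Default (ignore.case = TRUE): a lowercase prefix still matches Height
default_match <- select(class, starts_with("h"))
names(default_match)

# Case-sensitive matching: lowercase "h" matches no column name here
strict_match <- select(class, starts_with("h", ignore.case = FALSE))
ncol(strict_match)
```

This mirrors the SAS behavior, where variable names are not case-sensitive, but in R the case-insensitivity is a property of the helper rather than of the column names themselves.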

The fourth call also excludes variables by pattern, this time with the ends_with helper, which removes every variable whose name ends with the letter "t." In this data frame those are again "Height" and "Weight," so the result matches the third call.
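The prefix and suffix patterns agree on this data only because of how its columns happen to be named. The sketch below adds a hypothetical Count column, which is not part of the original data, to show where the two patterns diverge.

```r
library(dplyr)

# "Count" is a made-up column: it ends with "t" but starts with neither "H" nor "W"
wide <- tibble::tribble(
  ~Name, ~Height, ~Weight, ~Count,
  "Alfred", 69, 112.5, 1,
  "Alice", 56.5, 84, 2
)

by_prefix <- select(wide, -starts_with(c("H", "W")))
by_suffix <- select(wide, -ends_with("t"))

names(by_prefix)  # Count survives the prefix-based drop
names(by_suffix)  # Count is removed by the suffix-based drop
```

When a pattern is used to drop columns, it is worth checking names() on the result, since a pattern can match more columns than intended.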

Together, these calls show that select can pick variables by name, exclude them by name, or match them by a pattern in their names, which gives precise control over the variables that appear in a subset of a data frame.
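Two further selection helpers are sketched below as a possible extension; neither appears in the original code. The sketch assumes all_of and where as re-exported by a recent dplyr (1.0 or later), and the names keep_vars, by_vector, and numeric_only are illustrative.

```r
library(dplyr)

class <- tibble::tribble(
  ~Name, ~Sex, ~Age, ~Height, ~Weight,
  "Alfred", "M", 14, 69, 112.5,
  "Alice", "F", 13, 56.5, 84
)

# all_of(): select the columns named in a character vector
# (raises an error if any listed name is missing from the data frame)
keep_vars <- c("Name", "Sex", "Age")
by_vector <- select(class, all_of(keep_vars))
names(by_vector)

# where(): select columns by a test on their contents, here numeric columns
numeric_only <- select(class, where(is.numeric))
names(numeric_only)
```

Storing the variable list in a vector is handy when the same set of columns is reused across several data frames.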

Subset variables