When working with data, we frequently need only a selected set of variables. For this, we need a way to keep just the required variables (columns) and drop the rest.
There are multiple ways of selecting only the required variables in both SAS and R tidyverse. Below is one basic approach.
Let us assume that we have a dataset named "class" with 5 variables named Name, Sex, Age, Height, Weight.
Name | Sex | Age | Height | Weight
Alfred | M | 14 | 69 | 112.5
Alice | F | 13 | 56.5 | 84
Barbara | F | 13 | 65.3 | 98
Carol | F | 14 | 62.8 | 102.5
Henry | M | 14 | 63.5 | 102.5
James | M | 12 | 57.3 | 83
Let us assume that we only need the Name, Sex and Age variables.
Name | Sex | Age
Alfred | M | 14
Alice | F | 13
Barbara | F | 13
Carol | F | 14
Henry | M | 14
James | M | 12
We can create a new subset dataset (a tibble or data frame in R) using the code below.
/* Copy sashelp.class into work.class, keeping only the required variables */
data class;
    set sashelp.class;
    keep name sex age;
run;
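For the R tidyverse side, the same selection can be sketched with dplyr's select(). This is a minimal sketch, assuming the dplyr and tibble packages are installed. The tibble is typed in from the rows shown above rather than read from the SAS file, and the object names class_df and class_subset are illustrative, not from the original.

```r
library(dplyr)   # select() and the %>% pipe
library(tibble)  # tribble() for building a small tibble row by row

# Assumption: the example data is entered by hand from the table above
class_df <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83
)

# Keep only the required columns; all rows are retained
class_subset <- class_df %>%
  select(Name, Sex, Age)

print(class_subset)
```

Unlike SAS, R is case-sensitive, so the column names passed to select() must match the names in the tibble exactly.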
The example class dataset (a SAS dataset) can be downloaded from here.