Subset Observations
- Almost every analysis starts by filtering a dataset to a relevant subset — only males, only a certain age band, only a single site.
- SAS typically does this in a DATA step with a WHERE statement. R has two idiomatic flavours: filter() from dplyr, and the [ subscript operator in base R.
- This lesson shows two common filters on the CLASS dataset: keep only rows where sex is “M”, and keep only rows where age is 11 or 12 (the pre-teens).
- After running, we should have two new datasets: males, holding only the male subjects, and preteen, holding only the subjects aged 11 or 12.
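The two filters can be sketched in R as follows. This is a minimal example under stated assumptions. class_df is a hypothetical six-row stand-in for CLASS, not the real data, and the lowercase column names name, sex and age are also assumed. It shows the base R [ form and the dplyr filter() form side by side. The dplyr part only runs if that package is installed.

```r
# A minimal sketch in base R, with an optional dplyr variant.
# ASSUMPTION: class_df is a small made-up stand-in for the CLASS dataset,
# with lowercase columns name, sex and age. It is not the real data.
class_df <- data.frame(
  name = c("Alfred", "Alice", "James", "Jane", "Thomas", "Mary"),
  sex  = c("M", "F", "M", "F", "M", "F"),
  age  = c(14, 13, 12, 12, 11, 15),
  stringsAsFactors = FALSE
)

# Roughly equivalent SAS:
#   data males;   set class; where sex = "M";       run;
#   data preteen; set class; where age in (11, 12); run;

# Base R: a logical vector in the row slot of [ , ] keeps the TRUE rows;
# leaving the column slot empty keeps every column.
males   <- class_df[class_df$sex == "M", ]
preteen <- class_df[class_df$age %in% c(11, 12), ]

# dplyr: filter() takes the data frame first and evaluates the condition
# using the column names directly. Only runs if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  males_dplyr   <- dplyr::filter(class_df, sex == "M")
  preteen_dplyr <- dplyr::filter(class_df, age %in% c(11, 12))
}

print(males)
print(preteen)
```

One practical difference between the two forms: if the condition evaluates to NA (for example, a missing sex value), the [ form returns a row of NAs for it, whereas filter() drops that row, much as the SAS WHERE statement does not select it.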