*Copyright @ www.mycsg.in;
What is the word meaning of subset?
a part of a larger group of related things.
Where do we need to subset data - in terms of variables?
Subset variables from a dataset
In an earlier lesson, we have how to subset the required observations using where and if statements
In this lesson, we will see how to subset variables from an existing datasets
Create sample input datasets
data class; set sashelp.class; run; data cars; set sashelp.cars; run;
View Log
151000 data class; 151001 set sashelp.class; 151002 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.CLASS has 19 observations and 5 variables. NOTE: Compressing data set WORK.CLASS increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 151003 151004 data cars; 151005 set sashelp.cars; 151006 run; NOTE: There were 428 observations read from the data set SASHELP.CARS. NOTE: The data set WORK.CARS has 428 observations and 15 variables. NOTE: Compressing data set WORK.CARS decreased size by 0.00 percent. Compressed is 2 pages; un-compressed would require 2 pages. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 151007
View Data
Subset the variables using 'keep' statement
To subset the variables, we need to specify the required variables to be kept in the output dataset on the keep statement
Create a dataset named 'age' by keeping the variables Name and Age from class dataset
Notice in the log about the number of variables written to the output dataset
With keep statement, only those variables specified will be written to the output dataset
data age; set class; keep name age; run;
View Log
158501 data age; 158502 set class; 158503 keep name age; 158504 run; NOTE: There were 19 observations read from the data set WORK.CLASS. NOTE: The data set WORK.AGE has 19 observations and 2 variables. NOTE: Compressing data set WORK.AGE increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
View Data
Create a dataset named 'height_weight' by keeping the variables Name, Height, Weight from class dataset
Notice in the log about the number of variables written to the output dataset
data height_weight; set class; keep name height weight; run;
View Log
165992 data height_weight; 165993 set class; 165994 keep name height weight; 165995 run; NOTE: There were 19 observations read from the data set WORK.CLASS. NOTE: The data set WORK.HEIGHT_WEIGHT has 19 observations and 3 variables. NOTE: Compressing data set WORK.HEIGHT_WEIGHT increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
View Data