*Copyright @ www.mycsg.in;
General concepts in SAS
Before learning specific procedures, it is useful to understand a few general rules that apply across the SAS language
These rules affect how we write statements, name datasets and variables, and organise code into steps
A strong understanding of these basics reduces syntax errors and makes later lessons easier to follow
Elements of the SAS language
The SAS language contains statements, expressions, functions, CALL routines, options, formats, and informats
Statements tell SAS what action to perform, such as creating a dataset, sorting data, or printing output
Expressions combine values, operators, and functions to produce a result
Formats control how values are displayed, while informats control how raw values are read
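As a minimal sketch of how these elements appear together, the step below uses statements, an expression, a function, and a format on a few rows of `sashelp.class`. The names `elements_demo` and `height_cm` are illustrative and not part of the original lesson, and the sketch assumes the heights in `sashelp.class` are recorded in inches.

```sas
/* Illustrative sketch: elements_demo and height_cm are hypothetical names */
data elements_demo;
    set sashelp.class(obs=3);             /* statement: read the first three rows */
    height_cm = height * 2.54;            /* expression: variable, operator, constant */
    height_cm = round(height_cm, 0.1);    /* function: round to one decimal place */
    format height_cm 6.1;                 /* format: display with one decimal place */
run;

proc print data=elements_demo;            /* statement that starts a PROC step */
run;
```

Informats are left out here because they matter mainly when raw text is read into SAS variables.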
Rules for SAS statements
Every SAS statement ends with a semicolon, and a missing semicolon is one of the most common beginner mistakes
You can write SAS keywords in uppercase, lowercase, or mixed case because the language is generally not case sensitive for keywords
A statement can begin in any column and can continue to the next line when needed
You can place multiple short statements on one line, although most programmers prefer one statement per line for readability
You cannot split a word across two lines
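The layout rules above can be sketched as follows. The dataset name `layout_demo` is a hypothetical name chosen for illustration. The sketch mixes keyword case, continues one statement onto a second line, and places two statements on a single line.

```sas
/* Keywords in mixed case; the SET statement continues onto the next line */
DATA layout_demo;
    Set sashelp.class
        (obs=2);
RUN;

/* Two short statements on one line are legal, though harder to read */
proc print data=layout_demo; run;
```

The line break in the SET statement falls between words. Breaking inside a word such as `sashelp` would not be valid.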
Simple example showing SAS statements
We create a dataset named `class01` from `sashelp.class`
Then we print the dataset using `proc print`
Notice that every SAS statement ends with a semicolon
    data class01;
        set sashelp.class(obs=5);
    run;
    proc print data=class01;
    run;
SAS Log
    data class01;
        set sashelp.class(obs=5);
    run;
    NOTE: There were 5 observations read from the data set SASHELP.CLASS.
    NOTE: The data set WORK.CLASS01 has 5 observations and 5 variables.
    NOTE: Compressing data set WORK.CLASS01 increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used (Total process time):
          real time           0.01 seconds
          cpu time            0.00 seconds
    proc print data=class01;
    run;
    NOTE: There were 5 observations read from the data set WORK.CLASS01.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time           0.01 seconds
          cpu time            0.00 seconds
The `data` and `set` statements work together to create a new dataset
The `proc print` step displays the created observations in the output window
Review the log and confirm that both the DATA step and PROC step completed without errors
Rules for most SAS names
Most SAS names can contain 1 to 32 characters
The first character must be a letter or underscore
Subsequent characters can be letters, digits, or underscores
Blanks are not allowed in standard SAS names
These rules apply to common items such as dataset names, variable names, and library references (librefs are further limited to 8 characters), although some special cases exist in advanced usage
Examples of valid and invalid SAS names
Valid names include `age`, `_temp`, `class01`, and `height_cm`
Invalid standard names include `1age`, `student name`, and `total-cost` because they start with a digit, contain a blank, or include a hyphen, respectively
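One way to check candidate names against these rules is the `NVALID` function, which returns 1 when a string is a valid variable name under the chosen rule set and 0 otherwise. The sketch below is only an illustration. The names `name_check`, `name`, and `is_valid` are hypothetical, and `'V7'` is assumed here to correspond to the standard rules described above.

```sas
/* Hypothetical sketch: test each candidate name against the V7 naming rules */
data name_check;
    length name $32;
    input name $32.;                   /* read the whole line, embedded blanks included */
    is_valid = nvalid(name, 'V7');     /* 1 = valid standard name, 0 = not valid */
    datalines;
age
_temp
class01
height_cm
1age
student name
total-cost
;
run;

proc print data=name_check;
run;
```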
Dataset names
Dataset names follow the standard SAS naming rules
A one level name such as `class01` creates a dataset in the default library, usually `WORK`
A two level name such as `sashelp.class` identifies both the library and the dataset name
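To illustrate that a one level name and its two level `WORK` equivalent point to the same dataset, the sketch below creates a dataset with a one level name and then prints it through a two level name. The name `class_one` is hypothetical, and the sketch assumes the default library has not been redirected away from `WORK`.

```sas
/* One level name: stored in the default library (assumed to be WORK) */
data class_one;
    set sashelp.class(obs=2);
run;

/* Two level name that refers to the same dataset */
proc print data=work.class_one;
run;
```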
Copy a dataset from SASHELP to WORK
`sashelp.class` is a two level name where `sashelp` is the library and `class` is the dataset name
`class_copy` is a one level name and will be created in the `WORK` library
This example helps distinguish a two level reference to a supplied library such as `SASHELP` from a one level reference to a temporary `WORK` dataset
    data class_copy;
        set sashelp.class;
    run;
SAS Log
    data class_copy;
        set sashelp.class;
    run;
    NOTE: There were 19 observations read from the data set SASHELP.CLASS.
    NOTE: The data set WORK.CLASS_COPY has 19 observations and 5 variables.
    NOTE: Compressing data set WORK.CLASS_COPY increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used (Total process time):
          real time           0.00 seconds
          cpu time            0.00 seconds
The new dataset `class_copy` now exists in the `WORK` library
Inspect the output dataset and confirm that its rows match those from `sashelp.class`
Variable names
Variable names also follow the standard naming rules
Variable names are generally not case sensitive in SAS code, so `Age`, `AGE`, and `age` all refer to the same variable
Choosing meaningful variable names improves readability and maintenance
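A small sketch of the case rule: the step below refers to the same variables with different capitalisation. The names `case_demo`, `age_next`, and `age_check` are hypothetical and used only for illustration.

```sas
/* Age, AGE, and age all refer to the variable read from sashelp.class */
data case_demo;
    set sashelp.class(obs=3);
    AGE_NEXT  = Age + 1;           /* creates one new variable */
    age_check = age_next - AGE;    /* same variables, different case */
run;

proc print data=case_demo;
run;
```

Because `AGE_NEXT` and `age_next` are the same variable, the step adds two new variables rather than three.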
Create new variables with valid names
We create new variables named `bmi_proxy` and `age_group` to demonstrate valid variable naming
These names start with letters and contain only letters and underscores
After running the code, inspect the created variables in the output dataset
    data class_named;
        set sashelp.class;
        bmi_proxy = round((weight/(height*height))*703, .01);
        if age le 12 then age_group = "Younger";
        else age_group = "Older";
    run;
SAS Log
    data class_named;
        set sashelp.class;
        bmi_proxy = round((weight/(height*height))*703,.01);
        if age le 12 then age_group="Younger";
        else age_group="Older";
    run;
    NOTE: There were 19 observations read from the data set SASHELP.CLASS.
    NOTE: The data set WORK.CLASS_NAMED has 19 observations and 7 variables.
    NOTE: Compressing data set WORK.CLASS_NAMED increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used (Total process time):
          real time           0.00 seconds
          cpu time            0.01 seconds
The output dataset contains the original variables plus the newly created variables
This example reinforces both assignment statements and good variable naming practice
Library names
A library is a collection of SAS files that is referenced by a short name called a libref
Librefs follow the standard SAS naming rules but are limited to 8 characters
Common examples include `WORK` for temporary files and `SASHELP` for sample data supplied by SAS
Using a two level name such as `work.class01` makes the library reference explicit
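As a hedged illustration of how a libref is assigned, the sketch below uses a `LIBNAME` statement. The libref `mylib` and the dataset `class_lib` are hypothetical names. To keep the sketch runnable without assuming any folder on your system, it points `mylib` at the physical location of the `WORK` library, whereas in practice you would supply the path of your own permanent folder.

```sas
/* mylib is a hypothetical libref; it reuses the WORK folder so no real path is assumed */
libname mylib "%sysfunc(pathname(work))";

/* Two level name: library mylib, member class_lib */
data mylib.class_lib;
    set sashelp.class(obs=3);
run;

/* Release the libref when it is no longer needed */
libname mylib clear;
```

Because `mylib` and `WORK` share a location in this sketch, the dataset is also reachable as `work.class_lib`.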
Display the difference between one level and two level naming
This small example writes a dataset using an explicit library name
The resulting dataset is still created in `WORK`, but the reference makes the library visible in the code
    data work.class_explicit;
        set sashelp.class(obs=3);
    run;
SAS Log
    data work.class_explicit;
        set sashelp.class(obs=3);
    run;
    NOTE: There were 3 observations read from the data set SASHELP.CLASS.
    NOTE: The data set WORK.CLASS_EXPLICIT has 3 observations and 5 variables.
    NOTE: Compressing data set WORK.CLASS_EXPLICIT increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used (Total process time):
          real time           0.00 seconds
          cpu time            0.01 seconds
Compare `class_explicit` with the earlier one level datasets and note that all of them are stored in `WORK`; only the way the code refers to the library differs
Format and informat names
Formats control how values are displayed, while informats control how raw values are read into SAS variables
For example, a numeric date value can be displayed with a date format even though the stored value is numeric
Understanding this difference is important when reading raw data and presenting output
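The two directions can be sketched with the `INPUT` and `PUT` functions. `INPUT` applies an informat to turn text into a stored value, and `PUT` applies a format to turn a stored value into text. The dataset and variable names below are hypothetical, and the date string is an arbitrary sample value.

```sas
/* Hypothetical sketch: an informat reads text, a format writes text */
data informat_demo;
    raw_text = "15JAN2024";                  /* text as it might appear in raw data */
    date_num = input(raw_text, date9.);      /* informat: text to numeric SAS date */
    date_txt = put(date_num, yymmdd10.);     /* format: numeric SAS date to text */
    format date_num date9.;                  /* display format for the stored number */
run;

proc print data=informat_demo;
run;
```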
Simple format example
We create a date value and apply the `date9.` format so the displayed value is easier to read
The stored value remains numeric even though the display changes
    data date_demo;
        today_num = today();
        format today_num date9.;
    run;
SAS Log
    data date_demo;
        today_num = today();
        format today_num date9.;
    run;
    NOTE: Compression was disabled for data set WORK.DATE_DEMO because compression overhead would increase the size of the data set.
    NOTE: The data set WORK.DATE_DEMO has 1 observations and 1 variables.
    NOTE: DATA statement used (Total process time):
          real time           0.01 seconds
          cpu time            0.00 seconds
The dataset stores a numeric date value in `today_num`
The format changes how that value is shown when viewed or printed
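To make the stored number visible, the sketch below writes the same value to the log with and without a format. It uses a fixed date literal instead of `today()` so the result does not depend on the run date. The variable names `d` and `later` are hypothetical.

```sas
/* SAS dates are stored as day counts from 01JAN1960 */
data _null_;
    d = '01JAN1960'd;      /* date literal for day zero */
    put d=;                /* writes d=0 */
    put d= date9.;         /* writes d=01JAN1960 */
    later = d + 30;        /* ordinary arithmetic works on the stored number */
    put later= date9.;     /* writes later=31JAN1960 */
run;
```

Arithmetic such as adding 30 days works because the value is numeric. The format only affects how the result is written.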
Key points to remember
SAS statements end with semicolons
Dataset names, variable names, and many other SAS names follow standard naming rules
One level names use the default library, while two level names specify both library and member name
Variable names are generally not case sensitive
Formats and informats serve different purposes and are fundamental to data input and display