All SAS statements (except those containing data) must end with a semicolon (;).
SAS statements typically begin with a SAS keyword.

SAS programs can be freely formatted:
Any number of SAS statements can appear on a single line provided they are separated by a semicolon.
A SAS statement can be continued from one line to the next as long as no word is split.
SAS statements can begin in any column.
SAS statements are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two.
The words in SAS statements are separated by blanks or special characters (e.g. =, +, or *).
SAS names such as variable and data set names must contain between 1 and 32 characters (librefs and filerefs are limited to 8 characters).
The first character appearing in a name must be a letter (A, B, ...Z, a, b, ... z) or an underscore (_).
Subsequent characters must be letters, numbers, or underscores. That is, no other characters, such as $, %, or & are permitted.
Blanks also cannot appear in SAS names. SAS names are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (SAS is only case sensitive within quotation marks.)
A SAS data set is a SAS file that contains data values that are organized as a table of observations (rows) and variables (columns) that can be processed by SAS software. A SAS data set also contains descriptor information such as the data types and lengths of the variables.
Data is unprocessed facts and figures without any added interpretation or analysis. For example: "The price of crude oil is $80 per barrel."
Editor window
Log window
Output window
Explorer window
Results window.
Enhanced editor window allows us to perform standard editing like entering, editing and submitting programs
The Log Window displays messages about our SAS session and any programs that we submit
SAS uses the following color-coded system to assist you in reading the log:
The DATA and PROC steps that appear in your program are printed in black.
Notes that SAS wants to report to you are printed in blue
Warnings that SAS wants to draw to your attention are printed in green
Errors that cause SAS to abort running your program are printed in red
Output window enables us to view the LISTING output from our SAS programs.
The Explorer Window allows us to easily view and manage our SAS files, which are stored in SAS data libraries.
The Results window serves as an index in a tree structure to the various types of output generated by the submitted SAS programs in the session.
A SAS dataset that exists only during the current SAS session is called a temporary dataset. All datasets stored in work library are temporary datasets.
A SAS dataset that is stored in a location on the computer and exist after exiting SAS session is called a permanent dataset. Generally, any dataset that is stored in any library other than work library is considered a permanent dataset.
A data value that contains only numbers, decimal point or minus sign is called standard numeric data.
A data value that represents a number but contain additional symbols like comma, dollar sign, or blanks is called a non-standard numeric data. Date, time and date time values are also considered non-standard numeric data in SAS.
A number with an embedded comma in it: 2,341
A number with an embedded dollar sign: $100
A date value: 1Mar1990
A time value: 10:23
A comment is any text that is used to document the purpose of the program, explain unusual segments of the program, or describe steps in a complex program or calculation. SAS ignores text in comment statements during processing.
There are two types of comments:
1) /*comment*/
2) *comment;
The INPUT statement describes the arrangement of values in the input data record and assigns input values to the corresponding SAS variables.
The INFILE statement specifies an external file to read with an INPUT statement.
List input is one of the methods for reading raw data into SAS. In this method, SAS scans the input data record for input values and assigns them to the corresponding SAS variables. We use this method when the data values are separated by at least one delimiter (data not arranged in specific columns).
Fields must be separated by at least one blank (or other delimiter).
Fields must be read in order from left to right.
You cannot skip or re-read fields.
Missing values must be represented by a place holder such as a period. (A blank field causes the matching of variable names and values to get out of sync.)
Character values can't contain embedded blanks.
The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set. (NOTE: 1 byte = 1 character).
Data must be in standard character or numeric format.
Column input is used for reading data values that are entered in fixed columns.
By default, when reading raw data using list input, if there are fewer fields on an input data record than the number of variables specified on the INPUT statement, SAS tries to fetch the next record to read values for the remaining variables. The MISSOVER option in the INFILE statement sets the remaining INPUT statement variables to missing values instead of allowing SAS to fetch a new input record.
When reading raw data, if there are fewer fields on an input data record than the number of variables specified on the INPUT statement, the FLOWOVER option on the INFILE statement tells SAS to fetch the next record to read values for the remaining variables. FLOWOVER is the default behavior.
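The difference between MISSOVER and the default FLOWOVER behavior can be sketched as follows. The data set names, variable names, and data values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data: the second record has only two of the three fields. */
data scores_missover;
   infile datalines missover;   /* remaining variables are set to missing */
   input id score1 score2;
   datalines;
1 10 20
2 30
3 40 50
;
run;

data scores_flowover;
   infile datalines flowover;   /* default: SAS goes to the next record */
   input id score1 score2;
   datalines;
1 10 20
2 30
3 40 50
;
run;
```

With MISSOVER, SCORE2 should be missing for ID 2. With FLOWOVER, SAS should read the value for SCORE2 from the third record and write a note to the log that it went to a new line.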
This input statement tells SAS to read a character variable named 'name' on the first of the two records from input corresponding to an observation and #2 tells SAS to move the pointer to the second record to read the value for ID from columns 3 and 4:
This input statement tells SAS to read the values for NAME and AGE from the first input record before the pointer moves to the second record to read the value of ID from columns 3 and 4:
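The INPUT statements being described are not shown in these notes. A minimal sketch of what such statements might look like is given below, assuming two input records per observation; the data set names and data values are hypothetical.

```sas
/* #2 moves the pointer to the second record of the observation. */
data members_hash;
   input name $ #2 id 3-4;
   datalines;
Ann 34
  17
Bob 45
  22
;
run;

/* "/" advances the pointer to the next record after NAME and AGE are read. */
data members_slash;
   input name $ age / id 3-4;
   datalines;
Ann 34
  17
Bob 45
  22
;
run;
```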
There are two format modifiers, '?' and '??'
The '?' format modifier suppresses printing the invalid data note when SAS encounters invalid data values.
The '??' format modifier suppresses printing both the invalid data note and the input line when SAS encounters invalid data values.
It also prevents the automatic variable _ERROR_ from being set to 1 for the invalid observation.
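As an illustration, the sketch below reads a numeric variable from data that contains one non-numeric value. The data set name, variable names, and values are hypothetical.

```sas
data readings;
   /* ?? keeps the log quiet and leaves _ERROR_ at 0 for the bad value */
   input id value ??;
   datalines;
1 12.5
2 n/a
3 7
;
run;
```

VALUE should be missing for the second observation. Replacing ?? with a single ? would suppress only the invalid data note.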
If an INPUT statement contains more than one input style (list, column, formatted), it is called mixed-style input.
This input statement tells SAS to read the values of IDNO, STARTWGHT, and ENDWGHT with list input, to read the value of NAME with formatted input, and to read the value of TEAM with column input.
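The statement being described is not shown in these notes. One way to write a statement that matches the description is sketched below; the informat width, the column range, and the data lines are assumptions made for illustration.

```sas
data club;
   /* list input: idno, startwght, endwght         */
   /* formatted input: name (18 columns)           */
   /* column input: team (columns 25 through 30)   */
   input idno name $18. team $ 25-30 startwght endwght;
   datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
;
run;
```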
DSD (Delimiter-Sensitive Data) is an option in infile statement of SAS. It does three things for us.
1: it ignores delimiters in data values enclosed in quotation marks
2: it ignores quotation marks as part of your data
3: it treats two consecutive delimiters in a row as missing value
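The three behaviors of DSD can be seen in a small sketch like the one below. The data set name, variable names, and values are hypothetical; with DSD the default delimiter is a comma.

```sas
data sales;
   infile datalines dsd;
   input region :$10. product :$20. amount;
   datalines;
East,"Widget, large",100
West,,250
;
run;
```

In the first record the comma inside the quoted value is kept as data and the quotation marks are removed. In the second record the two consecutive commas should result in a missing value for PRODUCT.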
Input buffer is a temporary area of memory into which each record of data is read when the INPUT statement executes.
Program Data Vector is a temporary logical area of computer memory in which SAS builds a SAS data set, one observation at a time.
_N_ and _ERROR_ are the two automatic variables that are created automatically by the DATA step. These variables are added to the program data vector but are not output to the data set being created. The values of automatic variables are retained from one iteration of the DATA step to the next, rather than set to missing. Note that there are other automatic variables created by certain other DATA step statements and options, like 'in=', 'end=', 'first.variable', and 'last.variable'.
_ERROR_ is an automatic variable in SAS to keep track of an encountered error, such as an input data error, a conversion error, or a math error, as in division by 0. The default value of it is 0 and turns to 1 when there is an error.
_N_ is an automatic variable in SAS created in data step and it increments by 1 each time the DATA step loops past the DATA statement. The value of _N_ represents the number of times the DATA step has iterated.
With column and formatted input, the pointer reads the columns that are indicated in the INPUT statement and stops in the next column. With list input, however, the pointer scans data records to locate data values and reads a blank to indicate that a value has ended. After reading a value with list input, the pointer stops in the second column after the value.
Column pointer controls indicate the column in which an input value starts. Use line pointer controls within the INPUT statement to move to the next input record or to define the number of input records per observation.
When SAS encounters an invalid data value while executing the INPUT statement, it:
sets the value of the variable that is being read to missing or to the value that is specified with the INVALIDDATA= system option.
prints an invalid data note in the SAS log.
prints the input line and column number that contains the invalid value in the SAS log. Unprintable characters appear in hexadecimal. To help determine column numbers, SAS prints a rule line above the input line.
sets the automatic variable _ERROR_ to 1 for the current observation.
A SAS variable has six attributes: 1) length 2) name 3) type 4) label 5) format 6) informat. Attributes 4, 5, and 6 are optional.
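The smallest length that can be given to a numeric variable in SAS is 3 (on Windows and UNIX hosts). With this length, SAS can store integers exactly up to 8,192 (a value corresponding to 2 to the power of 13).
2 to the power of 53 (9,007,199,254,740,992) is the largest integer that can be stored exactly with the default numeric length of 8.
About 3 significant digits are retained with a length of 3, and about 15 significant digits can be stored without losing precision with a length of 8.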
An informat is an instruction that SAS uses to read data values into a variable. For example, the following value represents a numeric value but contains a dollar sign and commas: $1,000,000. An informat named COMMA11. is used to remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable.
A format is an instruction that SAS uses to write data values. We use formats to control the written appearance of data values. For example, the WORDS22. format, which converts numeric values to their equivalent in words, writes the numeric value 692 as six hundred ninety-two.
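The two examples above can be tried in a short DATA _NULL_ step. The variable names are hypothetical, and the INPUT function is used here only as a convenient way to apply an informat to a literal.

```sas
data _null_;
   /* informat: read the text $1,000,000 as the number 1000000 */
   amount = input('$1,000,000', comma11.);
   put amount=;

   /* format: write the number 692 in words */
   n = 692;
   put n= words22.;
run;
```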
SAS is an integrated system of software solutions which can be used in data management, statistical and mathematical analysis, report writing, forecasting and other activities.
The descriptor portion of a SAS data set consists of both dataset attributes and variable attributes.
Dataset attributes
Name, Number of variables, Number of observations, Creation and Last modified date, Sort information, names and attributes of all variables, Indexes, Engine and host level information, filename (with complete path)
Variable attributes
Name, type, length, label, format, informat, position, index type
Name, type, length, label, format, informat, position
Variables read from a data set
Variables mentioned in retain statement
Variables created using sum statement
Variables initialized in a temporary array
_N_ and _ERROR_ are also not reset to missing (_N_ increments by 1 and _ERROR_ resets to 0)
The YEARCUTOFF= system option is used to interpret dates with two-digit years.
Prior to SAS 9.4, the default YEARCUTOFF value was 1920; in version 9.4 it is 1926.
When yearcutoff is 1926, two-digit years of 26 through 99 are assigned a century prefix of "19" (i.e. 1926-1999), and two-digit years from 00 through 25 are assigned a century prefix of "20" (i.e. 2000-2025)
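A quick way to check how two-digit years are interpreted is sketched below. The variable names are hypothetical, and the option is set explicitly so that the result does not depend on the session default.

```sas
options yearcutoff=1926;

data _null_;
   d1 = input('01/01/25', mmddyy8.);   /* two-digit year 25 */
   d2 = input('01/01/26', mmddyy8.);   /* two-digit year 26 */
   put d1= date9. d2= date9.;
run;
```

Under YEARCUTOFF=1926, D1 should be written as 01JAN2025 and D2 as 01JAN1926.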
filename fitclub 'path to the folder';
infile fitclub(club1);
infile fitclub(club2);
When you mention a variable in both DROP and KEEP statements, DROP takes precedence
When operators of equal precedence appear, the operations are performed from left to right, except exponentiation, which is performed from right to left. For example, 1**2**2 is 1, 3**2**2 is 81, and 3**2**0 is 3 (2**0 is evaluated first).
1) Multiplication and division are of equal precedence.
2) Addition and subtraction are of equal precedence.
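The evaluation order can be checked with a small sketch such as the following; the variable names are hypothetical.

```sas
data _null_;
   right_to_left = 2**3**2;      /* evaluated as 2**(3**2) = 2**9 = 512 */
   forced_left   = (2**3)**2;    /* parentheses force 8**2 = 64         */
   left_to_right = 10 - 4 - 3;   /* evaluated as (10 - 4) - 3 = 3       */
   put right_to_left= forced_left= left_to_right=;
run;
```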
The colon modifier causes SAS to compare only as many characters as there are in the shorter value. For example, =:'Al' identifies all the values which start with 'Al'. We can also use the colon modifier with other comparison operators, such as <=:'L' and >:'M'.
SAS considers weekdays as 1 through 7 (from Sunday to Saturday). So, a weekday of 1 represents Sunday.
SAS stores date values as the number of days from 1Jan1960. The SAS date value for 1Jan1960 is 0, 2Jan1960 is 1, 3Jan1960 is 2, and 31Dec1959 is -1.
If two or more data sets explicitly define different formats, informats, or labels for the same variable, then the variable in the new dataset assumes the attribute from the first dataset in the set statement that explicitly defines the attribute.
The FORCE option (as in PROC APPEND) is used when the two datasets being combined do not have the same descriptor portion. FORCE makes SAS drop the variables that are not in the base dataset. Variables that are in the base dataset but not in the dataset being appended are set to missing for the appended observations.
Syntax:
misspelled keyword, missing or invalid punctuation, invalid statement or dataset options
Execution-time:
dataset not present, illegal math operations, observations out of order in by group processing
Data:
invalid values
Semantic:
wrong number of arguments for a function, usage of unassigned libref etc
SAS stores time values in numeric format as the number of seconds since midnight (12:00 am). A numeric value of 3600 represents 1:00 am, as there are 3600 seconds in an hour. Similarly, a value of 7201 represents 2:00:01 am.
SAS stores datetime values in numeric format as the number of seconds since midnight (12:00 am) on 1Jan1960.
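The underlying numeric values of date, time, and datetime constants can be inspected with a sketch like this one; the variable names are hypothetical.

```sas
data _null_;
   d  = '03JAN1960'd;             /* days since 1Jan1960             */
   t  = '02:00:01't;              /* seconds since midnight          */
   dt = '01JAN1960:01:00:00'dt;   /* seconds since midnight 1Jan1960 */
   put d= t= dt=;
run;
```

The log should show d=2, t=7201, and dt=3600.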
_N_, an automatic variable created in data step processing which keeps track of the number of times the data step has iterated
_ERROR_ is an automatic variable in a data step that keeps track of data errors in the current observation being processed. The default value is 0; when there is at least one data error, the variable's value gets set to 1
_INFILE_, is an automatic variable that gets created when reading raw data from an external file and is used to store the content of input buffer
END=, this option can be used on the SET statement to create a temporary variable to identify the processing of the last observation in a dataset. The value of the temporary variable will be 1 when the last observation is processed and 0 when all other observations are processed
NOBS=, this option can be used to create a temporary variable to hold the value of the total number of observations in the input dataset being read
IN=, this dataset option can be used on either the MERGE or SET statement to create a temporary variable to identify whether the particular dataset has contributed to the observation being built in the PDV. The temporary variable's value will be 1 if the dataset contributed to the observation; otherwise its value will be 0
FIRST.VAR, LAST.VAR, these are the temporary variables which get created when a dataset is processed using by group processing, the value in FIRST.VAR will be 1 if that is the first record in that by group and LAST.VAR will be 1 if that is last observation in that by group
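A minimal sketch of BY-group processing with FIRST. and LAST. variables is shown below, counting the observations in each BY group with a sum statement. It uses sashelp.class so that it can be run as is; the output data set and the counter variable are hypothetical names.

```sas
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

data sex_counts;
   set class_sorted;
   by sex;
   if first.sex then n_students = 0;   /* reset at the start of each BY group */
   n_students + 1;                     /* sum statement: value is retained    */
   if last.sex then output;            /* one row per BY group                */
   keep sex n_students;
run;
```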
ods listing gpath="C:\ODSgraphs";
ods html gpath="C:\ODSgraphs\images";
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=grstatug&docsetTarget=p0cmojsdkiab8cn1k2qk3rp3a8s0.htm&locale=en
List input: Each field in the raw data is separated by at least a single space and contains no embedded space
Column input: Data is present in fixed columns
Formatted input: Enables the user to supply some special instructions in the input statement for reading data, like reading numeric data with special symbols (decimals, comma separated values, dates). The instructions are called informats.
1) SUM statement can be used in the creation of xxSEQ variable while programming SDTM datasets
2) SUM statement can be used to count the number of subjects in each treatment group using a data step.
1) While working on tables programming, there can be cases where a particular level does not exist in one or more treatments. In such cases, we may want to set those treatment level column values to '0' or '0 (0.0)'. As we have to perform the same operation (of replacing the blank value with '0') over multiple variables, we can make use of array to achieve the result.
data final;
set trans;
array trts[*] trt1 trt2 trt3 trt4 trt99;
do i=1 to dim(trts);
if missing(trts[i]) then trts[i]="0";
end;
run;


2) Sometimes, when performing validation we may be interested in comparing the values after removing all the spaces from the values. We can make use of arrays and compress function to remove spaces from all character variables and perform the comparison.

data final; set trans;
array vars[*] _character_;
do i=1 to dim(vars);
vars[i]=compress(vars[i]);
end;
drop i;
run;
Differences in do while and do until can be explained with below points:
1) Placement where condition provided is evaluated: do while loop evaluates condition at the top (beginning of the loop) where do until evaluates the condition at the bottom (end of the loop).
2) Number of executions: Irrespective of whether the condition provided resolves to true or false, do until executes at least once whereas do while may not execute at all
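Both points can be seen in a sketch where the loop condition is chosen so that DO WHILE is false and DO UNTIL is true from the start. The variable names are hypothetical.

```sas
data _null_;
   i = 10;
   do while (i < 5);      /* checked at the top: the body never runs */
      put 'DO WHILE body ran: ' i=;
      i = i + 1;
   end;

   j = 10;
   do until (j >= 5);     /* checked at the bottom: the body runs once */
      put 'DO UNTIL body ran: ' j=;
      j = j + 1;
   end;
run;
```

Only the DO UNTIL message should appear in the log, and it should appear once.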
Program Data Vector is a logical area in memory where SAS builds a data set, one observation at a time
_N_ is an automatic variable in a data step, which counts the number of times the data step begins to iterate
_ERROR_ is an automatic variable in a data step, which signals the occurrence of an error caused by the data
Input statement is used to specify how SAS has to read the values from the input buffer, and assigns the values to variables in the program data vector (PDV)
SET: used for reading observations from an existing SAS dataset.
MERGE: used for joining two or more datasets horizontally
MODIFY: Works only on existing datasets (cannot create a new dataset), data need not be sorted, can be used to replace, delete, or append observations in an existing dataset.
UPDATE: used to replace the values of variables in one dataset with values from another dataset(master and transaction datasets), updatemode=missingcheck can be used to prevent overwriting with missing values, requires the data to be sorted
IF (THEN ELSE): used for conditional processing of data
SELECT (WHEN): used for conditional processing of data
ARRAY: used for grouping together a set of variables of same data type to perform common processing on all the listed variables.
WHERE: used for subsetting required records from input dataset
List input modifiers:

Colon modifier: used to read character data with length greater than 8 characters or to read numeric data that contains special characters; it allows the use of informats. The colon has to be placed between the variable name and the informat (input item : $12.)
Ampersand modifier: used to read character data which contains embedded single spaces. The ampersand has to be placed between the variable name and the informat (input name & $18.). It reads the data until a delimiter is encountered (two or more consecutive spaces when the delimiter is a space)
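The two list input modifiers can be combined in one INPUT statement, as in the sketch below. The data set name, variable names, and values are hypothetical; note the two spaces that end each name.

```sas
data staff;
   /* & lets NAME contain single embedded blanks            */
   /* : applies the COMMA informat to a list input value    */
   input name & $18. salary : comma10.;
   datalines;
Amelia Serrano  52,000
David Shaw  1,250,000
;
run;
```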
Line hold specifiers:

@: Holds a record in the input buffer so that you can read from it again (used when multiple INPUT statements are required)
@@: Used for reading multiple observations from a single record
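The two line hold specifiers can be sketched as follows. The data set names, the record type codes, and the data values are hypothetical.

```sas
/* @@ : several observations are read from one record */
data pairs;
   input x y @@;
   datalines;
1 2 3 4 5 6
;
run;

/* @ : hold the record, test a value, then read the rest of it */
data people;
   input rectype $ @;
   if rectype = 'P';        /* keep only records of type P */
   input name $ age;
   datalines;
P Ann 34
X skip this
P Bob 45
;
run;
```

PAIRS should contain three observations and PEOPLE should contain two.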
Line pointer controls:

/: moves the pointer to the next line in the input buffer
#n: moves the pointer to nth line in the input buffer(Only this style of input enables creation of multiline input buffer)
Column pointer controls:

@n: absolute pointer control
+n: relative pointer control
Each field in raw data is separated by a space or other delimiter
Each field has to be read in the same order of appearance in raw data
Missing values are to be represented with a placeholder (.)
Character values cannot contain embedded blanks
The default length of character variable is 8 characters and leads in truncation of data
Non-standard numeric data cannot be read- that is, informats cannot be used
Embedded spaces can be present in values for a field
Skipping of some data fields is possible, and reading order need not be in the same order as in raw data
Length of a character variable is determined by the number of columns you mention in the input statement
A placeholder for missing values is not required
Used to read non-standard numeric data, and also character data with length more than 8 characters
First option reads the data (character) present in 25th column, while the second option reads 25 columns from the existing pointer location
This statement reads the data from column 1 to column 16 and the pointer rests at column 17
The length of the newly created variable will be the sum of the lengths of individual variable plus any specific text strings used in concatenation
Initializes the variable with 0
Retains the value across observations
Ignores missing values in the expression
Retaining a variable which already exists in the input dataset does not work. That is, variables that are read with a set, merge or update statement are retained automatically; naming them in a retain statement has no effect.
Appending and interleaving are used to combine SAS datasets vertically.
Appending copies all observations of first dataset and then copies all observations of second dataset and writes to a new dataset.
Interleaving intersperses the observations from two or more input data sets based on the value of one or more common variables, in a new data set
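The difference between the two methods can be sketched with two small data sets that are already sorted by the common variable. All data set names and values are hypothetical.

```sas
data first;
   input id @@;
   datalines;
1 3 5
;
run;

data second;
   input id @@;
   datalines;
2 4 6
;
run;

/* Appending (concatenating): all rows of FIRST, then all rows of SECOND */
data appended;
   set first second;
run;

/* Interleaving: rows are taken in order of the BY variable */
data interleaved;
   set first second;
   by id;
run;
```

APPENDED should hold the ID values in the order 1 3 5 2 4 6, and INTERLEAVED in the order 1 2 3 4 5 6.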
An array is a temporary grouping of SAS variables that are identified by an array-name.
NOBS=, to create a temporary variable to hold the number of observations in the dataset
END=, to create a temporary variable to identify the last observation being processed: this temporary variable's value will be 1 when the last record is processed and 0 when rest of the records are processed
INDSNAME=, to create a temporary variable to hold the name of the input dataset in the form of libname.memname. That is, if we are reading observations from the sashelp.class dataset, the temporary variable specified in the INDSNAME= option will hold a value of "SASHELP.CLASS" for all records coming from the sashelp.class dataset.
When an arithmetic operation is performed on a missing value, SAS sets the result of that operation to missing and prints notes in the log to notify us which arithmetic expressions have missing values and when they were created

Examples
While deriving study days, subtracting reference start date from event/collection date when at least one of them has a missing value.
While creating total score by adding individual scores using '+' operator when at least one of the component scores is missing.
SAS prints a note to the log about a variable being uninitialized in two cases. 1) When a new variable is defined (using length, attrib, format, informat or retain statements) but not assigned any value in the data step 2) when a variable is referenced that is neither present in the input dataset nor created in the data step prior to referencing it.

Examples
A new variable has been defined in the length statement but not assigned any value to it in subsequent statements of the data step.
When a character variable is used in an arithmetic expression SAS automatically converts character values to numeric values and prints a note in the log. If the character variable contains nonnumeric information, then a missing value is assigned and sets the _ERROR_ automatic variable to 1 to indicate data error.

Examples
Subtracting,dividing,multiplying or adding a numeric value from(to) a character variable
Usage of any numeric function like MAX,MIN on a character variable.
When a numeric variable is used in an expression which expects character values, SAS automatically converts the numeric values to character values (using the BEST12. format) and prints a note in the log.

Examples
Concatenation of a numeric variable to a character variable or string using the concatenation operator (pipe symbols) results in automatic conversion of numeric values to character values (using the BEST12. format).
Usage of any character functions like STRIP, LENGTH on numeric variable will also result in the conversion note.
By default, SAS merge statement handles well only one to one match merging and one to many match merging. That is, there cannot be multiple records for a by group combination in more than one dataset when merging. If there exists more than one dataset with multiple records for a by group SAS prints a note in the log indicating that more than one dataset contains repeats of BY values.

Examples
Merging a subject level dataset in which each record is duplicated to account for total column counts to another dataset like adverse events using subject identifier alone as by variable.
An 'Invalid data' note appears in the log file when reading raw data of character type into a numeric variable.
data test;
  input a b;
cards;
john  1
megan 2
;
run;
An 'Invalid numeric data' message appears in the log file when non-numeric values are used in a numeric arithmetic expression.
data class;
   set sashelp.class;
   x=name*10;
run;
When a format is used in a format statement or put function and the provided overall width of the specified format is not sufficient to display/store the number, SAS presents the number in a suitable notation to accommodate maximum precision and prints a note in the log stating at least one w.d format was too small.

Examples
Applying a format of 3. while converting the count to character format using the put function when the count is greater than 999
When by group processing is invoked in a data step or proc step, SAS expects the input data to be presorted based on the variables used in the step. If the input data is not sorted based on the by variables specified then SAS returns an error in the log window stating 'by variables are not properly sorted'
When sorting a dataset if a variable is specified more than once in the by statement, SAS prints a note in the log indicating that one or more variables have been specified more than once in the by statement.

Examples
A BY statement is required in proc sort. When the BY statement is not used or when BY variables are not specified, SAS prints an error in the log

Examples
When transposing a dataset with an ID statement, the values present in the ID variable(s) become the names of the newly created columns. As the name of the newly created variable cannot be blank, SAS ignores (does not transpose) the observations with missing ID values and prints a warning indicating how many such observations were excluded.

Examples
When transposing a dataset and an ID statement is used, if more than one record exists with same value in the variable(s) specified in ID statement, SAS prints an error in the SAS log. When transposing, SAS uses the values present in the ID variable as names of the newly created columns, as there cannot be more than one variable in a dataset with same name SAS issues this error.

Examples
Same as above, but when by statement is not used.

Examples
When 'by group processing' is required (or used), SAS expects the input datasets to be sorted with the same by variables as required in by group processing. If the observations are not sorted, SAS prints an ERROR to the log indicating that the observations are not sorted in the input dataset.

Examples
When appending two or more dataset using SET statement, SAS fetches the length attribute for a variable from the first instance it encounters the variable in any of the datasets from left to right order. If the length of the same named variable in any of the datasets is greater after the first instance then SAS prints a warning to the SAS log stating multiple lengths were specified for that variable.

Examples
When interleaving or merging two or more dataset using SET/MERGE statement along with BY statement, SAS fetches the length attribute for a variable from the first instance it encounters the variable in any of the datasets from left to right order. If the length of the same named variable in any of the datasets is greater after the first instance then SAS prints a warning to the SAS log stating multiple lengths were specified for that variable.

Examples
When merging two or more datasets, if any of the datasets have common variables other than those specified as the by variables, the values of those variables from the right most dataset overwrites the values from previous datasets. The message in the LOG informing the user when variables are being overwritten is only printed when OPTION MSGLEVEL=I is used.

Examples
SAS cannot perform the mathematical operation of dividing a number by 0. So, when it encounters a division by 0, it generates a note saying 'Division by zero detected at line XXX column XX'
When a variable that is neither present in the input dataset nor created in the step is specified on the DROP, KEEP or RENAME option or statement, SAS prints a warning to the SAS log indicating that the variable has never been referenced in that step.

Examples
Renaming a variable using rename= option on a dataset specified in set statement without keeping the variable in the list of variables to be read using keep= option.
Trim function returns a single blank value when it is applied on a variable with a null value on that observation.
ASCII: blank, digits, uppercase letters, underscore, lowercase letters
EBCDIC: blank, lowercase letters, uppercase letters, digits
Suppresses printing the display of observation numbers in the output
Var statement: specify the list of variables
Noobs: suppress obs column
Obs='label for obs column'
Id statement: Key variables used for identifying the observation
Sum statement: provides a list of variables for which sum is to displayed at the end.
n=: reports the number of contributing observations in a by group
Sumby:
Pageby:
Label option: to print labels instead of variable names (default is variable names)
Double option: Used to double the space between two observations
Uniform option: columns of data line up from one page to next, uses widest data value for column width
split=: used to specify the split character that controls where column headings (labels) break onto a new line
The default statistics produced in output window are: Frequency, percent, cumulative frequency, cumulative percentage.
The default statistics produced in output dataset are count and percentage only.
The default statistics produced in output window are: Frequency, percent, Row Percent and Column Percent.
The default statistics produced in output dataset are count and percentage only.
Proc freq statement
data=: used to specify the name of the input dataset to be used by the procedure
order=: to specify the order of appearance of rows in frequency table.
order=data sorts the rows of the frequency table in the same order as they appear in the dataset.
order=freq sorts the rows of the frequency table from most frequent to least frequent.
noprint: suppresses the printing of frequency table in output window
tables statement
out=: to specify the name of the output dataset for the frequency table
missing: to instruct the procedure to consider missing values as a level in the frequency table
The 5 default descriptive statistics that are produced by proc means are:
n, mean, standard deviation, minimum and maximum
NWAY option is used for specifying that the output dataset should contain statistics only for highest level of interaction of the variables specified in class statement.
COMPLETETYPES is used for requesting SAS to create all possible combinations of class variable values even if they do not exist in data.
AUTONAME option on output statement of proc means is used for requesting SAS to create a unique variable name for an output statistic when an explicit name is not assigned by the user.
PRELOADFMT in combination with COMPLETETYPES and FORMAT statement (for variables specified in class statement) is used for requesting statistics for all levels present in the format even if those levels are absent in data.
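How these options fit together can be sketched as below, using sashelp.class. The format name, the extra 'U' level, and the output data set name are hypothetical; the 'U' level does not occur in the data and is included only to show a level being preloaded.

```sas
proc format;
   value $sexfmt 'F' = 'Female'
                 'M' = 'Male'
                 'U' = 'Unknown';   /* level that is absent from the data */
run;

proc means data=sashelp.class nway completetypes noprint;
   class sex / preloadfmt;          /* take the class levels from the format */
   format sex $sexfmt.;
   var height;
   output out=height_stats n= mean= / autoname;
run;
```

The output data set should contain one row per format level, including a row for the 'U' level with a count of 0, and AUTONAME should name the statistics Height_N and Height_Mean.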
We can fetch the statistics for multiple variables at a time by specifying the list of required variables on var statement
/* PRINT is needed because PROC SUMMARY does not display output by default */
proc summary data=sashelp.class print;
   var age height weight;
run;
Proc report statement
data=: used to specify the input data
nowindows: used to request SAS to use nonwindowing environment
Missing: used to request SAS to consider missing values as valid levels for variables defined as group or order (without missing option SAS ignores the rows with missing values and does not print in the report)
LS=: used to specify the length of a line of the report
PS=: used to specify the number of lines in a page of the report
SPACING=: used to specify the number of blank characters between two columns of the report (default value is 2 spaces)
headline: used to request SAS to present a solid line between column headers and data portion in the report
headskip: used to request SAS to present a blank line between column headers and data portion in the report
SPLIT=: used to specify the split character (SAS introduces a line break whenever it encounters the specified split character in the variable values or column headers)
CENTER: used to request SAS to center the report on the page (the NOCENTER option left-justifies the report)
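A minimal sketch that puts several of these PROC REPORT options together is shown below. The column labels are illustrative assumptions; note that HEADLINE and HEADSKIP only affect the traditional monospace listing output.

```sas
/* List students ordered by sex, with split headers and wider column spacing */
proc report data=sashelp.class nowindows missing headline headskip
            split='*' spacing=3 center;
    column sex name age height;
    define sex    / order   'Sex';
    define name   / display 'Student*Name';   /* '*' breaks the header line */
    define age    / display 'Age*(years)';
    define height / display 'Height' format=5.1;
run;
```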

Column statement
While specifying ranges when defining a user-defined format, the keyword 'OTHER' is used to represent all the values other than those explicitly defined ranges (including missing values).
While specifying ranges when defining a user-defined format, the keyword 'LOW' is used to represent the lowest non-missing value.
While specifying ranges when defining a user-defined format, the keyword 'HIGH' is used to represent the largest non-missing value.
While defining a user-defined format, ranges are specified in the format of {lower}-{higher}. '<' sign can be used to instruct SAS to exclude the specified {lower} value or {higher} value from the range.
1) {lower}-{higher} : Range includes both endpoints.
2) {lower}<-{higher}: Range excludes lower endpoint but includes higher endpoint
3) {lower}-<{higher}: Range excludes higher endpoint but includes lower endpoint
4) {lower}<-<{higher}: Range excludes both lower and higher endpoints
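The range notation and the LOW, HIGH and OTHER keywords can be tried out with a small sketch like the one below. The format name agegrp and its labels are assumptions chosen for illustration.

```sas
proc format;
    value agegrp
        low -< 12   = 'Under 12'   /* lowest non-missing value up to, but excluding, 12 */
        12 - 14     = '12 to 14'   /* both endpoints included                           */
        14 <- high  = 'Over 14'    /* excludes 14, runs to the highest value            */
        other       = 'Missing';   /* everything else, here only missing values         */
run;

/* Write the formatted value of a few test ages to the log */
data _null_;
    length grp $8;
    do age = ., 11, 12, 14, 15;
        grp = put(age, agegrp.);
        put age= grp=;
    end;
run;
```

With these ranges, 11 falls under 'Under 12', both 12 and 14 under '12 to 14', 15 under 'Over 14', and the missing value under 'Missing'.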
By default, formats get stored in a catalog named 'formats' of work library.
By default, formats get stored in work.formats catalog. We can make use of LIBRARY= option on proc format statement to specify a different library (or catalog) to store the formats.
proc format library=lib1;
run;

proc format library=lib1.myfmts;
run;
Formats are stored in catalogs. The default catalog name for formats is FORMATS (stored on disk as the file 'formats.sas7bcat'). By default, SAS searches two catalogs: work.formats and library.formats.
The sas system option FMTSEARCH can be used to search for formats stored in other catalogs.
The below option tells SAS to search the catalogs lib1.formats, lib2.myfmts, ghi.formats in addition to the default work.formats and library.formats
options fmtsearch=(lib1 lib2.myfmts ghi);
Formats (and informats) can be created from an existing dataset using the CNTLIN= option of the proc format statement.
proc format cntlin=myformatsdata;
run;
While creating a format, at a minimum, we specify whether we are creating a format or an informat, a name for the format/informat, the start of the range, and the label associated with the range. We need a variable for each of these pieces of information.
So, the required variables are TYPE (format or informat, numeric or character), FMTNAME (name of the format or informat), START (start of the range), and LABEL (display value associated with the range specified in the START variable).
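A control dataset with these four variables might look like the sketch below. The dataset name work.sexfmt_ctl and the format name SEXF are assumptions for illustration only.

```sas
/* Build a control dataset: one row per range of the format */
data work.sexfmt_ctl;
    length fmtname $32 type $1 start $8 label $20;
    fmtname = 'SEXF';   /* name of the format */
    type    = 'C';      /* C = character format */
    start = 'F'; label = 'Female'; output;
    start = 'M'; label = 'Male';   output;
run;

/* Create the format $SEXF from the control dataset */
proc format cntlin=work.sexfmt_ctl;
run;

/* Apply the new format */
proc print data=sashelp.class(obs=5);
    format sex $sexf.;
run;
```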
We can export formats to a sas dataset using CNTLOUT= option on the proc format statement.
proc format cntlout=formatsdata;
run;
When a user-defined format is assigned to a variable and SAS cannot find the format, by default SAS issues an error and fails to read the dataset. The NOFMTERR system option instructs SAS to substitute the default $w. or w. format so that we can continue to access the dataset.
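The behaviour can be reproduced with a sketch along these lines, which deliberately deletes a format after attaching it to a variable. The names yn and work.answers are hypothetical.

```sas
/* Create a format and attach it permanently to a variable */
proc format;
    value yn 1 = 'Yes' 0 = 'No';
run;

data work.answers;
    answer = 1;
    format answer yn.;
run;

/* Simulate a missing format by deleting it from work.formats */
proc catalog catalog=work.formats;
    delete yn.format;
run;
quit;

/* With NOFMTERR the dataset can still be read; values print unformatted */
options nofmterr;
proc print data=work.answers;
run;

/* Restore the default behaviour */
options fmterr;
```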
The name of a numeric format can be up to 32 characters long.
The name of a character format can be up to 31 characters long (as SAS stores a character format name with a leading dollar sign).
When storing formats and informats, SAS differentiates an informat from a format by adding a leading '@' sign. So the name of a numeric informat can be up to 31 characters long, and the name of a character informat can be up to 30 characters long (as SAS also prefixes a dollar sign to the name of a character informat).
Using proc format, we create both formats and informats of either numeric or character type. Based on these combinations, there are 4 possible values the 'TYPE' variable can take.
N - for numeric format
C - for character format
I - for numeric informat
J - for character informat
Proc format returns an error when a range is repeated or when the values overlap.
A multilabel format is a format that enables us to assign multiple labels to a value or a range of values.
A multilabel format is created by specifying the 'MULTILABEL' option in the VALUE statement of proc format.
proc format;
    value agef (multilabel)
        11='11'
        12='12'
        13='13'
        11-13='11-13';
run;
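A multilabel format only shows its extra labels in procedures that support it, for example through the MLF option on the CLASS statement of proc means. The sketch below assumes a hypothetical format agemlf with overlapping ranges.

```sas
/* Overlapping ranges are allowed because of the MULTILABEL option */
proc format;
    value agemlf (multilabel)
        11 - 13 = '11 to 13'
        14 - 16 = '14 to 16'
        11 - 16 = 'All ages';
run;

/* MLF tells proc means to use every label a value belongs to, */
/* so each student is counted in a subgroup and in 'All ages'  */
proc means data=sashelp.class n mean maxdec=1;
    class age / mlf;
    format age agemlf.;
    var height;
run;
```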
By default the length of the format and the variable created using the format (using put or input function) will be equal to the length of the longest value specified in the range. Instead of relying on the longest value, DEFAULT= option can be used to specify the length of the format or informat.
We can view (print) the formats using the fmtlib option on proc format statement.
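Both DEFAULT= and FMTLIB can be seen together in a sketch like this one. The format name yn and the width of 10 are arbitrary assumptions.

```sas
/* DEFAULT=10 sets the default width instead of the longest label (3) */
proc format;
    value yn (default=10)
        1 = 'Yes'
        0 = 'No';
run;

/* FMTLIB prints the contents of the catalog; SELECT limits it to one format */
proc format library=work fmtlib;
    select yn;
run;
```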
We can use contents statement of proc datasets to fetch the descriptor portion of a dataset.

proc datasets nolist library=sashelp;
    contents data=class;
    run;
quit;
The NOLIST option prevents the procedure from listing all the other datasets that are in the same library.
proc datasets lib=work nolist;
    modify dm;
    attrib _all_ label='';
    attrib _all_ format=;
    attrib _all_ informat=;
run;
quit;
data class;
    set sashelp.class;
run;

proc contents data=class out=cont01 noprint;
run;

data cont02;
    set cont01(keep=name) end=last;
    where name ne "Name";
    length renamelist $32767;
    retain renamelist;
    name2=cats("old_",name);
    name3=cats(name,"=",name2);
    renamelist=catx(" ",renamelist,name3);
    if last;
    call symputx("renamevars",renamelist);
run;

data class_renamed;
    set class(rename=(&renamevars.));
run;
The SET statement has an option named END=. It creates a temporary variable whose value is 1 when the data step processes the last record.

data lastobs;
    set sashelp.class end=last;
    if last=1 then output;
run;
The SET statement has an option named NOBS=. It creates a temporary variable that holds the number of observations in the input dataset. We can make use of this temporary variable and _N_ to subset the second-to-last record.

data seclastobs;
    set sashelp.class nobs=num;
    if _N_=num-1 then output;
run;
The SCAN function can be used to extract words from a string. By default, SCAN searches for words from left to right. When we give a negative number, SCAN searches for words from right to left. So, if we give -1 as the second argument to SCAN, we get the last word.
lastword=scan(sentence,-1);
The SUBSTR function can be used to extract substrings from a string. SUBSTR can only read characters from left to right and does not accept negative values (to read from right to left) as SCAN does. We can use the LENGTH function to get the length of the string, subtract from that length, and use the result as the second argument of SUBSTR.
lasttwochars=substr(sentence,length(sentence)-1);
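The two expressions can be checked in a small data step such as the sketch below; the sample sentence is an arbitrary assumption.

```sas
data _null_;
    sentence = 'The quick brown fox';
    /* -1 counts words from the right, so this picks the last word */
    lastword = scan(sentence, -1);
    /* Start one character before the end to get the last two characters */
    lasttwochars = substr(sentence, length(sentence) - 1);
    put lastword= lasttwochars=;
run;
```

For this sentence the log shows lastword=fox and lasttwochars=ox.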
SAS by default reads an input dataset from the first observation. FIRSTOBS= can be used to specify the observation number from which we want SAS to start reading the observations.
SAS by default reads an input dataset till the last observation. OBS= can be used to specify the observation number at which we want SAS to stop reading the observations.
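Used together as dataset options, FIRSTOBS= and OBS= select a block of observations. In the sketch below the output dataset name work.middle is an assumption; the step reads observations 5 through 10, that is, six observations.

```sas
data work.middle;
    /* OBS= is the number of the last observation to read, not a count */
    set sashelp.class(firstobs=5 obs=10);
run;

proc print data=work.middle;
run;
```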
We can use into clause of proc sql to create a macro variable.
proc sql;
    select count(*) into :numobs
    from sashelp.class;
quit;

proc sql;
    select min(age) into :minage
    from sashelp.class;
quit;
We can specify a list of expressions separated by commas, followed by a list of macro variables separated by commas, to create multiple macro variables simultaneously.
proc sql;
    select min(age), max(age) into :minage, :maxage
    from sashelp.class;
quit;
A series of macro variables can be created by specifying the named range of macro variables
proc sql;
    select distinct age into :age1-:age6
    from sashelp.class;
quit;
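When the number of values is not known in advance, one possible variation (available from SAS 9.3 onwards) is to leave the upper end of the range open and read the count from the automatic macro variable SQLOBS. The macro variable name nages is an assumption for illustration.

```sas
proc sql noprint;
    /* Open-ended range: SAS creates as many macro variables as there are rows */
    select distinct age into :age1-
    from sashelp.class;
quit;

/* SQLOBS holds the number of rows returned by the last query */
%let nages = &sqlobs;
%put Created &nages macro variables, the first one holds &age1;
```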


The example below creates a series of 6 macro variables to store the number of records in each of the first 6 age groups.
proc sql;
    select count(*) into :age1-:age6
    from sashelp.class
    group by age;
quit;
We can use 'separated by' option of into clause to create a macro variable to hold a list of values separated by a delimiter.
proc sql;
    select distinct name into :names separated by " "
    from sashelp.class;
quit;
When creating macro variables using the into clause of proc sql, the values can carry leading or trailing blanks (for example, numeric values are right-aligned and so get leading blanks). We can use the TRIMMED option to remove those blanks from the macro variable values.
proc sql;
    select avg(height) into :avg trimmed from sashelp.class;
quit;
When we want multiple values of a single variable to be stored in a single macro variable, we can use the 'separated by' option along with the into clause.
proc sql;
    select name into :names separated by ' ' from sashelp.class;
quit;
When a single macro variable with an explicit name is needed, we specify the name of the macro variable in quotes as the first argument of call symputx. As an explicit value has to be assigned to the macro variable, we specify the value in quotes as the second argument.
data _null_;
   call symputx("site","mycsg.in");
run;
When a single macro variable with an explicit name is needed, we specify the name of the macro variable in quotes as the first argument of call symputx. As the value of a data step variable has to be assigned to the macro variable, we specify the name of that variable (without quotes) as the second argument.
data _null_;
   set subject_totals;
   call symputx("site",total);
run;
When a series of macro variables is needed, with names coming from a data step variable, we specify the name of that data step variable (without quotes) as the first argument of call symputx. As the value of a data step variable has to be assigned to each macro variable, we specify the name of that variable (without quotes) as the second argument.
data _null_;
   set sashelp.class;
   call symputx(name,age);
run;
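A common variation, sketched below, builds a numbered series of macro variables by concatenating a prefix with _N_. The prefix student and the counter nstudents are hypothetical names.

```sas
data _null_;
    set sashelp.class end=last;
    /* Creates student1, student2, ... holding one name each */
    call symputx(cats('student', _n_), name);
    /* Store how many macro variables were created */
    if last then call symputx('nstudents', _n_);
run;

/* &&student&nstudents resolves in two passes to the last macro variable */
%put First: &student1 Last: &&student&nstudents Count: &nstudents;
```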
In order to hold a series of values from a data step variable into a single macro variable, we need to employ techniques to concatenate all values into a single variable and create the macro variable conditionally on the last record
data _null_;
   set sashelp.class end=last;
   length names $1000;
   retain names;
   names=catx(" ",names,name);
   if last then call symputx("names",names);
run;
We can use the NOBS= option on the SET statement to create a temporary variable that stores the number of observations in the input dataset. This variable can then be used in call symputx to create the macro variable.
data _null_;
   set sashelp.class nobs=numobs;
   call symputx("classobs",numobs);
run;
When creating macro variables using %let statement, it ignores all the leading and trailing spaces.
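This trimming can be seen in the log with a sketch like the one below. The macro variable names are assumptions; %STR is shown as one way to keep the blanks when they are actually wanted.

```sas
/* Leading and trailing blanks around the value are dropped */
%let plain =    hello   ;
%put [&plain];

/* %STR protects the blanks, so they become part of the value */
%let padded = %str(  hello  );
%put [&padded];
```

The first %put writes [hello] to the log, while the second keeps the two blanks on each side of hello inside the brackets.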