mycsg QnA

What are the rules for SAS statements?

All SAS statements (except those containing data) must end with a semicolon (;).
SAS statements typically begin with a SAS keyword

SAS programs can be freely formatted:
Any number of SAS statements can appear on a single line provided they are separated by a semicolon.
A SAS statement can be continued from one line to the next as long as no word is split.
SAS statements can begin in any column.
SAS statements are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two.
The words in SAS statements are separated by blanks or special characters (e.g. =, +, or *).

What are the rules for SAS names?

All names must contain between 1 and 32 characters.
The first character appearing in a name must be a letter (A, B, ...Z, a, b, ... z) or an underscore (_).
Subsequent characters must be letters, numbers, or underscores. That is, no other characters, such as $, %, or & are permitted.
Blanks also cannot appear in SAS names. SAS names are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (SAS is only case sensitive within quotation marks.)

What do you understand by the term 'SAS dataset'?

A SAS data set is a SAS file that contains data values that are organized as a table of observations (rows) and variables (columns) that can be processed by SAS software. A SAS data set also contains descriptor information such as the data types and lengths of the variables

What do you mean by data?

Data is unprocessed facts and figures without any added interpretation or analysis. For example: "The price of crude oil is $80 per barrel."

What are the available windows in SAS display manager?

Editor window
Log window
Output window
Explorer window
Results window.

What is enhanced editor?

Enhanced editor window allows us to perform standard editing like entering, editing and submitting programs

What is the purpose of log window?

The Log Window displays messages about our SAS session and any programs that we submit
SAS uses the following color-coded system to assist you in reading the log:
The DATA and PROC steps that appear in your program are printed in black.
Notes that SAS wants to report to you are printed in blue
Warnings that SAS wants to draw to your attention are printed in green
Errors that cause SAS to abort running your program are printed in red

What is the purpose of output window?

Output window enables us to view the LISTING output from our SAS programs.

What is the purpose of explorer window?

The Explorer window allows us to easily view and manage our SAS files, which are stored in SAS data libraries.

What is the purpose of results window?

The Results window serves as an index in a tree structure to the various types of output generated by the submitted SAS programs in the session.

What do you mean by a temporary dataset?

A SAS dataset that exists only during the current SAS session is called a temporary dataset. All datasets stored in work library are temporary datasets.

What do you mean by a permanent dataset?

A SAS dataset that is stored in a location on the computer and still exists after the SAS session ends is called a permanent dataset. Generally, any dataset that is stored in a library other than the work library is considered a permanent dataset.

What do you mean by standard numeric data?

A data value that contains only numbers, decimal point or minus sign is called standard numeric data.

What do you mean by non-standard numeric data?

A data value that represents a number but contains additional symbols such as commas, dollar signs, or blanks is called non-standard numeric data. Date, time, and datetime values are also considered non-standard numeric data in SAS.

Give me some examples of non-standard numeric data

A number with an embedded comma: 2,341
A number with an embedded dollar sign: $100
A date value: 1Mar1990
A time value: 10:23

What do you mean by 'comment' in a SAS program?

A comment is any text that is used to document the purpose of the program, explain unusual segments of the program, or describe steps in a complex program or calculation. SAS ignores text in comment statements during processing.

What are the available commenting styles in SAS?

There are two styles:
1) /* comment */
2) * comment;
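
A minimal sketch showing both styles in one program. The dataset name demo and its variables are made up for illustration.

```sas
/* Block style: can span several lines and can sit on the same line as code */
data demo;            /* hypothetical dataset name */
  x = 1;
  * Statement style: begins with an asterisk and ends at the next semicolon;
  y = x + 1;
run;
```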

What is input statement?

The INPUT statement describes the arrangement of values in the input data record and assigns the input values to the corresponding SAS variables.

What is infile statement?

The INFILE statement specifies an external file to read with an INPUT statement.

What is list input and when is it used?

List input is one of the methods for reading raw data into SAS. In this method, SAS scans the input data record for input values and assigns them to the corresponding SAS variables. We use this method when the data values are separated by at least one delimiter (that is, the data is not arranged in specific columns).

What are the limitations of list input?

Fields must be separated by at least one blank (or other delimiter).
Fields must be read in order from left to right.
You cannot skip or re-read fields.
Missing values must be represented by a place holder such as a period. (A blank field causes the matching of variable names and values to get out of sync.)
Character values can't contain embedded blanks.
The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set. (NOTE: 1 byte = 1 character).
Data must be in standard character or numeric format.

What is column input and when is it used?

Column input is used for reading data values that are entered in fixed columns.

What is missover option and when is it used?

By default, when reading raw data using list input, if there are fewer fields on an input data record than the number of variables specified on the INPUT statement, SAS tries to fetch the next record to read values for the remaining variables. The MISSOVER option on the INFILE statement sets the remaining INPUT statement variables to missing instead of allowing SAS to fetch a new input record.
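
A small sketch of MISSOVER with in-stream data. The dataset and variable names are invented for illustration. With MISSOVER, the short records should leave the trailing test variables missing rather than pulling values from the following line.

```sas
data scores;                         /* hypothetical dataset name */
  infile datalines missover;         /* do not fetch the next record for missing fields */
  input name $ test1 test2 test3;
  datalines;
Ann 90 85 88
Bob 75
Cal 60 70
;
run;

proc print data=scores;
run;
```

Removing MISSOVER restores the default behavior (FLOWOVER), which is an easy way to compare the two results.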

What is flowover option and when is it used?

When reading raw data, if there are fewer fields on an input data record than the number of variables specified on the INPUT statement, the FLOWOVER option on the INFILE statement tells SAS to fetch the next record to read values for the remaining variables. FLOWOVER is the default behavior.

Describe this input statement:
input name $10. #2 id 3-4;

This input statement tells SAS that each observation is built from two input records. SAS reads the character variable NAME (10 columns) from the first record, and #2 moves the pointer to the second record to read the value of ID from columns 3 and 4.
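
The statement can be tried on in-stream data as sketched below. The sample records are invented; note that the ID values are deliberately placed in columns 3 and 4 of every second line.

```sas
data twoline;                        /* hypothetical dataset name */
  input name $10. #2 id 3-4;         /* record 1: NAME, record 2: ID */
  datalines;
John Smith
  42
Mary Jones
  17
;
run;

proc print data=twoline;
run;
```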

Describe this input statement:
input name age / id 3-4;

This input statement tells SAS to read the values for NAME and AGE from the first input record; the slash (/) then moves the pointer to the second record to read the value of ID from columns 3 and 4.

What are the two format modifiers (of error reporting)?

There are two format modifiers, '?' and '??'

What does '?' format modifier do?

The '?' format modifier suppresses printing the invalid data note when SAS encounters invalid data values.

What does '??' format modifier do?

The '??' format modifier suppresses printing of both the invalid data note and the input line when SAS encounters invalid data values.
It also prevents the automatic variable _ERROR_ from being set to 1 for the invalid observation.
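
A hedged sketch contrasting the two modifiers with column input. The data values are invented, and the second record is deliberately invalid for both variables. The expectation is that ID (read with ??) fails silently while SCORE (read with ?) still sets _ERROR_ to 1.

```sas
data checked;                        /* hypothetical dataset name */
  input id ?? 1-3 score ? 5-7;       /* ?? on ID, ? on SCORE */
  err_flag = _error_;                /* copy _ERROR_ so it can be inspected */
  datalines;
101  88
abc  xx
103  75
;
run;

proc print data=checked;
run;
```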

What is 'mixed' input style?

If an input statement uses more than one input style (list, column, formatted), we call it mixed input style.

Describe the below input statement:
input idno name $18. team $ 25-30 startwght endwght;

This input statement tells SAS to read the values of IDNO, STARTWGHT, and ENDWGHT with list input, to read the value of NAME with formatted input, and to read the value of TEAM with column input.

What does DSD option do on infile statement?

DSD (delimiter-sensitive data) is an option on the INFILE statement. It does three things for us.
1: it ignores delimiters in data values that are enclosed in quotation marks
2: it removes the enclosing quotation marks, so they are not stored as part of the data
3: it treats two consecutive delimiters as a missing value
It also changes the default delimiter from a blank to a comma.
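
A minimal sketch of DSD on comma-separated in-stream data. The names and values are made up. The first record has a quoted value containing a comma, and the second has two consecutive delimiters.

```sas
data pets;                           /* hypothetical dataset name */
  infile datalines dsd;              /* DSD: comma delimiter, quotes stripped */
  input name :$20. owner :$20. age;
  datalines;
"Rex, Jr.",Smith,4
Milo,,2
;
run;

proc print data=pets;
run;
```

NAME in the first observation should come out as Rex, Jr. without the quotation marks, and OWNER in the second observation should be missing.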

What is input buffer?

Input buffer is a temporary area of memory into which each record of data is read when the INPUT statement executes.

What is Program Data Vector (PDV)?

Program Data Vector is a temporary logical area of computer memory in which SAS builds a SAS data set, one observation at a time.

What are the two automatic variables in PDV?

_N_ and _ERROR_ are the two automatic variables that are created by every DATA step. These variables are added to the program data vector but are not output to the data set being created. The values of automatic variables are retained from one iteration of the DATA step to the next, rather than set to missing. Note that other temporary variables are created by certain DATA step statements and options, such as 'in=', 'end=', 'first.variable', and 'last.variable'.

What is the purpose of _ERROR_ variable?

_ERROR_ is an automatic variable in SAS to keep track of an encountered error, such as an input data error, a conversion error, or a math error, as in division by 0. The default value of it is 0 and turns to 1 when there is an error.

What is the purpose of _N_ variable?

_N_ is an automatic variable in SAS created in data step and it increments by 1 each time the DATA step loops past the DATA statement. The value of _N_ represents the number of times the DATA step has iterated.
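
As a quick illustration, the sketch below writes _N_ to the log for the first three iterations. It assumes the SASHELP.CLASS sample dataset that ships with SAS is available.

```sas
data _null_;
  set sashelp.class;
  if _n_ <= 3 then put _n_= name=;   /* _N_ is 1, 2, 3 on the first three iterations */
run;
```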

What is the default behavior of pointer control (where does the pointer stop after reading a variable value) in list input, column input and formatted input?

With column and formatted input, the pointer reads the columns that are indicated in the INPUT statement and stops in the next column. With list input, however, the pointer scans data records to locate data values and reads a blank to indicate that a value has ended. After reading a value with list input, the pointer stops in the second column after the value.

What are column pointer controls?

Column pointer controls indicate the column in which an input value starts. Use line pointer controls within the INPUT statement to move to the next input record or to define the number of input records per observation.

What happens when SAS encounters invalid data?

When SAS encounters invalid data, it does the following:
Sets the value of the variable that is being read to missing or to the value that is specified with the INVALIDDATA= system option.
Prints an invalid data note in the SAS log.
Prints the input line and the column number that contains the invalid value in the SAS log. Unprintable characters appear in hexadecimal. To help determine column numbers, SAS prints a rule line above the input line.
Sets the automatic variable _ERROR_ to 1 for the current observation.

What are the different attributes of a SAS variable?

1) length
2) name
3) type
4) label
5) format
6) informat
Attributes 4, 5, and 6 are optional.

What is lowest length you can give to a numeric variable in SAS?

The lowest length that can be given to a numeric variable in SAS is 3 (on Windows and UNIX hosts). With this length, SAS can store integers exactly up to 8,192 (2 to the power of 13).

What is the highest number that can be stored in SAS?

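2 to the power of 53 (9,007,199,254,740,992). This is the largest integer that SAS can store exactly in a numeric variable of length 8 on Windows and UNIX hosts; larger values can be stored, but they may lose precision.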

What is the precision level of lowest and highest lengths for SAS variable?

About 3 significant digits are stored with a length of 3, and about 15 significant digits are stored without losing precision with the maximum length of 8.

What is an informat in SAS?

An informat is an instruction that SAS uses to read data values into a variable. For example, the following value represents a numeric value but contains a dollar sign and commas: $1,000,000. An informat named COMMA11. is used to remove the dollar sign ($) and commas (,) before storing the numeric value 1000000 in a variable.
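
The COMMA11. case can be checked with the INPUT function, as in this small sketch (the variable name amount is arbitrary):

```sas
data _null_;
  amount = input('$1,000,000', comma11.);  /* strip $ and commas while reading */
  put amount=;                             /* writes amount=1000000 to the log */
run;
```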

What is a format in SAS?

A format is an instruction that SAS uses to write data values. We use formats to control the written appearance of data values. For example, the WORDS22. format, which converts numeric values to their equivalent in words, writes the numeric value 692 as six hundred ninety-two.
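
A short sketch of the WORDS22. case, writing the formatted value to the log (the variable name n is arbitrary):

```sas
data _null_;
  n = 692;
  put n words22.;   /* writes the value in words: six hundred ninety-two */
run;
```

The stored value of n is still the number 692; only its written appearance changes.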

What is SAS?

SAS is an integrated system of software solutions which can be used in data management, statistical and mathematical analysis, report writing, forecasting and other activities.

What is the default behavior of an input statement (when multiple input statements are present) in terms of reading a record into input buffer?

Normally, each INPUT statement in a DATA step reads a new data record into the input buffer.

What do we mean by descriptor portion of a SAS dataset?

The descriptor portion of a SAS dataset consists of both dataset attributes and variable attributes.
Dataset attributes
Name, number of variables, number of observations, creation and last modified dates, sort information, names and attributes of all variables, indexes, engine and host-level information, filename (with complete path)
Variable attributes
Name, type, length, label, format, informat, position, index type

Mention the different attributes of a SAS variable?

Name, type, length, label, format, informat, position

PDV reinitialization (reset to missing) exceptions

Variables read from a data set
Variables mentioned in retain statement
Variables created using sum statement
Variables initialized in a temporary array
_N_ and _ERROR_ are also not reset to missing (_N_ increments by 1 and _ERROR_ resets to 0)

What is YEARCUTOFF in SAS?

The YEARCUTOFF= system option is used to interpret dates with two-digit years.
Prior to SAS 9.4, the default YEARCUTOFF value was 1920; in SAS 9.4 it is 1926.
When yearcutoff is 1926, two-digit years of 26 through 99 are assigned a century prefix of "19" (i.e. 1926-1999), and two-digit years from 00 through 25 are assigned a century prefix of "20" (i.e. 2000-2025)
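
A minimal sketch of the 1926 cutoff in action. The option is set explicitly so that the result does not depend on the session default, and the two date strings are chosen to fall on either side of the cutoff.

```sas
options yearcutoff=1926;

data _null_;
  d1 = input('15/03/25', ddmmyy8.);   /* two-digit year 25 -> 2025 */
  d2 = input('15/03/26', ddmmyy8.);   /* two-digit year 26 -> 1926 */
  put d1= date9. d2= date9.;
run;
```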

Filename and fileref for multiple files at a time

filename fitclub 'path to the folder';
infile fitclub(club1);
infile fitclub(club2);

What happens when a variable is mentioned in both DROP and KEEP statements or options?

When you mention a variable in both DROP and KEEP statements, DROP takes precedence

Operator precedence

When operators of equal precedence appear, the operations are performed from left to right, except exponentiation, which is performed from right to left. For example, 1**2**2 is 1, 3**2**2 is 81, and 3**2**0 is 3 (2**0 is evaluated first).
1) Multiplication and division are of equal precedence.
2) Addition and subtraction are of equal precedence.
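
The evaluation order can be confirmed in a short DATA step. The values below are picked so that right-to-left and left-to-right grouping give different answers.

```sas
data _null_;
  a = 2**3**2;      /* right to left: 2**(3**2) = 2**9 = 512 */
  b = (2**3)**2;    /* forced left to right: 8**2 = 64       */
  c = 10 - 4 - 3;   /* left to right: (10 - 4) - 3 = 3       */
  put a= b= c=;
run;
```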

Colon modifier in comparison
(if name=:'Al')

The colon causes SAS to compare only as many characters as there are in the shorter value. The example above identifies all the cases in which name starts with 'Al'. We can also use the colon modifier with other operators, such as <=:'L' and >:'M'.
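
A small sketch of the colon modifier in a subsetting IF. It assumes the SASHELP.CLASS sample dataset, which contains names such as Alfred and Alice.

```sas
data a_names;                 /* hypothetical output dataset name */
  set sashelp.class;
  if name =: 'Al';            /* keep names whose first two characters are Al */
run;

proc print data=a_names;
run;
```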

What does a weekday value of 1 represent in SAS?

SAS considers weekdays as 1 through 7 (from Sunday to Saturday). So, a weekday of 1 represents Sunday.

How does SAS store dates in numeric format?

SAS stores date values as the number of days from 1Jan1960. The SAS date value for 1Jan1960 is 0, 2Jan1960 is 1, 3Jan1960 is 2, and 31Dec1959 is -1.
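
The underlying numbers can be seen by writing date constants to the log without a format, as in this sketch:

```sas
data _null_;
  d0 = '01JAN1960'd;   /* 0  */
  d1 = '02JAN1960'd;   /* 1  */
  d2 = '03JAN1960'd;   /* 2  */
  dm = '31DEC1959'd;   /* -1 */
  put d0= d1= d2= dm=;
run;
```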

What is the precedence for attributes for variables from different datasets in a set statement?

If two or more data sets explicitly define different formats, informats, or labels for the same variable, then the variable in the new dataset assumes the attribute from the first dataset in the set statement that explicitly defines the attribute.

When do we use FORCE option in proc append?

FORCE is used when the descriptor portions of the BASE= (master) and DATA= (transaction) datasets are not the same. With FORCE, SAS drops the variables in the DATA= dataset that are not in the BASE= dataset, and sets the variables that exist only in the BASE= dataset to missing for the observations read from the DATA= dataset.
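
A hedged sketch with two tiny made-up datasets. The dataset extra carries a variable z that base does not have, which is the situation where FORCE is needed.

```sas
data base;                   /* hypothetical master dataset */
  x = 1; y = 'a';
run;

data extra;                  /* hypothetical transaction dataset */
  x = 2; z = 99;             /* z is not in BASE, y is not in DATA */
run;

proc append base=base data=extra force;
run;

proc print data=base;        /* z is dropped; y is missing for the appended row */
run;
```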

Types of Errors

Syntax:
misspelled keyword, missing or invalid punctuation, invalid statement or dataset options
Execution-time:
dataset not present, illegal math operations, observations out of order in by group processing
Data:
invalid values
Semantic:
wrong number of arguments for a function, usage of unassigned libref etc

How does SAS store time values in numeric format?

SAS stores time values as the number of seconds from midnight (12:00 am). A numeric value of 3600 represents 1:00 am, as there are 3600 seconds in an hour. Similarly, a value of 7201 represents 2:00:01 am.

How does SAS store datetime values in numeric format?

SAS stores datetime values as the number of seconds from midnight (12:00 am) of 1Jan1960.
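
Both storage rules can be checked with time and datetime constants written without a format, as sketched here:

```sas
data _null_;
  t  = '02:00:01't;              /* 2*3600 + 1 = 7201 seconds since midnight    */
  dt = '01JAN1960:01:00:00'dt;   /* 3600 seconds since midnight of 1Jan1960     */
  put t= dt=;
run;
```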

Mention some of the automatic variables created in SAS ?

_N_, an automatic variable created in data step processing which keeps track of the number of times the data step has iterated
_ERROR_, is an automatic variable in a data step to keep track of data errors in the current observation being processed. The default value is 0; when there is at least one data error, the variable's value gets set to 1
_INFILE_, is an automatic variable that gets created when reading raw data from an external file and is used to store the content of input buffer
END=, this option can be used on the SET statement to create a temporary variable to identify the processing of last observation in a dataset. The value in the temporary variable will be 1 when the last observation is processed and it will be 0 when all other observations are processed
NOBS=, this option can be used to create a temporary variable to hold the value of the total number of observations in the input dataset being read
IN=, this dataset option can be used on either MERGE or SET statement to create a temporary variable to identify if the particular dataset has contributed to the observation being built in PDV, this temporary variable value will be 1 if the dataset contributed to the observation otherwise its value will be 0
FIRST.VAR, LAST.VAR, these are the temporary variables which get created when a dataset is processed using by group processing, the value in FIRST.VAR will be 1 if that is the first record in that by group and LAST.VAR will be 1 if that is last observation in that by group

How do we change the path where the graphs images are saved with ODS GRAPHICS

ods listing gpath="C:\ODSgraphs";
ods html gpath="C:\ODSgraphs\images";
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=grstatug&docsetTarget=p0cmojsdkiab8cn1k2qk3rp3a8s0.htm&locale=en

Mention the three main types of input statements in SAS

List input: Each field in the raw data is separated by at least a single space and contains no embedded space
Column input: Data is present in fixed columns
Formatted input: Enables the user to supply some special instructions in the input statement for reading data, like reading numeric data with special symbols(decimals, comma separated values, dates). The instructions are called informats.

Provide at least two examples where SUM statement is used?

1) SUM statement can be used in the creation of xxSEQ variable while programming SDTM datasets
2) SUM statement can be used to count the number of subjects in each treatment group using a data step.
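
The first scenario might look like the sketch below. The dataset ae and the variables usubjid, aeterm, and aeseq are illustrative stand-ins, and the input is assumed to be sorted by subject already.

```sas
data ae;                               /* hypothetical adverse-event records */
  input usubjid $ aeterm $;
  datalines;
S01 Headache
S01 Nausea
S02 Rash
;
run;

data ae_seq;
  set ae;
  by usubjid;
  if first.usubjid then aeseq = 0;     /* restart the counter for each subject */
  aeseq + 1;                           /* sum statement: retained and incremented */
run;

proc print data=ae_seq;
run;
```

The sequence number should run 1, 2 for S01 and restart at 1 for S02.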

Mention an example scenario where you have used arrays?

1) While working on tables programming, there can be cases where a particular level does not exist in one or more treatments. In such cases, we may want to set those treatment level column values to '0' or '0 (0.0)'. As we have to perform the same operation (of replacing the blank value with '0') over multiple variables, we can make use of array to achieve the result.
data final;
set trans;
array trts[*] trt1 trt2 trt3 trt4 trt99;
do i=1 to dim(trts);
if missing(trts[i]) then trts[i]="0";
end;
run;

2) Sometimes, when performing validation we may be interested in comparing the values after removing all the spaces from the values. We can make use of arrays and compress function to remove spaces from all character variables and perform the comparison.

data final;
set trans;
array vars[*] _character_;
do i=1 to dim(vars);
vars[i]=compress(vars[i]);
end;
run;

Difference between do while and do until loop

Differences in do while and do until can be explained with below points:
1) Where the condition is evaluated: a do while loop evaluates the condition at the top (beginning of the loop), whereas do until evaluates the condition at the bottom (end of the loop).
2) Number of executions: Irrespective of whether the condition provided resolves to true or false, do until executes at least once whereas do while may not execute at all
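
A minimal sketch of the difference. Both loops start with a value for which the WHILE condition is already false and the UNTIL condition is already true, so only the DO UNTIL body should run.

```sas
data _null_;
  i = 5;
  do while (i < 5);                  /* checked at the top: body never runs */
    put 'WHILE body ran: ' i=;
    i + 1;
  end;

  j = 5;
  do until (j >= 5);                 /* checked at the bottom: body runs once */
    put 'UNTIL body ran: ' j=;
    j + 1;
  end;
run;
```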

What is Program Data Vector (PDV)?

Program Data Vector is a logical area in memory where SAS builds a data set, one observation at a time

What is _N_ in a SAS data step?

_N_ is an automatic variable in a data step, which counts the number of times the data step begins to iterate

What is _ERROR_ in a SAS data step?

_ERROR_ is an automatic variable in a data step, which signals the occurrence of an error caused by the data

What is an input statement in SAS?

Input statement is used to specify how SAS has to read the values from the input buffer, and assigns the values to variables in the program data vector (PDV)

Mention the key statements used in a data step

SET: used for reading observations from an existing SAS dataset.
MERGE: used for joining two or more datasets horizontally
MODIFY: works only on existing datasets (cannot create a new dataset), data need not be sorted, can be used to replace, delete, or append observations in an existing dataset.
UPDATE: used to replace the values of variables in one dataset with values from another dataset(master and transaction datasets), updatemode=missingcheck can be used to prevent overwriting with missing values, requires the data to be sorted
IF (THEN ELSE): used for conditional processing of data
SELECT (WHEN): used for conditional processing of data
ARRAY: used for grouping together a set of variables of same data type to perform common processing on all the listed variables.
WHERE: used for subsetting required records from input dataset

Key input statement modifiers

List input modifiers:
Colon modifier: used to read character data with length greater than 8 characters, or to read numeric data that contains special characters; it allows the use of informats. The colon has to be placed between the variable name and the informat (input item : $12.)
Ampersand modifier: used to read character data which contains embedded single spaces. The ampersand has to be placed between the variable name and the informat (input name & $18.). SAS reads the data until a delimiter is encountered (two or more consecutive spaces when the delimiter is a space)

Line hold specifiers:
@: holds a record in the input buffer so that you can read from it again (used when multiple input statements are required)
@@: used for reading multiple observations from a single record

Line pointer controls:
/: moves the pointer to the next line in the input buffer
#n: moves the pointer to the nth line in the input buffer (only this style of input enables creation of a multiline input buffer)

Column pointer controls:
@n: absolute pointer control
+n: relative pointer control
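
As an illustration of the double trailing @, the sketch below reads three observations from a single record. The data values and names are invented.

```sas
data pairs;                  /* hypothetical dataset name */
  input id score @@;         /* hold the record across DATA step iterations */
  datalines;
1 90 2 85 3 77
;
run;

proc print data=pairs;       /* three observations from one line */
run;
```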

What is 'List input' in SAS?

Each field in raw data is separated by a space or other delimiter
Each field has to be read in the same order of appearance in raw data
Missing values are to be represented with a placeholder (.)
Character values cannot contain embedded blanks
The default length of a character variable is 8 characters, so longer values are truncated
Non-standard numeric data cannot be read- that is, informats cannot be used

What is 'Column input' in SAS?

Embedded spaces can be present in values for a field
Skipping of some data fields is possible, and reading order need not be in the same order as in raw data
Length of a character variable is determined by the number of columns you mention in the input statement
A placeholder for missing values is not required

What is 'Formatted input' in SAS?

Used to read non-standard numeric data, and also character data with length more than 8 characters

What is the difference between 'Input $25' vs 'Input $25.' ?

First option reads the data (character) present in 25th column, while the second option reads 25 columns from the existing pointer location

Pointer location: Input item $1-16

This statement reads the data from column 1 to column 16 and the pointer rests at column 17

What will be the length of the character variable when pipe symbol is used for concatenation?

The length of the newly created variable will be the sum of the lengths of the individual variables plus the lengths of any text strings used in the concatenation
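
One way to check this is with the VLENGTH function, which returns the defined length of a variable. The variable names and lengths below are arbitrary.

```sas
data _null_;
  length first $5 last $7;
  first = 'Ann';
  last  = 'Lee';
  full  = first || ' ' || last;   /* lengths 5 + 1 + 7 */
  len   = vlength(full);          /* defined length of FULL, expected 13 */
  put len=;
run;
```

The length comes from the declared lengths of the operands, not from the number of non-blank characters in the values.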

What are the features of SUM statement in SAS?

Initializes the variable with 0
Retains the value across observations
Ignores missing values in the expression

What happens when we try to specify an already existing variable in retain statement?

Retaining a variable which already exists in the input dataset does not work. That is, variables that are read with a set, merge or update statement are retained automatically; naming them in a retain statement has no effect.

What is the difference between appending and interleaving in SAS?

Appending and interleaving are used to combine SAS datasets vertically.
Appending copies all observations of first dataset and then copies all observations of second dataset and writes to a new dataset.
Interleaving intersperses the observations from two or more input data sets based on the value of one or more common variables, in a new data set
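
A small sketch of both techniques on two made-up datasets that are each sorted by id. The only difference between the two final steps is the BY statement.

```sas
data a;
  input id val $;
  datalines;
1 a1
3 a3
;
run;

data b;
  input id val $;
  datalines;
2 b2
4 b4
;
run;

data appended;               /* all of A, then all of B: ids 1, 3, 2, 4 */
  set a b;
run;

data interleaved;            /* merged in BY order: ids 1, 2, 3, 4 */
  set a b;
  by id;
run;
```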

What is an Array in SAS?

An array is a temporary grouping of SAS variables that are identified by an array-name.

Mention Some of the options used on SET statement

NOBS=, to create a temporary variable to hold the number of observations in the dataset
END=, to create a temporary variable to identify the last observation being processed: this temporary variable's value will be 1 when the last record is processed and 0 when rest of the records are processed
INDSNAME=, to create a temporary variable to hold the name of the input dataset in the form of libname.memname. That is, if we are reading observations from the sashelp.class dataset, the temporary variable specified in the INDSNAME= option will hold a value of "SASHELP.CLASS" for all records coming from that dataset.
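
The three options can be combined on one SET statement, as in this sketch. Because the temporary variables are not written to the output dataset, each one is copied to a regular variable; all of the names used here are arbitrary.

```sas
data flagged;                /* hypothetical output dataset name */
  set sashelp.class end=last nobs=total indsname=src;
  source  = src;             /* SASHELP.CLASS for every record        */
  is_last = last;            /* 1 only on the final observation       */
  n_total = total;           /* total observation count of the input  */
run;

proc print data=flagged;
run;
```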

Explain the log Note ' Missing values were generated as a result of performing an operation on missing values'

When an arithmetic operation is performed on a missing value, SAS sets the result of that operation to missing and prints notes in the log to notify us which arithmetic expressions have missing values and when they were created

Examples
While deriving study days, subtracting reference start date from event/collection date when at least one of them has a missing value.
While creating total score by adding individual scores using '+' operator when at least one of the component scores is missing.
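The second example can be reproduced with the sketch below, which also shows the SUM function as a common way to avoid the missing result (the variable names are made up):

```sas
data _null_;
  score1 = 10;
  score2 = .;
  total_plus = score1 + score2;       /* missing, and triggers the note */
  total_sum  = sum(score1, score2);   /* SUM ignores missing values: 10 */
  put total_plus= total_sum=;
run;
```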

Explain the log Note ' Variable xxx is uninitialized'

SAS prints a note to the log about a variable being uninitialized in two cases. 1) When a new variable is defined (using length, attrib, format, informat or retain statements) but not assigned any value in the data step 2) when a variable is referenced that is neither present in the input dataset nor created in the data step prior to referencing it.

Examples
A new variable has been defined in the length statement but not assigned any value to it in subsequent statements of the data step.
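A minimal sketch of that case, assuming the SASHELP.CLASS sample dataset; the variable name newvar is arbitrary:

```sas
data demo;                   /* hypothetical output dataset name */
  length newvar $10;         /* defined here but never assigned a value */
  set sashelp.class;
run;
```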

Explain the log Note ' Character values have been converted to numeric values at the places '

When a character variable is used in an arithmetic expression, SAS automatically converts the character values to numeric values and prints a note in the log. If the character variable contains nonnumeric information, SAS assigns a missing value and sets the _ERROR_ automatic variable to 1 to indicate a data error.

Examples
Subtracting, dividing, multiplying, or adding a numeric value from (to) a character variable
Usage of any numeric function like MAX,MIN on a character variable.

Explain the log Note ' Numeric values have been converted to character values at the places'

When a numeric variable is used in an expression which expects character values, SAS automatically converts the numeric values to character values (using the BEST12. format) and prints a note in the log.

Examples
Concatenation of a numeric variable to a character variable or string using the concatenation operator (pipe symbols) results in automatic conversion of the numeric values to character values (using the BEST12. format).
Usage of any character functions like STRIP, LENGTH on numeric variable will also result in the conversion note.
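The concatenation case can be sketched as follows, with an explicit PUT conversion shown alongside as one common way to avoid the note (the variable names are made up):

```sas
data _null_;
  n = 42;
  auto_text     = 'Count: ' || n;                    /* implicit conversion, note in log */
  explicit_text = 'Count: ' || strip(put(n, best12.)); /* explicit conversion, no note   */
  put auto_text= explicit_text=;
run;
```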

Explain the log Note ' MERGE statement has more than one data set with repeats of BY values'

By default, SAS merge statement handles well only one to one match merging and one to many match merging. That is, there cannot be multiple records for a by group combination in more than one dataset when merging. If there exists more than one dataset with multiple records for a by group SAS prints a note in the log indicating that more than one dataset contains repeats of BY values.

Examples
Merging a subject level dataset in which each record is duplicated to account for total column counts to another dataset like adverse events using subject identifier alone as by variable.

Explain the log message ' Invalid data'

The 'Invalid data' note appears in the log file when reading raw data of character type into a numeric variable.

data test;
  input a b;
cards;
john  1
megan 2
;
run;

Explain the log message ' Invalid numeric data'

The 'Invalid numeric data' message appears in the log file when non-numeric values are used in a numeric arithmetic expression.

data class;
   set sashelp.class;
   x=name*10;
run;

Explain the log NOTE ' At least one W.D format was too small for the number to be printed'

When a format is used in a format statement or put function and the provided overall width of the specified format is not sufficient to display/store the number, SAS presents the number in a suitable notation to accommodate maximum precision and prints a note in the log stating at least one w.d format was too small.

Examples
Applying a format of 3. while converting the count to character format using the put function when the count is greater than 999
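A hedged sketch of this situation: a four-digit count is written with a width of 3, so SAS has to fall back to a more compact notation and should print the note. The variable names are arbitrary.

```sas
data _null_;
  count = 1234;
  c_small = put(count, 3.);   /* width too small for 1234: triggers the note */
  c_ok    = put(count, 4.);   /* wide enough: no note                        */
  put c_small= c_ok=;
run;
```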

Explain the log error ' by variables are not properly sorted'

When by group processing is invoked in a data step or proc step, SAS expects the input data to be presorted based on the variables used in the step. If the input data is not sorted based on the by variables specified then SAS returns an error in the log window stating 'by variables are not properly sorted'

Explain the log NOTE ' Duplicate BY variable(s) are specified'

When sorting a dataset if a variable is specified more than once in the by statement, SAS prints a note in the log indicating that one or more variables have been specified more than once in the by statement.

Examples
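A minimal sketch that should reproduce the note, assuming the SASHELP.CLASS sample dataset; SEX is deliberately listed twice in the BY statement.

```sas
proc sort data=sashelp.class out=class_sorted;   /* hypothetical output name */
  by sex age sex;                                /* SEX appears twice */
run;
```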

Explain the log ERROR ' No by statement used or no by variables specified'

The 'BY statement' is a required statement in proc sort, and when the by statement is not used or when by variables are not specified, SAS prints an error in the log

Examples
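A minimal sketch that should reproduce the error, assuming the SASHELP.CLASS sample dataset; the BY statement is deliberately left out.

```sas
proc sort data=sashelp.class out=class_sorted;   /* hypothetical output name */
run;                                             /* no BY statement: error expected */
```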

Explain the log WARNING ' xx observations omitted due to missing ID values'

When transposing a dataset with an ID statement, the values present in the ID variable(s) become the names of the newly created columns. As the name of the newly created variable cannot be blank, SAS ignores (does not transpose) the observations with missing ID values and prints a warning indicating how many such observations were excluded.

Examples
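A small sketch with made-up data in which one record has a missing value for the ID variable. That record should be left out of the transposed dataset and counted in the warning.

```sas
data long;                   /* hypothetical input dataset */
  input subj $ visit $ val;
  datalines;
S01 V1 10
S01 .  11
S01 V2 12
;
run;

proc transpose data=long out=wide;
  by subj;
  id visit;                  /* the record with missing VISIT is omitted */
  var val;
run;
```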

Explain the log ERROR ' The ID value xxx occurs twice in the same BY group'

When transposing a dataset and an ID statement is used, if more than one record exists with same value in the variable(s) specified in ID statement, SAS prints an error in the SAS log. When transposing, SAS uses the values present in the ID variable as names of the newly created columns, as there cannot be more than one variable in a dataset with same name SAS issues this error.

Examples
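A small sketch with made-up data (the scores dataset is hypothetical). Subject 1 has two records with the same ID value, so the transpose should fail for that BY group.

```sas
data scores;
    input id test $ score;
    datalines;
1 math 80
1 math 85
2 math 90
;
run;

proc transpose data=scores out=scores_t;
    by id;
    id test;      /* 'math' would have to name two columns for id=1 */
    var score;
run;
```

Removing the duplicate records, or adding another variable that distinguishes them to the BY statement, is the usual remedy.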

Explain the log ERROR ' The ID value xxx occurs twice in the input dataset'

Same as above, but when by statement is not used.

Examples

Explain the log ERROR ' by variables are not properly sorted'

When 'by group processing' is required (or used), SAS expects the input datasets to be sorted by the same variables that are used for by group processing. If the observations are not sorted, SAS prints an ERROR to the log indicating that the observations in the input dataset are not properly sorted.

Examples
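A minimal sketch, assuming sashelp.class is available (it is ordered by name, not by age). The first step should fail; the second pair shows the usual fix. Output dataset names are illustrative.

```sas
/* Fails: the input is not sorted by AGE */
data first_by_age;
    set sashelp.class;
    by age;
    if first.age;
run;

/* Works: sort first, then use BY */
proc sort data=sashelp.class out=class_sorted;
    by age;
run;

data first_by_age;
    set class_sorted;
    by age;
    if first.age;
run;
```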

Explain the log WARNING ' Multiple lengths were specified for the variable'

When appending two or more dataset using SET statement, SAS fetches the length attribute for a variable from the first instance it encounters the variable in any of the datasets from left to right order. If the length of the same named variable in any of the datasets is greater after the first instance then SAS prints a warning to the SAS log stating multiple lengths were specified for that variable.

Examples
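A small sketch with two hypothetical datasets, a and b, that define the same variable with different lengths. Because a is read first, x keeps length 3 and the longer value from b is truncated.

```sas
data a;
    length x $3;
    x = 'abc';
run;

data b;
    length x $10;
    x = 'abcdefghij';
run;

data ab;
    set a b;      /* x takes length 3 from A; the value from B is cut to 'abc' */
run;
```

A LENGTH statement placed before the SET statement is a common way to set the length explicitly.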

Explain the log WARNING ' Multiple lengths were specified for the BY variable'

When interleaving or merging two or more dataset using SET/MERGE statement along with BY statement, SAS fetches the length attribute for a variable from the first instance it encounters the variable in any of the datasets from left to right order. If the length of the same named variable in any of the datasets is greater after the first instance then SAS prints a warning to the SAS log stating multiple lengths were specified for that variable.

Examples
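A small sketch of the same problem on a BY variable. The datasets a and b and the variable key are hypothetical.

```sas
data a;
    length key $3;
    key = 'abc';
    x = 1;
run;

data b;
    length key $10;
    key = 'abc';
    y = 2;
run;

data ab;
    merge a b;
    by key;       /* KEY has length 3 in A and 10 in B */
run;
```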

Explain the log INFO ' The variable xxx on dataset work.xxx will be overwritten'

When merging two or more datasets, if the datasets have common variables other than those specified as the by variables, the values of those variables from the rightmost dataset overwrite the values from the previous datasets. The message in the log informing the user that variables are being overwritten is printed only when OPTIONS MSGLEVEL=I is in effect.

Examples
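A small sketch with two hypothetical datasets that share a non-BY variable named value. The INFO line should appear only because MSGLEVEL=I is set.

```sas
options msglevel=i;

data a;
    id = 1;
    value = 'from a';
run;

data b;
    id = 1;
    value = 'from b';
run;

data ab;
    merge a b;
    by id;        /* VALUE exists in both inputs; the copy from B wins */
run;

options msglevel=n;   /* back to the default */
```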

Explain the log note ' Division by zero'

SAS cannot divide a number by 0. When it encounters a division by 0, it sets the result to missing and generates a note saying 'Division by zero detected at line XXX column XX'.
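A small sketch with made-up data (the dataset ratios and its variables are hypothetical). The unguarded division should trigger the note on the second record; the guarded one should not.

```sas
data ratios;
    input num den;
    unguarded = num / den;               /* note when den = 0, result is missing */
    if den ne 0 then guarded = num / den; /* division is skipped when den = 0 */
    else guarded = .;
    datalines;
10 2
5 0
;
run;
```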

Explain the log WARNING ' The variable xxx in the DROP, KEEP, or RENAME list has never been referenced'

When a variable that is neither present in the input dataset nor created in the step is specified on the DROP, KEEP or RENAME option or statement, SAS prints a warning to the SAS log indicating that the variable has never been referenced in that step.

Examples
Renaming a variable using rename= option on a dataset specified in set statement without keeping the variable in the list of variables to be read using keep= option.
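A minimal sketch of the case described above, assuming sashelp.class is available. HEIGHT is renamed but is not in the KEEP= list, so it is never read and the warning is expected. The names class2 and ht are illustrative.

```sas
data class2;
    set sashelp.class(keep=name age rename=(height=ht));
run;
```

Adding height to the KEEP= list is the usual fix.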

What will be the returned values when TRIM function is applied on a null value?

Trim function returns a single blank value when it is applied on a variable with a null value on that observation.
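A small sketch of this behaviour, with TRIMN shown for contrast (TRIMN returns a zero-length string for a blank argument). The variable names are illustrative.

```sas
data _null_;
    length x $5;
    x = ' ';
    with_trim  = 'a' || trim(x)  || 'b';   /* TRIM returns one blank:  a b */
    with_trimn = 'a' || trimn(x) || 'b';   /* TRIMN returns no blanks: ab  */
    put with_trim= with_trimn=;
run;
```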

What is the Sort sequence of character values?

ASCII: blank, digits, uppercase letters, underscore, lowercase letters
EBCDIC: blank, lowercase letters, uppercase letters, digits

NOOBS in Proc Print

Suppresses the observation number column in the output.

Mention some of the important statements and options available in proc print

Var statement: specify the list of variables
Noobs: suppress obs column
Obs="label for obs column": specifies a heading for the obs column
Id statement: Key variables used for identifying the observation
Sum statement: provides a list of variables for which the sum is to be displayed at the end.
n: reports the number of contributing observations in the dataset or in each by group
Sumby statement: prints the sums only when the value of the specified by variable changes
Pageby statement: starts a new page when the value of the specified by variable changes
Label option: to print labels instead of variable names (default is variable names)
Double option: Used to double the space between two observations
Uniform option: columns of data line up from one page to next, uses widest data value for column width
split=: used for splitting the column headings (labels) into different lines at the specified character
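A short sketch that combines several of these options, assuming sashelp.class is available. The sorted copy named class and the label text are illustrative.

```sas
proc sort data=sashelp.class out=class;
    by sex;
run;

proc print data=class noobs n split='*';
    by sex;                             /* one section per value of SEX */
    var name age height weight;
    sum weight;                         /* totals for WEIGHT */
    label height = 'Height*(inches)';   /* heading breaks at the * character */
run;
```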

What are the default statistics that are produced when we run proc freq for a one-way table?

The default statistics produced in output window are: Frequency, percent, cumulative frequency, cumulative percentage.
The default statistics produced in output dataset are count and percentage only.

What are the default statistics that are produced when we run proc freq for a two-way table?

The default statistics produced in output window are: Frequency, percent, Row Percent and Column Percent.
The default statistics produced in output dataset are count and percentage only.

Mention some important statements and options used in proc freq

Proc freq statement
data=: used to specify the name of the input dataset to be used by the procedure
order=: to specify the order of appearance of rows in frequency table.
order=data sorts the rows of the frequency table in the same order as they appear in the dataset.
order=freq sorts the rows of the frequency table from most frequent to least frequent.
noprint: suppresses the printing of frequency table in output window
tables statement
out=: to specify the name of the output dataset for the frequency table
missing: to instruct the procedure to consider missing values as a level in the frequency table
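A short sketch using the options listed above, assuming sashelp.class is available. The output dataset name age_counts is illustrative.

```sas
proc freq data=sashelp.class order=freq noprint;
    tables age / missing out=age_counts;   /* output has AGE, COUNT and PERCENT */
run;

proc print data=age_counts;
run;
```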

What are the default statistics that are produced when we run proc means?

The 5 default descriptive statistics that are produced by proc means are:
n, mean, standard deviation, minimum and maximum

What does nway option on proc means statement do?

NWAY option is used for specifying that the output dataset should contain statistics only for highest level of interaction of the variables specified in class statement.

What does completetypes option on proc means statement do?

COMPLETETYPES is used for requesting SAS to create all possible combinations of class variable values even if they do not exist in data.

What does autoname option on output statement of proc means do?

AUTONAME option on output statement of proc means is used for requesting SAS to create a unique variable name for an output statistic when an explicit name is not assigned by the user.

What does preloadfmt option on class statement of proc means do?

PRELOADFMT in combination with COMPLETETYPES and FORMAT statement (for variables specified in class statement) is used for requesting statistics for all levels present in the format even if those levels are absent in data.
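A sketch that puts NWAY, COMPLETETYPES, AUTONAME and PRELOADFMT together, assuming sashelp.class is available. The format $sexu and the output dataset stats are hypothetical names; the 'U' level does not occur in the data and is there only to show a preloaded level.

```sas
proc format;
    value $sexu
        'F' = 'Female'
        'M' = 'Male'
        'U' = 'Unknown';
run;

proc means data=sashelp.class nway completetypes noprint;
    class sex / preloadfmt;             /* take the levels from the format */
    format sex $sexu.;
    var height;
    output out=stats mean= max= / autoname;   /* names are generated from variable and statistic */
run;

proc print data=stats;
run;
```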

How do we fetch the statistics for more than one variable using proc means?

We can fetch the statistics for multiple variables at a time by specifying the list of required variables on var statement

proc means data=sashelp.class;

   var age height weight;

run;

Mention some important statements and options used in proc report

Proc report statement
data=: used to specify the input data
nowindows: used to request SAS to use nonwindowing environment
Missing: used to request SAS to consider missing values as valid levels for variables defined as group or order (without missing option SAS ignores the rows with missing values and does not print in the report)
LS=: used to specify the length of a line of the report
PS=: used to specify the number of lines in a page of the report
SPACING=: used to specify the number of blank characters between two columns of the report (default value is 2 spaces)
headline: used to request SAS to present a solid line between column headers and data portion in the report
headskip: used to request SAS to present a blank line between column headers and data portion in the report
SPLIT=: used to specify the split character (SAS introduces a line break whenever it encounters the specified split character in the variable values or column headers)
CENTER: used to specify SAS to center align the report on the page (NOCENTER option left justifies the report)

Column statement

What is the use of 'OTHER' keyword in proc format?

While specifying ranges when defining a user-defined format, the keyword 'OTHER' is used to represent all the values other than those explicitly defined ranges (including missing values).

What is the use of 'LOW' keyword in proc format?

While specifying ranges when defining a user-defined format, the keyword 'LOW' is used to represent the lowest non-missing value.

What is the use of 'HIGH' keyword in proc format?

While specifying ranges when defining a user-defined format, the keyword 'HIGH' is used to represent the largest non-missing value.

How do we use '<' sign in proc format while specifying ranges?

While defining a user-defined format, ranges are specified in the format of {lower}-{higher}. '<' sign can be used to instruct SAS to exclude the specified {lower} value or {higher} value from the range.
1) {lower}-{higher} : Range includes both endpoints.
2) {lower}<-{higher}: Range excludes lower endpoint but includes higher endpoint
3) {lower}-<{higher}: Range excludes higher endpoint but includes lower endpoint
4) {lower}<-<{higher}: Range excludes both lower and higher endpoints
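A short sketch that uses LOW, HIGH, OTHER and the '<' sign in one format, assuming sashelp.class is available. The format name agegrp and its labels are illustrative.

```sas
proc format;
    value agegrp
        low -< 12  = 'Under 12'    /* up to, but not including, 12 */
        12 - 14    = '12 to 14'    /* both endpoints included */
        14 <- high = 'Over 14'     /* above 14, 14 itself excluded */
        other      = 'Missing';    /* only missing values are left for OTHER here */
run;

proc freq data=sashelp.class;
    tables age;
    format age agegrp.;
run;
```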

Where do user-defined formats get saved by default?

By default, formats get stored in a catalog named 'formats' of work library.

How do you store formats in a permanent library?

By default, formats get stored in work.formats catalog. We can make use of LIBRARY= option on proc format statement to specify a different library (or catalog) to store the formats.

proc format library=lib1;

run;

proc format library=lib1.myfmts;

run;

How do you access formats stored in a permanent library?

Formats are stored in catalogs, and the default catalog name for formats is 'formats.sas7bcat'. By default, SAS searches for two catalogs: work.formats and library.formats.
The sas system option FMTSEARCH can be used to search for formats stored in other catalogs.
The below option tells SAS to search the catalogs lib1.formats, lib2.myfmts, ghi.formats in addition to the default work.formats and library.formats

options fmtsearch=(lib1 lib2.myfmts ghi);

How do you create formats from a sas dataset?

Formats (and informats) can be created from an existing dataset using the CNTLIN= option of the proc format statement.

proc format cntlin=myformatsdata;

run;

What are the required variables to be present in the input data when creating formats from a sas dataset?

While creating a format, at a minimum, we specify whether we are creating a format or an informat, a name for the format/informat, the start of the range and the label associated with the range. We need a variable for each of these pieces of information.
So, the required variables are TYPE (format or informat, numeric or character), FMTNAME (name of the format or informat), START (range start) and LABEL (display value associated with the range specified in the START variable).
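A minimal sketch of such a control dataset, using the variables named above. The format name sexf and the labels are hypothetical; sashelp.class is assumed to be available for the check at the end.

```sas
data myformatsdata;
    length fmtname $32 type $1 start $8 label $20;
    fmtname = 'sexf';
    type    = 'C';                      /* character format */
    start = 'F'; label = 'Female'; output;
    start = 'M'; label = 'Male';   output;
run;

proc format cntlin=myformatsdata;
run;

data _null_;
    set sashelp.class(obs=3);
    sex_label = put(sex, $sexf.);       /* apply the newly created format */
    put name= sex_label=;
run;
```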

How do you export formats to a sas dataset?

We can export formats to a sas dataset using CNTLOUT= option on the proc format statement.

proc format cntlout=formatsdata;

run;

When do you use NOFMTERR option in sas?

When a user-defined format is applied to a variable and SAS cannot find the format, by default SAS fails to read the dataset. NOFMTERR is used for instructing SAS to replace the missing format with $w. or w. so that we can continue to access the dataset.

What can be the maximum length of a format name?

The name of a numeric format can be up to 32 characters long.
The name of a character format can be up to 31 characters long (as SAS stores a character format name with a leading dollar sign).

What can be the maximum length of an informat name?

When storing formats and informats, SAS differentiates an informat from a format by adding a leading '@' sign. So, the name of a numeric informat can be up to 31 characters long and the name of a character informat can be up to 30 characters long (as SAS also prefixes a dollar sign to the name of a character informat).

What are the possible values for TYPE variable in the input dataset used to create formats?

Using proc format, we create both formats and informats of either numeric or character type. Based on these combinations, there are 4 possible values the 'TYPE' variable can take.
N - for numeric format
C - for character format
I - for numeric informat
J - for character informat.

What happens when we provide the different labels for a same range or when the provided ranges overlap?

Proc format returns an error when a range is repeated or when the values overlap.

What is a multilabel format?

A multilabel format is a format that enables us to assign multiple labels to a value or a range of values.

How do you create a multilabel format?

A multilabel format is created by specifying the 'MULTILABEL' option in the VALUE statement of proc format.

proc format;

    value agef (multilabel)
    11='11'
    12='12'
    13='13'
    11-13='11-13';

run;

Why do we use default= option on value or invalue statement of proc format?

By default the length of the format and the variable created using the format (using put or input function) will be equal to the length of the longest value specified in the range. Instead of relying on the longest value, DEFAULT= option can be used to specify the length of the format or informat.

How do you view (print) a saved format library?

We can view (print) the formats using the fmtlib option on proc format statement.
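A minimal sketch. The format yesno is a hypothetical example created first so that there is something to list; the SELECT statement limits the listing to that one format.

```sas
proc format;
    value yesno
        1 = 'Yes'
        0 = 'No';
run;

proc format library=work fmtlib;
    select yesno;      /* omit SELECT to list every format in the catalog */
run;
```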

How do we fetch the descriptor portion of a SAS dataset using proc datasets?

We can use contents statement of proc datasets to fetch the descriptor portion of a dataset.

proc datasets nolist library=sashelp;

    contents data=class;

    run;

quit;

What does NOLIST option on Proc Datasets do?

NOLIST suppresses the directory listing of the library, that is, the list of all the members (datasets and other files) present in the library that the procedure would otherwise print.

How do you remove attributes(label, format, informat) of all variables in a dataset?

proc datasets lib=work nolist;

    modify dm;

    attrib _all_ label='';

    attrib _all_ format=;

    attrib _all_ informat=;

run;

quit;

How do you rename all variables of a dataset with a common prefix ?

data class;

    set sashelp.class;

run;

proc contents data=class out=cont01 noprint;

run;

data cont02;

    set cont01(keep=name) end=last;

    where name ne "Name"; /* leaves the variable Name unchanged; remove this line to rename every variable */

    length renamelist $32767;

    retain renamelist;

    name2=cats("old_",name);

    name3=cats(name,"=",name2);

    renamelist=catx(" ",renamelist,name3);

    if last;

    call symputx("renamevars",renamelist);

run;

data class_renamed;

    set class(rename=(&renamevars.));

run;

How do you subset last record from a dataset?

The SET statement has an option named END=. It creates a temporary variable whose value is 1 when the data step processes the last record and 0 otherwise.

data lastobs;

    set sashelp.class end=last;

    if last=1 then output;

run;

How do you subset second record from last?

The SET statement has an option named NOBS=. It creates a temporary variable that holds the number of observations in the input dataset. We can use this temporary variable together with _N_ to subset the second record from the last.

data seclastobs;
set sashelp.class nobs=num;
if _N_=num-1 then output;
run;

How do you extract last word from a string (or sentence)?

The SCAN function can be used to extract words from a string. By default, SCAN counts words from left to right. When we give a negative number, SCAN counts words from right to left. So, if we give -1 as the second argument to SCAN, we get the last word.
lastword=scan(sentence,-1);

How do you extract last two characters from a string (or sentence)?

The SUBSTR function can be used to extract substrings from a string. SUBSTR can only read characters from left to right and, unlike SCAN, does not take negative values (to read from right to left). We can use the LENGTH function to get the length of the string, subtract 1 from it and use the result as the second argument (the start position) of SUBSTR.
lasttwochars=substr(sentence,length(sentence)-1);

What does the option 'FIRSTOBS=' do?

SAS by default reads an input dataset from the first observation. FIRSTOBS= can be used to specify the observation number from which we want SAS to start reading the observations.

What does the option 'OBS=' do?

SAS by default reads an input dataset till the last observation. OBS= can be used to specify the observation number at which we want SAS to stop reading the observations.
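A short sketch combining both options, assuming sashelp.class is available. The output dataset name middle is illustrative. Note that OBS= gives the number of the last observation to read, not a count of observations.

```sas
data middle;
    set sashelp.class(firstobs=5 obs=10);   /* reads observations 5 through 10, i.e. 6 rows */
run;
```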

How do you create a single macro variable using proc sql?

We can use into clause of proc sql to create a macro variable.

proc sql;

    select count(*) into :numobs 
    from sashelp.class
    ;

quit;

proc sql;

   select min(age) into :minage
   from sashelp.class;

quit;

How do you create more than one macro variable using proc sql?

We can specify the list of expressions separated by commas, followed by the list of macro variables separated by commas, to create multiple macro variables simultaneously.

proc sql;

   select min(age),max(age) into :minage, :maxage
   from sashelp.class;

quit;

How do you create a series of macro variables using proc sql?

A series of macro variables can be created by specifying the named range of macro variables

proc sql;

   select distinct age into :age1-:age6
   from sashelp.class;

quit;

The example below creates a series of 6 macro variables to store the number of records in the first 6 age groups.

proc sql;

    select count(*) into :age1-:age6
    from sashelp.class
    group by age;

quit;

How do you create a macro variable to hold a series of data values in a single macro variable using proc sql?

We can use 'separated by' option of into clause to create a macro variable to hold a list of values separated by a delimiter.

proc sql;

   select distinct name into:names separated by " "
   from sashelp.class;

quit;

What is TRIMMED option in proc sql while creating macro variables?

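When creating macro variables using the INTO clause of proc sql, leading and trailing blanks in the selected value are kept in the macro variable (for example, a numeric result is right-aligned and so padded with leading blanks). We can use the TRIMMED option to remove those leading and trailing blanks from the macro variable value.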

proc sql;

select avg(height) into :avg trimmed from sashelp.class;

quit; 

How do we get the values of a variable from all the observations of a dataset into a single macro variable using proc sql?

When we want multiple values of a single variable to be stored in a single macro variable we can make use of 'separated by' option along with into clause.

proc sql;

select name into :names separated by ' ' from sashelp.class;

quit; 

How do we create a single macro variable to hold an explicit value using call symputx?

When a single macro variable with an explicit name is needed, we specify the name of the macro variable in quotes as the first argument. As an explicit value has to be assigned to the macro variable, we specify the value in quotes as the second argument of call symputx.

data _null_;

   call symputx("site","mycsg.in");

run;

How do we create a single macro variable to hold the value from a dataset variable using call symputx?

When a single macro variable with an explicit name is needed, we specify the name of the macro variable in quotes as the first argument. As the value from a variable has to be assigned to the macro variable, we specify the name of the data step variable as the second argument (without quotes) of call symputx. If the dataset has more than one observation, the macro variable ends up holding the value from the last observation.

data _null_;

   set subject_totals;

   call symputx("site",total);

run;

How do we create macro variables, with the names of the macro variables based on the values present in a data step variable using call symputx?

When a series of macro variables is needed, with the names coming from the values of a data step variable, we specify the name of that data step variable (without quotes) as the first argument. As the value from a variable has to be assigned to each macro variable, we specify the name of the data step variable holding the value as the second argument (without quotes) of call symputx.

data _null_;

   set sashelp.class;

   call symputx(name,age);

run;

How do you create a macro variable to hold a series of data values in a single macro variable using call symputx?

In order to hold a series of values from a data step variable into a single macro variable, we need to employ techniques to concatenate all values into a single variable and create the macro variable conditionally on the last record

data _null_;

   set sashelp.class end=last;

   length names $1000;

   retain names;

   names=catx(" ",names,name);

   if last then call symputx("names",names);

run;

How do we get the number of observations present in a dataset into a macro variable using call symputx?

We can use nobs option on set statement to create a temporary variable, this variable stores the number of observations in the input dataset. This variable can then be used on call symputx to create the macro variable.

data _null_;

   set sashelp.class nobs=numobs;

   call symputx("classobs",numobs);

run;

What happens if there are some leading spaces in the value assigned to a macro variable using %let statement?

When creating macro variables using %let statement, it ignores all the leading and trailing spaces.

What happens if there are some trailing spaces in the value assigned to a macro variable using %let statement?

When creating macro variables using %let statement, it ignores all the leading and trailing spaces.
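A small sketch of this behaviour. The macro variable names padded and kept are illustrative, and the %STR quoting function is shown only as one common way to keep the blanks when they are really needed.

```sas
%let padded =    mycsg.in   ;
%put [&padded];                   /* leading and trailing blanks are dropped */

%let kept = %str(   mycsg.in   );
%put [&kept];                     /* blanks inside %STR are preserved */
```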