*Copyright @ www.mycsg.in;
What is an array in SAS
An indexed set of related elements.
What do we mean by array and array processing in SAS
An array is a temporary grouping of SAS variables that is referenced by an array name and an index; it exists only for the duration of the DATA step and is not itself stored in the output dataset
Array processing means applying the same logic repeatedly across a related series of variables
Arrays help reduce repetitive code and make programs easier to maintain when many similar variables must be handled the same way
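As a minimal sketch of the idea above, the step below declares an array and refers to one element by index. The dataset name `demo`, the variables `x1` to `x3`, the array name `vals`, and the variable `n_elements` are illustrative assumptions and are not part of the original tutorial.

```sas
/* Illustrative only: demo, x1-x3, vals and n_elements are assumed names */
data demo;
    x1 = 5;
    x2 = .;
    x3 = 7;

    /* Group x1, x2 and x3 under the array name vals */
    array vals[3] x1 x2 x3;

    /* vals[2] refers to x2, so this assigns 0 to x2 */
    vals[2] = 0;

    /* dim() returns the number of elements in the array */
    n_elements = dim(vals);
run;
```

The array name `vals` does not appear in `demo`; only the underlying variables (and `n_elements`) are written to the dataset.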
Create input data
We create a dataset named `scores` containing repeated score variables across visits
This dataset is suitable for demonstrating operations that would otherwise require repeating the same statement for each score variable
data scores;
    /* truncover keeps INPUT from moving to the next line when a record is short */
    infile cards truncover;
    input usubjid $ visit $ score1 score2 score3;
    cards;
1001 Week1 10 20 30
1001 Week2 9 20 29
1001 Week3 . . 25
1001 Week4 . . .
1001 Week5 10 15 .
1001 Week6 10 20 30
;
run;
SAS Log
data scores;
infile cards truncover;
input usubjid $ visit $ score1 score2 score3;
cards;
NOTE: The data set WORK.SCORES has 6 observations and 5 variables.
NOTE: Compressing data set WORK.SCORES increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.01 seconds
;
run;
Notice that some score values are missing
Those missing values make it easy to compare a repetitive manual solution with an array-based solution
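One way to see where the missing values fall is to count them per row. The sketch below assumes the `scores` dataset created above; the dataset name `check` and the variable `n_missing` are hypothetical names introduced only for illustration.

```sas
/* Illustrative only: check and n_missing are assumed names */
data check;
    set scores;

    /* nmiss() counts missing numeric arguments; "of" passes the whole variable list */
    n_missing = nmiss(of score1-score3);
run;

/* Print the rows so the missing values and their counts can be inspected */
proc print data=check;
run;
```

Rows where `n_missing` is greater than zero are the ones the following steps will change.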
Replace missing values in score variables using repeated IF statements
This approach works, but becomes long and harder to maintain as the number of score variables grows
Each variable must be referenced explicitly
data output;
    set scores;
    /* One statement per score variable */
    if missing(score1) then score1=0;
    if missing(score2) then score2=0;
    if missing(score3) then score3=0;
run;
SAS Log
data output;
set scores;
if missing(score1) then score1=0;
if missing(score2) then score2=0;
if missing(score3) then score3=0;
run;
NOTE: There were 6 observations read from the data set WORK.SCORES.
NOTE: The data set WORK.OUTPUT has 6 observations and 5 variables.
NOTE: Compressing data set WORK.OUTPUT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds
The output dataset replaces missing score values with zero
Use this result as a reference before reviewing the array solution
Replace missing values in score variables using an array
The array named `scores_arr` groups the variables `score1`, `score2`, and `score3`
The DO loop moves through each indexed variable and applies the same missing value logic
This solution is shorter, clearer, and easier to expand when more score variables are added
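data output;
    set scores;
    /* Group the three score variables under one array name */
    array scores_arr[3] score1 score2 score3;
    /* Visit each element and replace a missing value with zero */
    do i=1 to 3;
        if missing(scores_arr[i]) then scores_arr[i]=0;
    end;
run;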
SAS Log
data output;
set scores;
array scores_arr[3] score1 score2 score3;
do i=1 to 3;
if missing(scores_arr[i]) then scores_arr[i]=0;
end;
run;
NOTE: There were 6 observations read from the data set WORK.SCORES.
NOTE: The data set WORK.OUTPUT has 6 observations and 6 variables.
NOTE: Compressing data set WORK.OUTPUT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.00 seconds
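The score values match the earlier manual solution, but the output now has 6 variables instead of 5 because the loop index `i` is also written to the dataset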
Observe how one loop replaced three separate `if` statements
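The loop can be made less dependent on the number of score variables. The variant below is a sketch, not the tutorial's own code: it assumes the same `scores` dataset, uses `[*]` with a numbered range list so SAS counts the elements, uses `dim()` for the upper bound, and drops the index variable `i`.

```sas
data output;
    set scores;

    /* [*] lets SAS count the elements; score1-score3 is a numbered range list */
    array scores_arr[*] score1-score3;

    /* dim() supplies the upper bound, so the loop adapts if the list grows */
    do i = 1 to dim(scores_arr);
        if missing(scores_arr[i]) then scores_arr[i] = 0;
    end;

    /* Remove the loop index so the output keeps the same 5 variables as the input */
    drop i;
run;
```

With this form, adding a hypothetical `score4` would only require changing the variable list in the `array` statement.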
Create a separate row for each score variable using repeated OUTPUT statements
Here, we want to restructure the dataset by taking three score variables and writing one output row per score value
The manual approach repeats the same logic for each score variable
data output;
    set scores;
    /* Copy each score into score and write one row per score variable */
    score=score1; output;
    score=score2; output;
    score=score3; output;
run;
SAS Log
data output;
set scores;
score=score1; output;
score=score2; output;
score=score3; output;
run;
NOTE: There were 6 observations read from the data set WORK.SCORES.
NOTE: The data set WORK.OUTPUT has 18 observations and 6 variables.
NOTE: Compressing data set WORK.OUTPUT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.01 seconds
Each input observation now contributes three output observations
The variable `score` stores one of the original score values on each written row
Create a separate row for each score variable using an array
The array solution uses the same looping idea, but avoids repeated code
Each array element is assigned to the variable `score` and then written with `output`
This pattern is especially powerful when many repeated variables must be converted into repeated rows
data output;
    set scores;
    array scores_arr[3] score1 score2 score3;
    /* Write one output row per array element */
    do i=1 to 3;
        score=scores_arr[i];
        output;
    end;
run;
SAS Log
data output;
set scores;
array scores_arr[3] score1 score2 score3;
do i=1 to 3;
score=scores_arr[i];
output;
end;
run;
NOTE: There were 6 observations read from the data set WORK.SCORES.
NOTE: The data set WORK.OUTPUT has 18 observations and 7 variables.
NOTE: Compressing data set WORK.OUTPUT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds
The output contains the same 18 rows as the manual version, plus the loop index `i`, which records the score variable (1, 2, or 3) that each row came from
Compare the two approaches and notice how arrays reduce duplication and improve scalability
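When the rows are stacked this way, it can help to record which variable each value came from and to drop the original wide columns. The sketch below is one possible variant under the same assumptions as the step above; the variable name `source` is hypothetical and chosen only for illustration.

```sas
data output;
    set scores;
    array scores_arr[*] score1-score3;

    /* source is an assumed name; 32 characters covers any SAS variable name */
    length source $32;

    do i = 1 to dim(scores_arr);
        score  = scores_arr[i];
        /* vname() returns the name of the variable behind the array element */
        source = vname(scores_arr[i]);
        output;
    end;

    /* Keep only the long-format columns */
    drop i score1-score3;
run;
```

Each row then carries `usubjid`, `visit`, `score`, and `source`, so the origin of every value stays visible after the wide columns are removed.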
Key points to remember
Arrays group related variables for indexed processing
Arrays simplify repetitive logic such as replacing missing values or writing repeated output rows
Array processing is often more efficient to write, read, and maintain than manually repeating the same statements