*Copyright @ www.mycsg.in;
Using DO loops for data creation in SAS
A DO loop repeats the same block of statements for a sequence of values
DO loops are very useful when creating test data, repeated visit structures, and patterned records
Instead of writing many `output;` blocks manually, we can let SAS iterate through a list or range of values
Create data containing a list of 10 patients with patient numbers from 1 to 10
The loop variable `subject` starts at 1 and increases by 1 after each iteration; the loop ends once the next value would exceed 10
Each loop iteration writes one observation using `output`
data dummy01; do subject=1 to 10; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 10 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
The resulting dataset contains 10 observations with subject values 1 through 10
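SAS increments the loop variable first and checks the stop value afterwards, so the variable ends one step past the stop value once the loop finishes. A minimal sketch of this, assuming a hypothetical dataset name `check01` and an added `put` statement that are not part of the original example:

```sas
data check01;
    do subject = 1 to 10;
        output;                /* writes subject = 1, 2, ..., 10 */
    end;
    /* The loop has ended: subject was incremented past the stop value */
    put "After the loop: " subject=;
run;
```

The dataset still holds 10 observations, and the line written to the log by `put` should show `subject=11`. This matters mainly when `output` is placed after the `end` statement instead of inside the loop.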
Create data containing a list of 10 patients with patient numbers from 101 to 110
This example uses a different starting value, but the same loop concept
SAS still increments by 1 because no `by` value is provided
data dummy01; do subject=101 to 110; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 10 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
Create data containing a list of 100 patients with patient numbers from 1001 to 1100
DO loops scale easily to larger ranges
This is much faster and cleaner than coding 100 explicit observations by hand
data dummy01; do subject=1001 to 1100; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 100 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
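When the range changes often, the start value and the number of patients can be kept in one place instead of being edited inside the loop. The sketch below is one possible way to do that; the macro variables `first` and `n` and the dataset name `dummy_n` are hypothetical and do not appear in the original examples:

```sas
%let first = 1001;   /* hypothetical: first patient number */
%let n     = 100;    /* hypothetical: number of patients   */

data dummy_n;
    /* The stop value is computed once, before the first iteration */
    do subject = &first to &first + &n - 1;
        output;
    end;
run;
```

With the values shown, the loop covers 1001 to 1100, the same range as the example above.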
Create data containing a list of 3 patients with patient numbers 1001, 2001, and 3001
A DO loop can also iterate over an explicit list of values instead of a continuous range
This is useful when the required values are not evenly spaced
data dummy01; do subject=1001,2001,3001; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Create data containing a list of 15 patients with patient numbers 1001 to 1005, 2001 to 2005, and 3001 to 3005
A single DO statement can combine several ranges in one loop definition
SAS processes the values from left to right in the order they are listed
data dummy01; do subject=1001 to 1005,2001 to 2005,3001 to 3005; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 15 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
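Ranges, single values, and `by` increments can also be mixed within the same DO statement. The following sketch illustrates this with made-up values; the dataset name `dummy_mix` is hypothetical:

```sas
data dummy_mix;
    /* Three items, processed left to right:
       a range, a single value, and a range with an increment of 2 */
    do subject = 1001 to 1003, 2001, 3001 to 3005 by 2;
        output;
    end;
run;
```

This should produce 7 observations: 1001, 1002, 1003, 2001, 3001, 3003, and 3005.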
Create data containing a list of 4 patients with patient numbers from 1000 to 1015, incremented by 5
The `by` clause changes the increment between loop values
Here, SAS moves from 1000 to 1015 in steps of 5, producing the values 1000, 1005, 1010, and 1015
data dummy01; do subject=1000 to 1015 by 5; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
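The stop value does not have to be reachable exactly by the increment. As a small illustration (the stop value 1017 and the dataset name `dummy_by` are chosen here for demonstration only), the loop below ends as soon as the next value would pass the stop value:

```sas
data dummy_by;
    /* 1020 would exceed the stop value 1017, so the loop ends after 1015 */
    do subject = 1000 to 1017 by 5;
        output;
    end;
run;
```

The result is the same 4 observations as before: 1000, 1005, 1010, and 1015.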
Create data containing a list of 4 patients with patient numbers from 1015 down to 1000, decremented by 5
A DO loop can also count downward when a negative `by` value is supplied
The start value must then be larger than the stop value; here the values are 1015, 1010, 1005, and 1000
data dummy01; do subject=1015 to 1000 by -5; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
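A common slip is to write a descending range without the negative `by` value. The sketch below shows what happens in that case; the dataset name `dummy_empty` is hypothetical:

```sas
data dummy_empty;
    /* Default increment is +1, and the start value already exceeds the stop
       value, so the loop body never executes */
    do subject = 1015 to 1000;
        output;
    end;
run;
```

SAS does not raise an error here. The log should simply report a dataset with 0 observations and 1 variable, so it is worth checking the observation count whenever a loop is meant to count downward.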
Create data containing a list of 3 patients with patient numbers A001, B001, and C001
DO loops can also iterate over explicit character values
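This is useful for alphanumeric identifiers that do not form a numeric range; the `length` statement declares `subject` as a 4-character variable before the loop assigns any values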
data dummy01; length subject $4; do subject="A001","B001","C001"; output; end; run;
SAS Log:
NOTE: Compression was disabled for data set WORK.DUMMY01 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DUMMY01 has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
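The `length` statement matters most when the listed values differ in length. Without it, SAS is expected to take the length of the loop variable from the first value in the list, which can truncate later values. The sketch below contrasts the two cases; the value `"A1"` and the dataset names `short_len` and `full_len` are made up for illustration:

```sas
/* No LENGTH statement: subject takes its length (2) from the first value "A1",
   so the longer values that follow should be truncated to two characters */
data short_len;
    do subject = "A1", "B001", "C001";
        output;
    end;
run;

/* LENGTH declared before the loop: all values fit in 4 characters */
data full_len;
    length subject $4;
    do subject = "A1", "B001", "C001";
        output;
    end;
run;
```

Comparing the two datasets should show shortened values in `short_len` and the full values in `full_len`.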
Create data containing a list of 3 patients with patient numbers 1001, 2001, 3001 and 3 visits for each subject
Nested DO loops are used when one repeated structure exists inside another
Here, each subject has three visits, so the inner loop writes three records for every subject value generated by the outer loop
```sas
data dummy01;
    do subject=1001,2001,3001;
        do visitnum=1 to 3;
            output;
        end;
    end;
run;
```
SAS Log:
NOTE: The data set WORK.DUMMY01 has 9 observations and 2 variables.
NOTE: Compressing data set WORK.DUMMY01 increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
The resulting dataset contains 9 observations because 3 subjects times 3 visits equals 9 rows
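The position of `output` decides how many rows are written. As a contrast to the example above, the sketch below moves `output` out of the inner loop; the dataset name `misplaced` is hypothetical:

```sas
data misplaced;
    do subject = 1001, 2001, 3001;
        do visitnum = 1 to 3;
            /* nothing is written inside the inner loop */
        end;
        output;   /* runs once per subject, after the inner loop has ended */
    end;
run;
```

This version writes only 3 observations, one per subject, and `visitnum` is 4 on every row because the inner loop variable has already moved past its stop value.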
Create data containing a list of 3 patients with patient numbers 1001, 2001, 3001, with 3 visits for each subject and 3 tests within each visit
This example adds a third nested loop for test codes
Such structures are common when creating mock clinical data with subjects, visits, and test-level records
The total number of rows should equal subjects multiplied by visits multiplied by tests
```sas
data dummy01;
    do subject=1001,2001,3001;
        do visitnum=1 to 3;
            do vstestcd="SYSBP","DIABP","PULSE";
                output;
            end;
        end;
    end;
run;
```
SAS Log:
NOTE: The data set WORK.DUMMY01 has 27 observations and 3 variables.
NOTE: Compressing data set WORK.DUMMY01 increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
The dataset now contains 27 observations because 3 subjects times 3 visits times 3 tests equals 27 rows
Inspect the data and confirm that the pattern of repeated values matches the nested loop structure
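One way to make that inspection less manual is to count the rows and the distinct values at each level. The query below is a sketch that assumes `dummy01` still holds the subject, visit, and test dataset created above; the column aliases are hypothetical:

```sas
proc sql;
    /* Total rows and number of distinct values per nesting level */
    select count(*)                 as n_rows,
           count(distinct subject)  as n_subjects,
           count(distinct visitnum) as n_visits,
           count(distinct vstestcd) as n_tests
    from dummy01;
quit;
```

If the loops ran as intended, `n_rows` should equal the product of the other three counts, here 3 x 3 x 3 = 27.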
Key points to remember
DO loops create repeated observations efficiently
You can loop over ranges, explicit lists of values, or ranges with a custom increment set by `by`
Negative `by` values support descending sequences
Nested loops are powerful for creating hierarchical structures such as subject, visit, and test data