The sum statement is a compact DATA step statement with the form: `variable + expression;`
It performs three things automatically that would otherwise require multiple lines of code
First, it retains the variable across DATA step iterations without a separate `retain` statement
Second, it initialises the variable to 0 at the beginning of the DATA step rather than to missing
Third, when the expression evaluates to missing, the current value of the variable is left unchanged rather than being set to missing
These three automatic behaviours make the sum statement a concise and safe way to build running totals and counters
Sum statement syntax
The general form is `variable + expression;`, where `variable` is the accumulating variable and `expression` is the amount to add
The expression can be a constant such as `1`, a variable name, or an arithmetic expression
There is no keyword and no equals sign: just the variable, a plus sign, and the expression, followed by a semicolon
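As a minimal sketch of the three kinds of expression listed above, the step below uses a `do` loop instead of an input dataset
The names `demo`, `i`, `n`, `total`, and `adjusted` are illustrative assumptions, not part of the lesson's data:

```sas
/* Hypothetical self-contained demo: no input dataset is needed */
data demo;
  do i = 1 to 3;
    n + 1;               /* constant: counts loop passes */
    total + i;           /* variable: running sum of i */
    adjusted + (i * 2);  /* arithmetic expression: running sum of 2*i */
    output;              /* write one row per loop pass */
  end;
run;
```

Under these assumptions the three output rows hold `n` = 1, 2, 3, `total` = 1, 3, 6, and `adjusted` = 2, 6, 12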
Comparison with the RETAIN approach
The previous lesson on RETAIN showed how to preserve a variable across iterations using an explicit `retain` statement
The sum statement replaces the combination of `retain` plus the assignment statement, and also handles missing values more safely
The table below shows the same task written both ways
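The two approaches might look like the sketch below, which assumes a hypothetical three-row dataset `demo` with one missing `score`
The dataset and variable names are illustrative, and the RETAIN version guards against missing values by hand:

```sas
/* Hypothetical input: three rows, the second score is missing */
data demo;
  input score;
  datalines;
10
.
5
;
run;

/* RETAIN approach: explicit retain plus a manual missing check */
data total_retain;
  set demo;
  retain total 0;
  if not missing(score) then total = total + score;
run;

/* Sum statement: retain, zero start, and missing handling are implicit */
data total_sum;
  set demo;
  total + score;
run;
```

Under these assumptions both output datasets hold `total` values of 10, 10, and 15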
The sum statement is shorter and its missing-value protection is built in
With RETAIN and a plain assignment such as `total = total + score;`, a single missing score sets the total to missing, and because the value is retained it stays missing on every later row unless you check for missing values yourself
For simple counters and accumulators, the sum statement is generally preferred
Create input dataset
The dataset `scores` contains item-level scores for each subject
Subject 1003 has a missing score for item 2 to demonstrate how the sum statement handles missing values
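A dataset of this shape could be built as in the sketch below
The variable name `item`, the subject IDs 1001 and 1002, and all scores for those two subjects are assumptions for illustration; only subject 1003's values (100, missing, 98) come from the lesson text:

```sas
data scores;
  /* usubjid is read as character; item and score are numeric */
  input usubjid $ item score;
  datalines;
1001 1 85
1001 2 90
1001 3 78
1002 1 92
1002 2 88
1002 3 95
1003 1 100
1003 2 .
1003 3 98
;
run;
```

A period in the data lines is how list input represents a missing numeric value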
Each of the three subjects has three rows, one per item
The second row for subject 1003 has a missing score, which the later examples use to show how the sum statement tolerates missing values
Using the sum statement as a row counter
A sum statement of the form `variable + 1;` increments the variable by one for every row read from the input dataset
No `retain` or initialisation is needed — the variable starts at 0 before the first row is read
This is a simple and reliable way to assign a sequential row number
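A minimal sketch of the row counter, assuming the `scores` dataset described in the previous section already exists
The output dataset name `scores_numbered` is a hypothetical choice:

```sas
data scores_numbered;
  set scores;
  rownum + 1;  /* starts at 0, so the first row gets 1 */
run;
```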
Inspect `rownum` in the output dataset — it should count from 1 upward across all nine rows
Notice that no `retain rownum 0;` statement was required — the sum statement handled initialisation automatically
Using the sum statement for a running total
Using a score variable as the expression accumulates a running total across all rows in the order they are read
The missing score for subject 1003 item 2 should not break the accumulation — the sum statement skips missing contributions
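A minimal sketch of the running total, assuming `scores` exists and has a numeric variable `score`
The output dataset name `scores_running` is a hypothetical choice:

```sas
data scores_running;
  set scores;
  running_total + score;  /* a missing score leaves the total unchanged */
run;
```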
Examine the `running_total` column — it climbs as each non-missing score is added
On the row where `score` is missing for subject 1003 item 2, `running_total` should remain unchanged from the previous row rather than going to missing
This confirms the built-in missing-value protection of the sum statement
Using the sum statement within BY groups
For subject-level totals rather than a grand running total, you need to reset the accumulating variable at the start of each new subject
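The sum statement retains the value across rows, so you must explicitly assign zero when `first.usubjid` is true
`first.usubjid` exists only when the DATA step includes a `by usubjid;` statement, and the input must be sorted (or indexed) by `usubjid`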
After the reset, the sum statement continues accumulating within that subject group
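The reset-then-accumulate pattern might be written as in the sketch below
The dataset names `scores_sorted` and `subject_totals` are hypothetical, and the `proc sort` step is included only as a precaution in case `scores` is not already ordered by `usubjid`:

```sas
/* BY-group processing requires input ordered by the BY variable */
proc sort data=scores out=scores_sorted;
  by usubjid;
run;

data subject_totals;
  set scores_sorted;
  by usubjid;
  if first.usubjid then subject_total = 0;  /* reset at each new subject */
  subject_total + score;                    /* accumulate within the subject */
run;
```

The reset has to come before the sum statement; otherwise the first row of each subject would be added to the previous subject's total and then discarded by the reset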
The explicit assignment `subject_total = 0` resets the accumulator at the boundary between subjects
The sum statement then accumulates within each subject from that reset point
Check the last row for each subject — `subject_total` should equal the sum of non-missing scores for that subject
For subject 1003, the total should be 198 (100 + 98 with the missing score skipped)
Using the sum statement as a within-group counter
The same reset-and-accumulate pattern applies when counting rows within a group rather than summing a numeric variable
Reset to zero on `first.` and use `+ 1` to count each row within the group
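A minimal sketch of the within-group counter, assuming `scores` is already sorted by `usubjid`
The output dataset name `item_counts` is a hypothetical choice:

```sas
data item_counts;
  set scores;      /* assumed to be sorted by usubjid */
  by usubjid;
  if first.usubjid then item_count = 0;  /* reset at each new subject */
  item_count + 1;                        /* count rows within the subject */
run;
```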
`item_count` restarts at 1 for each new subject and increments for every row in that subject's group
On the last row of each subject, `item_count` holds the total number of records for that subject
This is equivalent to the RETAIN counter shown in lesson L150, but written more concisely
Key points to remember
The sum statement `variable + expression;` automatically retains the variable, initialises it to 0, and ignores missing values in the expression
For a grand running total across all rows, use the sum statement alone — no `retain` or `by` processing is needed
For group-level accumulators, reset the variable explicitly using `if first.groupvar then variable = 0;` before the sum statement
The sum statement is generally preferred over the equivalent RETAIN pattern because it is shorter and safer with missing values
Both the sum statement and RETAIN are DATA-step-only features and cannot be used inside PROC SQL or any other PROC step