PROC SQL - Part 05: CASE WHEN, COALESCE, and Calculated Columns
Overview
Previous PROC SQL lessons covered SELECT, WHERE, ORDER BY (L101), table creation and joins basics (L102), subqueries (L103), and GROUP BY with HAVING (L104).
This lesson covers three column-level SQL features: CASE WHEN for conditional column derivation, COALESCE for handling null values, and the CALCULATED keyword for referencing a derived column within the same SELECT or WHERE clause.
These features allow complex conditional logic and null handling to be expressed directly in a SQL SELECT statement, often eliminating the need for a separate DATA step.
CASE WHEN - Conditional Column Derivation
CASE WHEN is the SQL equivalent of IF-THEN-ELSE logic in a DATA step. It evaluates conditions in order and returns the first matching result.
Syntax: CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ... ELSE default_result END
The ELSE clause provides a default value when no WHEN condition matches. If ELSE is omitted and no condition matches, the result is NULL (missing).
CASE WHEN can produce both numeric and character results, but all THEN and ELSE branches must return the same type.
SAS Log
The first CASE uses searched form (WHEN condition THEN ...), which can evaluate any logical expression in each branch.
The second CASE uses simple form (CASE column WHEN value THEN ...), which compares a single column against fixed values — more concise when testing equality only.
LENGTH=10 and LENGTH=4 specify the character length of the new columns. Always specify length for character CASE results to prevent SAS from truncating values.
Check WORK.CLASSIFIED: verify that age_group, trt_label, and response_cat are correct for each row.
Dataset View
CASE WHEN with Aggregation
CASE WHEN can be nested inside aggregate functions to create conditional summaries — equivalent to a filtered count or sum.
This pattern produces multiple summary values in a single GROUP BY query, replacing what would otherwise require multiple sub-queries or DATA step logic.
SAS Log
SUM(CASE WHEN aval >= 10 THEN 1 ELSE 0 END) counts the number of rows where aval is 10 or above within each treatment group.
MEAN(CASE WHEN age < 40 THEN aval ELSE . END) computes the mean of aval only for younger subjects — the ELSE . returns a missing value for older subjects, which MEAN ignores by default.
Check WORK.SUMMARY — verify the counts and means against WORK.SUBJECTS for each treatment group.
Dataset View
COALESCE - Returning the First Non-Null Value
COALESCE(expr1, expr2, ...) evaluates each expression in order and returns the first one that is not null (not missing).
It is used to provide fallback values: if the primary source is missing, use an alternative.
A common use in clinical data: a subject's recorded value may come from one of several columns depending on which collection method was used. COALESCE picks the first available non-missing value.
All arguments must be of the same type (all numeric or all character).
SAS Log
COALESCE picks the first non-missing value across the three result columns — the CASE column alongside it identifies which source provided the value.
Subject 001: primary_result = 12.5, so best_result = 12.5 and source = primary.
Subject 004: primary_result = 15.2 is non-missing, so it takes priority, even though manual_entry (15.0) also has a value.
Subject 005: all three are missing, so best_result = missing and source = none.
Inspect WORK.RESOLVED to confirm all rows match the expected logic.
Dataset View
CALCULATED - Referencing a Derived Column
In standard SQL, you cannot reference an alias defined in the SELECT list within the same SELECT clause or in a WHERE clause — you have to repeat the entire expression.
PROC SQL extends standard SQL with the CALCULATED keyword, which allows a column alias defined earlier in the SELECT list to be referenced by name in subsequent expressions or in a WHERE clause.
This avoids repeating long expressions and makes the query easier to read and maintain.
SAS Log
In WORK.WITH_CALC, the rate_flag CASE expression uses CALCULATED dose_rate to refer to the already-defined column alias — the expression is written once.
In WORK.WITHOUT_CALC, the full expression round(aval / age * 10, 0.01) is repeated inside the CASE — both produce identical output, but the first is cleaner.
CALCULATED can also be used in the WHERE clause: WHERE calculated dose_rate > 0.3 filters rows based on the derived value without repeating the expression.
Compare WORK.WITH_CALC and WORK.WITHOUT_CALC — they should be identical in content, confirming both approaches produce the same result.
Dataset View
Key Points
CASE WHEN evaluates conditions in order and returns the first matching result — the searched form handles any condition; the simple form tests a single column against fixed values.
CASE WHEN inside aggregate functions (SUM, MEAN) enables conditional aggregation — equivalent to counting or averaging only rows that meet a condition.
COALESCE returns the first non-null value from a list of expressions — the standard approach for resolving missing values from multiple fallback sources.
CALCULATED lets you reference a column alias defined in the same SELECT list, avoiding repeated expressions and improving readability.
Always specify LENGTH= for character columns derived from CASE expressions to prevent unexpected truncation.