Common Logics - Part 02: LOCF, Visit Window Assignment, and Treatment Flags
Overview
Part 01 covered shell row creation, pre/post-treatment filtering, merge-based filtering, and basic LAG/RETAIN comparisons.
This lesson covers three further patterns that appear repeatedly in clinical programming: Last Observation Carried Forward (LOCF), visit window assignment, and baseline and post-baseline treatment flag derivation.
All three patterns share the same core DATA step techniques — RETAIN, BY-group processing, and conditional logic — applied to clinically meaningful problems.
Last Observation Carried Forward (LOCF)
LOCF is an imputation method where missing values at a scheduled visit are replaced by the most recent preceding non-missing value for the same subject.
It is commonly applied to efficacy endpoints in clinical trials when a subject misses a scheduled assessment — the assumption is that the value did not change between the last observed visit and the missed visit.
The implementation uses RETAIN to carry the last non-missing value forward and a conditional check to decide whether to impute.
LOCF should only be applied within a subject (not across subjects), so a BY subject statement with FIRST.subject reset is essential.
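A minimal sketch of this pattern follows. The input dataset WORK.VISITS and its values are assumptions chosen to match the checks described in this section (the value 47 for subject 002 visit 1 is purely hypothetical); the variable names last_result, result_locf, and locf_flag follow the lesson's wording.

```sas
/* Hypothetical input data: one row per subject per scheduled visit */
data work.visits;
  input subject $ visit result;
  datalines;
001 1 42
001 2 .
001 3 38
001 4 .
001 5 .
002 1 47
002 2 51
002 3 .
;
run;

/* BY-group processing requires sorted input */
proc sort data=work.visits;
  by subject visit;
run;

data work.locf;
  set work.visits;
  by subject;
  retain last_result;

  /* Reset at each new subject so values never cross subjects */
  if first.subject then last_result = .;

  if missing(result) then do;
    result_locf = last_result;             /* stays missing if nothing observed yet */
    locf_flag   = not missing(last_result); /* 1 only when a value was imputed */
  end;
  else do;
    result_locf = result;                  /* observed value is kept as-is */
    last_result = result;                  /* update the carried value */
    locf_flag   = 0;
  end;
run;
```

Setting locf_flag from the retained value, rather than from the missing result alone, is one possible design choice: it avoids flagging a row as imputed when there was nothing to carry forward.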
RETAIN last_result keeps the most recent non-missing result across iterations within the same subject.
first.subject resets last_result to missing at the start of each new subject, ensuring imputation never crosses subject boundaries.
When result is missing: result_locf gets the retained previous value (if one exists).
When result is present: result_locf gets the observed value and last_result is updated for future rows.
locf_flag = 1 identifies which rows were imputed — always retain this indicator so downstream analyses can distinguish observed from imputed values.
Check WORK.LOCF: subject 001 visit 2 should be imputed with 42, visits 4 and 5 with 38. Subject 002 visit 3 should be imputed with 51.
LOCF cannot impute a missing first visit, because no earlier value exists for that subject; result_locf would remain missing in that case.
Visit Window Assignment
Clinical studies define planned visits (Week 2, Week 4, Week 8, etc.) with an allowable window around each target day — for example, Week 4 might accept any visit occurring on study day 22 through 35.
Visit window assignment maps each actual visit's study day to its planned visit label based on these day ranges.
The standard approach uses a reference dataset of planned visit windows and merges it with the actual visit data, or uses a format to perform the mapping.
Observations outside all defined windows are typically assigned an "Unscheduled" or missing planned visit label.
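The range-join approach can be sketched as below. The reference dataset name WORK.VISIT_WINDOWS and the sample rows are assumptions for illustration; the day ranges and the names studyday, day_low, day_high, visit_planned, WORK.ACTUAL_VISITS, and WORK.ASSIGNED follow the lesson's description.

```sas
/* Hypothetical window reference: one row per planned visit */
data work.visit_windows;
  length visit_planned $8;
  input visit_planned $ day_low day_high;
  datalines;
Week2 8 21
Week4 22 35
Week8 50 65
Week12 78 91
;
run;

/* Hypothetical actual visits */
data work.actual_visits;
  input subject $ studyday;
  datalines;
001 28
001 56
001 85
002 40
;
run;

proc sql;
  /* LEFT JOIN keeps every actual visit; unmatched rows get a blank visit_planned */
  create table work.assigned as
  select a.subject, a.studyday, w.visit_planned
  from work.actual_visits as a
       left join work.visit_windows as w
       on a.studyday between w.day_low and w.day_high
  order by a.subject, a.studyday;

  /* The two counts should agree as long as the windows do not overlap */
  select count(*) as n_actual   from work.actual_visits;
  select count(*) as n_assigned from work.assigned;
quit;
```

If two windows overlapped, a single visit would match both and the join would duplicate that row, which is why the count comparison is worth running.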
The SQL range join matches each actual visit's studyday against the day_low and day_high boundaries of each planned window.
A LEFT JOIN retains all actual visits even if they fall outside all windows — those rows will have visit_planned = missing (unscheduled).
Check WORK.ASSIGNED: subject 001 day 28 should map to Week4 (window 22-35), day 56 to Week8 (50-65), day 85 to Week12 (78-91).
Subject 002 day 40 falls in the gap between the Week4 (22-35) and Week8 (50-65) windows, so it should have a missing visit_planned, flagging it as unscheduled.
Verify observation count: WORK.ASSIGNED should have the same number of rows as WORK.ACTUAL_VISITS. The LEFT JOIN retains every actual visit, and because the windows do not overlap, no visit can match more than one window and be duplicated.
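The format-based alternative mentioned earlier in this section can be sketched as follows. The format name visitwin and the output dataset WORK.ASSIGNED_FMT are hypothetical; the day ranges and sample visits repeat the ones used in this lesson's checks.

```sas
/* Numeric format mapping study-day ranges to planned visit labels */
proc format;
  value visitwin
    8  - 21 = "Week2"
    22 - 35 = "Week4"
    50 - 65 = "Week8"
    78 - 91 = "Week12"
    other   = " ";      /* outside all windows: blank, i.e. unscheduled */
run;

/* Hypothetical actual visits */
data work.actual_visits;
  input subject $ studyday;
  datalines;
001 28
001 56
001 85
002 40
;
run;

data work.assigned_fmt;
  set work.actual_visits;
  length visit_planned $8;
  /* PUT applies the format and returns the label as a character value */
  visit_planned = put(studyday, visitwin.);
run;
```

A format avoids the join entirely, so the row count cannot change, but the windows then live in code rather than in a reference dataset that can be reviewed and maintained separately.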
Baseline and Post-Baseline Treatment Flags
In clinical analysis, baseline is the last measurement taken before the first dose of study treatment. Post-baseline measurements are all measurements taken after the first dose.
A common derivation task is to flag each observation as baseline (ABLFL in CDISC ADaM; ANL01FL is a general analysis record flag, not a baseline flag) or post-baseline, and to carry the baseline value forward to post-baseline rows for change-from-baseline calculations.
The logic requires the first dose date (or study day 1 as the reference) and the observation's study day to determine position relative to treatment start.
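A minimal sketch of the two-pass derivation follows. The input dataset WORK.LABS, the variable period, the helper variable found, and all sample values except subject 001's day -1 result of 44 are assumptions. The sketch also assumes that studyday below 1 means pre-dose, and it restricts ABLFL to PRE rows with a non-missing result so that the flagged row agrees with the retained baseline_val.

```sas
/* Hypothetical input: pre- and post-dose results per subject */
data work.labs;
  input subject $ studyday result;
  datalines;
001 -7 46
001 -1 44
001 14 40
001 28 37
002 -3 50
002 15 48
;
run;

proc sort data=work.labs;
  by subject studyday;
run;

/* Pass 1: PRE/POST flag, running baseline value, change from baseline */
data work.flagged;
  set work.labs;
  by subject;
  retain baseline_val;
  length period $4;
  if first.subject then baseline_val = .;

  if studyday < 1 then do;              /* assumption: first dose on day 1 */
    period = "PRE";
    if not missing(result) then baseline_val = result;
    chg = .;                            /* no change value before treatment */
  end;
  else do;
    period = "POST";
    if not missing(result) and not missing(baseline_val)
      then chg = result - baseline_val;
    else chg = .;
  end;
run;

/* Pass 2: descending sort puts the last pre-dose row first among PRE rows */
proc sort data=work.flagged out=work.desc;
  by subject descending studyday;
run;

data work.with_ablfl(drop=found);
  set work.desc;
  by subject;
  retain found;
  length ablfl $1;
  if first.subject then found = 0;
  if period = "PRE" and not missing(result) and found = 0 then do;
    ablfl = "Y";                        /* one baseline row per subject */
    found = 1;
  end;
run;

/* Restore chronological order for review */
proc sort data=work.with_ablfl;
  by subject studyday;
run;
```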
The first DATA step flags each row as PRE or POST based on studyday relative to day 1 (first dose assumed at day 1), and retains the running baseline_val for use in change calculations.
CHG (change from baseline) = result - baseline_val is only computed for POST rows; it is set to missing for PRE rows as change from baseline is not meaningful before treatment.
The second pass sorts descending by studyday within subject. In that order, the first PRE row encountered per subject is the last pre-dose observation; this is the analysis baseline row that gets ABLFL = "Y".
Check WORK.WITH_ABLFL: each subject should have exactly one row with ABLFL = "Y" (the last pre-dose row), and every POST row with a non-missing result and an available baseline should have a non-missing CHG value.
Subject 001: the row with studyday = -1 should be ABLFL = Y with baseline_val = 44; post-dose rows should show CHG = result - 44.
Key Points
LOCF uses RETAIN to carry the last non-missing value forward within a subject — always reset the retained value at first.subject and flag imputed rows with an indicator variable.
Visit window assignment is best done with a SQL range join against a window reference dataset — the LEFT JOIN ensures unscheduled visits (outside all windows) are retained with a missing planned visit label.
Baseline flagging in this lesson uses a two-pass approach: derive CHG in the first pass using a retained baseline value, then flag ABLFL on the last pre-dose row in a second pass using a descending sort.
Always verify observation counts at each step to ensure no rows are lost during merges or BY-group processing.
These patterns recur across most clinical datasets — mastering them builds a foundation for ADaM dataset derivations.