*Copyright @ www.mycsg.in;

Common Logics - Part 02: LOCF, Visit Window Assignment, and Treatment Flags

Overview

Part 01 covered shell row creation, pre/post-treatment filtering, merge-based filtering, and basic LAG/RETAIN comparisons.
This lesson covers three further patterns that appear repeatedly in clinical programming: Last Observation Carried Forward (LOCF), visit window assignment, and baseline and post-baseline treatment flag derivation.
All three patterns share the same core DATA step techniques — RETAIN, BY-group processing, and conditional logic — applied to clinically meaningful problems.

Last Observation Carried Forward (LOCF)

LOCF is an imputation method where missing values at a scheduled visit are replaced by the most recent preceding non-missing value for the same subject.
It is commonly applied to efficacy endpoints in clinical trials when a subject misses a scheduled assessment — the assumption is that the value did not change between the last observed visit and the missed visit.
The implementation uses RETAIN to carry the last non-missing value forward and a conditional check to decide whether to impute.
LOCF should only be applied within a subject (not across subjects), so the DATA step needs a BY subject statement and must reset the retained value when FIRST.subject is true.

data work.efficacy;
    input subject $ visit result;
    datalines;
001 1 42
001 2 .
001 3 38
001 4 .
001 5 .
002 1 55
002 2 51
002 3 .
002 4 48
;
run;
 
proc sort data=work.efficacy; by subject visit; run;
 
data work.locf;
    set work.efficacy;
    by subject;
 
    retain last_result;
 
    if first.subject then last_result = .;
 
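    * Apply LOCF: use the retained value when result is missing ;
    * MISSING() also catches special missing values such as .A ;
    if missing(result) then result_locf = last_result;
    else do;
        result_locf = result;
        last_result = result;   * update the value carried to later rows ;
    end;

    * Flag imputed observations ;
    locf_flag = (missing(result) and not missing(result_locf));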
run;

RETAIN last_result keeps the most recent non-missing result across iterations within the same subject.
first.subject resets last_result to missing at the start of each new subject, ensuring imputation never crosses subject boundaries.
When result is missing: result_locf gets the retained previous value (if one exists).
When result is present: result_locf gets the observed value and last_result is updated for future rows.
locf_flag = 1 identifies which rows were imputed. Always keep this indicator in the output dataset so downstream analyses can distinguish observed from imputed values.
Check WORK.LOCF: subject 001 visit 2 should be imputed with 42, visits 4 and 5 with 38. Subject 002 visit 3 should be imputed with 51.
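LOCF cannot impute a missing first visit: there is no earlier value to carry (last_result has just been reset at first.subject), so result_locf stays missing and locf_flag is 0. Neither subject in this example is missing visit 1, so the case does not arise here.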
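To see the first-visit limitation concretely, the sketch below applies the same RETAIN logic to a hypothetical subject 003 whose visit 1 is missing. WORK.EFFICACY_EDGE and WORK.LOCF_EDGE are illustrative names, not part of the lesson. The missing-value checks use the MISSING() function, which also treats special missing values such as .A as missing.

```sas
* Hypothetical subject 003 whose very first visit is missing ;
data work.efficacy_edge;
    input subject $ visit result;
    datalines;
003 1 .
003 2 60
003 3 .
;
run;

data work.locf_edge;
    set work.efficacy_edge;
    by subject;
    retain last_result;

    * Reset at each new subject so values never cross subjects ;
    if first.subject then last_result = .;

    * Nothing has been retained yet at visit 1, so it stays missing ;
    if missing(result) then result_locf = last_result;
    else do;
        result_locf = result;
        last_result = result;
    end;

    locf_flag = (missing(result) and not missing(result_locf));
run;

proc print data=work.locf_edge noobs;
    var subject visit result result_locf locf_flag;
run;
```

Visit 1 should stay missing with locf_flag = 0, visit 2 is observed (60), and visit 3 should be imputed with 60 and flagged with locf_flag = 1.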

Visit Window Assignment

Clinical studies define planned visits (Week 2, Week 4, Week 8, etc.) with an allowable window around each target day — for example, Week 4 might accept any visit occurring on study day 22 through 35.
Visit window assignment maps each actual visit's study day to its planned visit label based on these day ranges.
A common approach is to join the actual visit data to a reference dataset of planned visit windows; alternatively, a format can perform the mapping.
Observations outside all defined windows are typically assigned an "Unscheduled" label or left with a missing planned visit label.

*------------------------------------------------------------------------------;
* Define planned visit windows ;
*------------------------------------------------------------------------------;
data work.windows;
    * The colon modifier reads each label as a blank-delimited field ;
    input visit_planned :$10. day_low day_high;
    datalines;
Baseline   -7    1
Week2      8    21
Week4     22    35
Week8     50    65
Week12    78    91
;
run;
 
*------------------------------------------------------------------------------;
* Actual visit observations with study day ;
*------------------------------------------------------------------------------;
data work.actual_visits;
    input subject $ studyday result;
    datalines;
001  1   42
001 14   38
001 28   35
001 56   31
001 85   29
002  0   55
002 10   51
002 40   48
002 70   44
;
run;
 
*------------------------------------------------------------------------------;
* Assign planned visit label using a SQL range join ;
*------------------------------------------------------------------------------;
proc sql;
    create table work.assigned as
    select a.subject, a.studyday, a.result,
           b.visit_planned
    from work.actual_visits as a
    left join work.windows as b
        on a.studyday >= b.day_low
       and a.studyday <= b.day_high
    order by a.subject, a.studyday;
quit;

The SQL range join matches each actual visit's studyday against the day_low and day_high boundaries of each planned window.
A LEFT JOIN retains all actual visits even if they fall outside all windows — those rows will have visit_planned = missing (unscheduled).
Check WORK.ASSIGNED: subject 001 day 28 should map to Week4 (window 22-35), day 56 to Week8 (50-65), day 85 to Week12 (78-91).
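Subject 002 day 40 falls in the gap between Week4 (22-35) and Week8 (50-65), and day 70 falls in the gap between Week8 (50-65) and Week12 (78-91). Both rows should have a missing visit_planned, flagging them as unscheduled.
Verify the observation count: WORK.ASSIGNED should have the same number of rows as WORK.ACTUAL_VISITS, because the LEFT JOIN keeps every actual visit and the windows do not overlap. If two windows overlapped, a visit falling in both would be output twice.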
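The format-based alternative can be sketched as follows. This is an illustration under stated assumptions: the format name VWIN, the dataset name WORK.VISITS_FMT and the label "Unscheduled" are hypothetical, the ranges copy WORK.WINDOWS, and the actual visits are re-created inline so the block runs on its own.

```sas
* Hypothetical format VWIN mirroring WORK.WINDOWS, OTHER catches unscheduled days ;
proc format;
    value vwin
        -7 -  1 = 'Baseline'
         8 - 21 = 'Week2'
        22 - 35 = 'Week4'
        50 - 65 = 'Week8'
        78 - 91 = 'Week12'
        other   = 'Unscheduled';
run;

* Same actual visits as WORK.ACTUAL_VISITS, re-created for a standalone run ;
data work.visits_fmt;
    length subject $8 visit_planned $11;
    input subject $ studyday result;
    * PUT applies the format, STRIP removes any padding ;
    visit_planned = strip(put(studyday, vwin.));
    datalines;
001  1   42
001 14   38
001 28   35
001 56   31
001 85   29
002  0   55
002 10   51
002 40   48
002 70   44
;
run;
```

Days 40 and 70 both fall into the OTHER range and receive "Unscheduled". One design difference from the LEFT JOIN: PROC FORMAT rejects overlapping ranges unless the MULTILABEL option is used, so a format cannot silently duplicate rows the way an overlapping join can.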

Baseline and Post-Baseline Treatment Flags

In clinical analysis, baseline is the last measurement taken before the first dose of study treatment. Post-baseline measurements are all measurements taken after the first dose.
A common derivation task is to flag the baseline record (ABLFL in CDISC ADaM) and the post-baseline records, and to carry the baseline value (BASE in ADaM) forward to post-baseline rows for change-from-baseline (CHG) calculations.
The logic requires the first dose date (or study day 1 as the reference) and the observation's study day to determine position relative to treatment start.

data work.obs_with_dose;
    input subject $ studyday result;
    datalines;
001 -7  42
001 -1  44
001  1  45
001 14  38
001 28  35
002 -3  55
002  1  53
002 14  51
002 28  48
;
run;
 
proc sort data=work.obs_with_dose; by subject studyday; run;
 
data work.flagged;
    set work.obs_with_dose;
    by subject;

    * Declare the length first: otherwise "PRE" fixes it at $3 ;
    * and every "POST" value would be truncated to "POS" ;
    length pre_post $4;
    retain baseline_val;

    * Reset the carried baseline at the start of each subject ;
    if first.subject then baseline_val = .;

    * Pre-dose rows (studyday <= 0) update the running baseline, so the ;
    * value carried into the POST rows is the last pre-dose result ;
    if studyday <= 0 then do;
        pre_post = "PRE";
        baseline_val = result;
    end;
    else pre_post = "POST";

    * CHG is derived for post-dose rows only and stays missing for PRE rows ;
    if pre_post = "POST" then chg = result - baseline_val;
run;

*------------------------------------------------------------------------------;
* Flag the last pre-dose row as the analysis baseline ;
*------------------------------------------------------------------------------;
proc sort data=work.flagged out=work.with_ablfl;
    by subject descending studyday;
run;

data work.with_ablfl;
    set work.with_ablfl;
    by subject;
    length ablfl $1;
    retain ablfl_done;

    if first.subject then ablfl_done = 0;

    * Sorted descending, the first PRE row met is the last pre-dose row ;
    if pre_post = "PRE" and ablfl_done = 0 then do;
        ablfl = "Y";
        ablfl_done = 1;
    end;

    drop ablfl_done;
run;

proc sort data=work.with_ablfl; by subject studyday; run;

The first DATA step flags each row as PRE or POST based on studyday relative to day 1 (first dose assumed at day 1), and retains the running baseline_val for use in change calculations.
CHG (change from baseline) = result - baseline_val is only computed for POST rows; it is set to missing for PRE rows as change from baseline is not meaningful before treatment.
The second pass sorts each subject's rows in descending studyday order, so the first PRE row it meets is the last pre-dose observation. That row is the analysis baseline and receives ABLFL = "Y"; the retained switch ablfl_done stops any earlier pre-dose row from also being flagged.
Check WORK.WITH_ABLFL: each subject should have exactly one row with ABLFL = "Y" (the last pre-dose row), and all POST rows should have a non-missing CHG value.
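Subject 001: the row with studyday = -1 should have ABLFL = "Y" and baseline_val = 44, and the post-dose rows should show CHG = result - 44 (1, -6 and -9 on days 1, 14 and 28). Subject 002: the day -3 row (result 55) is the baseline, giving CHG values of -2, -4 and -7.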
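For comparison, the same baseline flag and CHG values can also be derived without the descending sort by locating the last pre-dose study day per subject in PROC SQL and joining it back. This is an alternative technique, not the lesson's method. It assumes at most one record per subject per study day (ties would flag several rows), names the carried value BASE after the ADaM variable, and re-creates the example data as a hypothetical WORK.OBS_DEMO so the block runs on its own.

```sas
data work.obs_demo;
    input subject $ studyday result;
    datalines;
001 -7  42
001 -1  44
001  1  45
001 14  38
001 28  35
002 -3  55
002  1  53
002 14  51
002 28  48
;
run;

proc sql;
    /* Last pre-dose row per subject: SAS remerges MAX() back onto the rows */
    create table work.base as
    select subject, studyday as base_day, result as base
    from work.obs_demo
    where studyday <= 0
    group by subject
    having studyday = max(studyday);

    /* Join the baseline back, flag it and derive CHG for post-dose rows */
    create table work.with_base as
    select a.subject, a.studyday, a.result, b.base,
           case when a.studyday = b.base_day then 'Y' else ' ' end
               as ablfl length=1,
           case when a.studyday > 0 then a.result - b.base else . end
               as chg
    from work.obs_demo as a
    left join work.base as b
        on a.subject = b.subject
    order by a.subject, a.studyday;
quit;
```

SAS writes a NOTE that the first query requires remerging summary statistics; that is expected for this HAVING clause. On the example data the result should match the DATA step approach: ABLFL = "Y" on day -1 for subject 001 (BASE = 44) and on day -3 for subject 002 (BASE = 55). The SQL version avoids the explicit descending sort and re-sort, while the DATA step version keeps the row-by-row logic visible.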

Key Points

LOCF uses RETAIN to carry the last non-missing value forward within a subject — always reset the retained value at first.subject and flag imputed rows with an indicator variable.
Visit window assignment can be done with a SQL range join against a window reference dataset (or with a range-based format); the LEFT JOIN keeps unscheduled visits (outside all windows) with a missing planned visit label.
The baseline flagging shown here uses two passes: derive CHG in the first pass from a retained baseline value, then flag ABLFL on the last pre-dose row in a second pass after a descending sort.
Always verify observation counts at each step to ensure no rows are lost during merges or BY-group processing.
These patterns recur across most clinical datasets — mastering them builds a foundation for ADaM dataset derivations.
