*Copyright @ www.mycsg.in;

PROC COMPARE - Part 02: CRITERION= Tolerance, ID Statement Matching, and QC Workflows

Overview

Part 01 covered the fundamentals of PROC COMPARE: comparing dataset structure, variable attributes, and observation values.
This lesson builds on that foundation with three practical extensions: numeric tolerance via CRITERION=, key-based observation matching via the ID statement, and using PROC COMPARE systematically in a QC or validation workflow.
These techniques are especially useful in clinical programming where datasets are compared across production and validation environments.

Numeric Tolerance with CRITERION=

By default, PROC COMPARE uses METHOD=EXACT and flags any numeric difference, no matter how small. Even a floating-point rounding difference in the last stored digit is reported as a discrepancy.
The CRITERION= option sets a tolerance threshold. When CRITERION= is specified without METHOD=, the comparison method becomes RELATIVE: two values are judged equal if their absolute difference, divided by the average of their absolute values (SAS adds a small constant to the denominator), does not exceed the criterion.
The documented default value of the criterion is 0.00001 (1E-5), but it only comes into play once a non-exact method is in effect. For most clinical QC purposes, an explicit criterion of 1E-8 or 1E-6 is enough to ignore floating-point noise while still catching meaningful differences.
Use CRITERION= when comparing derived statistics such as means, percentages, or change-from-baseline values that may differ slightly due to floating-point arithmetic.

data work.expected;
    input subject result;
    datalines;
001 12.3456789012345
002 45.6789012345678
003  8.0000000000001
;
run;
 
data work.actual;
    input subject result;
    datalines;
001 12.3456789012346
002 45.6789012345679
003  8.0000000000002
;
run;
 
*------------------------------------------------------------------------------;
* Default comparison (METHOD=EXACT) - will flag tiny floating-point differences ;
*------------------------------------------------------------------------------;
proc compare base=work.expected compare=work.actual;
run;
 
*------------------------------------------------------------------------------;
* Relaxed comparison - relative differences of 1E-8 or less are treated as equal ;
*------------------------------------------------------------------------------;
proc compare base=work.expected compare=work.actual criterion=1e-8;
run;

Run both PROC COMPARE calls and examine the procedure output carefully.
The first call, with no CRITERION=, will likely report differences in all three rows because the values differ at the 13th decimal place.
The second call with CRITERION=1E-8 should report a clean compare with no value differences, because all relative differences are far smaller than the threshold.
Choose your criterion thoughtfully: setting it too high risks masking real data differences, while running an exact comparison risks noise from floating-point arithmetic overwhelming your QC results.
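
To make the relative tolerance concrete, the following sketch computes the relative difference for each pair of values by hand, using the formula documented for METHOD=RELATIVE but leaving out the small constant SAS adds to the denominator. The dataset WORK.RELDIFF and the variables BASE_VAL, COMP_VAL, ABS_DIFF, REL_DIFF, and WITHIN_1E8 are illustrative names, not part of PROC COMPARE.

*------------------------------------------------------------------------------;
* Sketch: relative difference per row, compared against a 1E-8 threshold ;
*------------------------------------------------------------------------------;
data work.reldiff;
    merge work.expected(rename=(result=base_val))
          work.actual  (rename=(result=comp_val));
    by subject;
    abs_diff   = abs(comp_val - base_val);
    rel_diff   = abs_diff / ((abs(base_val) + abs(comp_val)) / 2);
    within_1e8 = (rel_diff <= 1e-8);   * 1 = treated as equal at CRITERION=1E-8 ;
run;

proc print data=work.reldiff noobs;
    format base_val comp_val 18.13 rel_diff e10.;
run;

For the sample data, every relative difference is many orders of magnitude below 1E-8, which is why the relaxed comparison treats all three rows as equal.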

Matching Observations by Key Variable with the ID Statement

By default, PROC COMPARE matches observations by their position: the first row of BASE is compared with the first row of COMPARE, the second with the second, and so on.
Position-based matching breaks down when the two datasets have different sort orders or different numbers of rows, because mismatched positions produce misleading discrepancy reports.
The ID statement instructs PROC COMPARE to match observations by the value of one or more key variables, similar to a BY-key merge. ID is a separate statement inside the PROC COMPARE step, not an option on the PROC COMPARE statement.
Both datasets must be sorted by the ID variable(s) before running PROC COMPARE with an ID statement.
Observations present in one dataset but not the other are reported as exclusive observations rather than value differences.

data work.prod;
    input subject $ visit result;
    datalines;
001 1 10
001 2 12
002 1  9
002 2 11
003 1 14
;
run;
 
data work.qc;
    input subject $ visit result;
    datalines;
002 1  9
002 2 11
001 1 10
001 2 13
003 1 14
003 2 15
;
run;
 
proc sort data=work.prod; by subject visit; run;
proc sort data=work.qc;   by subject visit; run;
 
proc compare base=work.prod compare=work.qc;
    id subject visit;
run;

WORK.QC is deliberately in a different order and contains an extra row (003/visit 2) and a changed value (001/visit 2: result is 13 in QC vs 12 in PROD).
With the ID statement naming subject and visit, PROC COMPARE matches on the key combination rather than row position. Because both datasets are sorted by the keys first, the original order difference in QC does not cause false discrepancies.
The output will report the value difference for subject 001 visit 2, and will flag subject 003 visit 2 as present only in the COMPARE dataset.
Review the "Observation Summary" section in the output to confirm how many observations were matched, and the "Values Comparison Summary" for the count of variables with differences.
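
To see what key-based matching means in DATA step terms, the sketch below merges only the key variables of the two sorted datasets and labels where each key was found. It is an illustration of the idea, not a description of how PROC COMPARE works internally; WORK.KEY_CHECK, IN_PROD, IN_QC, and SOURCE are hypothetical names.

*------------------------------------------------------------------------------;
* Sketch: which subject/visit keys exist in PROD, in QC, or in both ;
* Assumes WORK.PROD and WORK.QC are already sorted by subject visit ;
*------------------------------------------------------------------------------;
data work.key_check;
    merge work.prod(in=in_prod keep=subject visit)
          work.qc  (in=in_qc   keep=subject visit);
    by subject visit;
    length source $12;
    if in_prod and in_qc then source = "BOTH";
    else if in_prod      then source = "PROD ONLY";
    else                      source = "QC ONLY";
run;

proc print data=work.key_check noobs;
run;

With the sample data, the only key not found in both datasets is subject 003, visit 2, which exists in QC only. That is the observation PROC COMPARE reports as exclusive to the COMPARE dataset.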

Capturing PROC COMPARE Results Programmatically with OUTNOEQUAL and OUT=

When running PROC COMPARE in a QC workflow, you often want to capture the differences in a dataset for reporting or further analysis rather than relying on reading the log.
The OUT= option writes a comparison output dataset. Combined with OUTNOEQUAL, only observations with at least one difference are written, and observations that match perfectly are excluded.
The LISTALL option affects the printed report rather than the OUT= dataset: it lists all variables and observations that are found in only one of the two datasets.

proc compare
    base=work.prod
    compare=work.qc
    out=work.diffs
    outnoequal;
    id subject visit;
run;
 
*------------------------------------------------------------------------------;
* Check whether any differences were found ;
*------------------------------------------------------------------------------;
data _null_;
    if 0 then set work.diffs nobs=n;
    if n = 0 then put "NOTE: Clean compare - no differences found.";
    else put "WARNING: " n "difference(s) found - review WORK.DIFFS.";
    stop;
run;

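By default, WORK.DIFFS contains one row (_TYPE_ = DIF) for each matched observation where a difference was detected. Numeric variables hold the difference (COMPARE value minus BASE value), character variables mark differing characters with X, and with OUTNOEQUAL the variables that matched are shown as the special missing value .E. The BASE and COMPARE values themselves are written only if OUTBASE and OUTCOMP are added.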
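
If the goal is to see the source values next to the difference, a variant along the following lines may help. It adds OUTBASE, OUTCOMP, and OUTDIF so that the output rows are distinguished by _TYPE_ (BASE, COMPARE, DIF). WORK.DIFFS_DETAIL is an illustrative dataset name.

*------------------------------------------------------------------------------;
* Sketch: write BASE, COMPARE and DIF rows for observations that differ ;
*------------------------------------------------------------------------------;
proc compare
    base=work.prod
    compare=work.qc
    out=work.diffs_detail
    outbase outcomp outdif
    outnoequal
    noprint;
    id subject visit;
run;

proc print data=work.diffs_detail noobs;
    var _type_ _obs_ subject visit result;
run;
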
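The DATA _NULL_ step checks whether WORK.DIFFS has any rows and writes a summary message to the log. This is a simple pattern for automated QC pipelines where you want a clear pass/fail indicator, with one caveat: DIF rows exist only for matched observations, so an observation found in just one dataset does not add a row.
Inspect WORK.DIFFS to confirm it contains the value difference identified in the previous step (subject 001, visit 2). The extra QC row for subject 003, visit 2 appears in the printed report, not in WORK.DIFFS.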
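
A complementary pass/fail check, sketched here, reads the automatic macro variable SYSINFO, which PROC COMPARE sets to a return code when the step finishes. A value of 0 means no differences were found; otherwise individual bits identify the kind of difference (64 = BASE has observations not in COMPARE, 128 = COMPARE has observations not in BASE, 4096 = a value comparison was unequal). The macro variable name COMPARE_RC is illustrative.

*------------------------------------------------------------------------------;
* Sketch: pass/fail from the PROC COMPARE return code ;
*------------------------------------------------------------------------------;
proc compare base=work.prod compare=work.qc noprint;
    id subject visit;
run;
%let compare_rc = &sysinfo.;   * capture before the next step resets SYSINFO ;

data _null_;
    rc = &compare_rc.;
    if rc = 0 then put "NOTE: Clean compare - no differences of any kind.";
    else do;
        put "WARNING: PROC COMPARE return code " rc;
        if band(rc, 64)   then put "  - BASE has observations not in COMPARE";
        if band(rc, 128)  then put "  - COMPARE has observations not in BASE";
        if band(rc, 4096) then put "  - at least one value comparison was unequal";
    end;
run;

For the sample data, the bits for 128 and 4096 should both be set, so this check also catches the extra QC observation.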

Macro-Driven QC Pattern

In production QC environments, it is common to compare many datasets in a single pass using a macro loop.
The following macro accepts a dataset name, runs PROC COMPARE, and writes a one-line summary to a running QC log dataset.
This pattern allows a QC programmer to run comparisons on dozens of datasets and collect all results in a single report dataset.

%macro compare_ds(dsname=, idvars=);
    %local base_ds comp_ds rc;
    %let base_ds  = work.&dsname.;
    %let comp_ds  = work.&dsname._qc;
 
    %if %sysfunc(exist(&base_ds.)) = 0 or %sysfunc(exist(&comp_ds.)) = 0 %then %do;
        %put WARNING: One or both datasets for &dsname. do not exist - skipping;
        %return;
    %end;
 
    proc compare
        base=&base_ds.
        compare=&comp_ds.
        out=work._diffs_&dsname.
        outnoequal
        noprint;
        /* ID is a statement of its own; emit it only when keys were supplied */
        %if %length(&idvars.) > 0 %then %do;
            id &idvars.;
        %end;
    run;
 
    /* SYSINFO is reset by the next step, so capture it immediately. */
    /* 0 means PROC COMPARE found no differences of any kind.        */
    %let rc = &sysinfo.;
 
    /* Build a one-row summary for this dataset pair */
    data work._qc_row;
        length dataset $32 status $20 diff_rows 8;
        keep dataset status diff_rows;
        if 0 then set work._diffs_&dsname.(keep=_type_) nobs=n;
        dataset   = "&dsname.";
        diff_rows = n;
        if &rc. = 0 then status = "CLEAN";
        else status = "DIFFERENCES";
        output;
        stop;
    run;
 
    /* PROC APPEND creates WORK.QC_SUMMARY on the first call and adds to it afterwards */
    proc append base=work.qc_summary data=work._qc_row;
    run;
%mend compare_ds;

This macro is a template to illustrate the pattern; in practice you would call it once per dataset pair in your QC plan.
The NOPRINT option suppresses PROC COMPARE's printed report, keeping the output manageable when running many comparisons.
%SYSFUNC(exist()) guards against the macro failing when a dataset is missing, which is good defensive programming for QC workflows.
WORK.QC_SUMMARY accumulates one row per dataset compared, allowing a final report of all comparison results.
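
A hypothetical run might look like the sketch below. The datasets WORK.DM and WORK.DM_QC and the AE call are made up for illustration; the only assumption about naming is the macro's own convention that the QC copy carries a _QC suffix.

*------------------------------------------------------------------------------;
* Hypothetical usage: one call per dataset pair, then print the summary ;
*------------------------------------------------------------------------------;
proc datasets lib=work nolist nowarn;
    delete qc_summary;          * start each QC run without an old summary ;
quit;

data work.dm;
    input subject $ age;
    datalines;
001 34
002 51
;
run;

data work.dm_qc;
    set work.dm;
run;

%compare_ds(dsname=dm, idvars=subject);
%compare_ds(dsname=ae, idvars=subject);   * no WORK.AE here so this call is skipped ;

proc print data=work.qc_summary noobs;
run;

The intent is that the DM pair contributes one summary row, while the AE call only writes a warning to the log because its datasets do not exist.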

Key Points

CRITERION= controls numeric tolerance in PROC COMPARE. Without it the comparison is exact and may flag floating-point noise; specify 1E-8 or 1E-6 for practical QC comparisons.
The ID statement matches observations by key variable values rather than row position. Always sort both datasets by the ID variables before running PROC COMPARE with an ID statement.
OUT= with OUTNOEQUAL captures only the matched observations that differ, enabling programmatic pass/fail checks without reading the printed report.
NOPRINT suppresses the printed comparison report, which is useful in macro-driven QC loops where you capture results in a summary dataset instead.
Always check for exclusive observations (rows present in one dataset only) as well as value differences, because both types of discrepancy matter for QC.
