Previous lessons covered one-to-one merging where each BY key value appears exactly once in both datasets
In a one-to-many merge, one dataset has exactly one row per BY key value and the other dataset has multiple rows for the same key
The typical real-world example is a header dataset that holds one record per subject and a detail dataset that holds multiple records per subject, such as visits, events, or measurements
SAS handles this automatically in the MERGE statement, but the behaviour of variable values across repeated rows deserves careful attention
Create sample datasets
We create a subjects dataset with one row per subject holding demographic information
We also create a visits dataset with multiple rows per subject representing repeated assessments
Both datasets share the key variable `subject` and will be sorted before merging
SAS Log
`subjects` has 4 rows — one per subject including subject 004 who has no visits
`visits` has 9 rows — subjects 001 and 003 have three and four visits respectively, subject 002 has two, and subject 004 has none
Inspect both datasets before merging to understand what the expected output should look like
Dataset View
Sort both datasets before merging
Like all BY-variable merges, a one-to-many merge requires both datasets to be sorted by the BY variable(s)
SAS does not sort automatically and will issue an error if the data is not in order
SAS Log
Perform the one-to-many merge
Basic merge - combine subjects with visits
In a one-to-many merge, SAS reads one row from the single-row dataset and holds it in the PDV while it processes all matching rows from the many-row dataset
The variables from the single-row dataset (`age`, `sex`) carry forward to every matching visit row automatically
This is why merging a header onto a detail file is so convenient — you do not need to explicitly retain the header values
SAS Log
Inspect `subjects_with_visits` and confirm that each visit row now carries the `age` and `sex` values from the corresponding subject
Subject 004, who appears in `subjects` but has no visits, should appear as one row in the output with missing values for `visit` and `result`
This happens because SAS outputs one row per observation in the more-frequent dataset, and for subjects with no visits it outputs one row from the subjects side with the visit variables blank
Dataset View
Keep only rows that exist in both datasets
If you want only subjects who have at least one visit, use the `IN=` flags to filter the output
Subjects with no visits are excluded by requiring `in_v=1`
SAS Log
Subject 004 should now be absent from `matched_only` because that subject has no visits in the visits dataset
Confirm the row count: 9 rows matching the 9 rows in `visits`
Dataset View
What happens when both datasets have repeated BY key values
Sequential pairing - not a cartesian product
When both input datasets contain more than one row for the same BY key value, SAS does not produce a cartesian product
Instead, SAS pairs rows one by one within the group: the first row from each dataset, then the second row from each, and so on
When one dataset runs out of rows for a key, the last-seen variable values from that dataset are retained for the remaining pairings
The number of output rows for a BY group equals the maximum number of rows that group has in either input dataset
SAS issues the WARNING: `MERGE statement has more than one data set with repeats of BY values` when this situation is detected
The result is not a cartesian join explosion, but it is almost always wrong because retained variable values silently carry over into rows where they do not belong
SAS Log
Run this code and check the SAS log for the MERGE warning about repeated BY values
Inspect `many_to_many_demo` — for subject 001, SAS produces 3 rows, which is the maximum of 2 (from `events_a`) and 3 (from `events_b`)
The pairing is: EVA1 with EVB1, EVA2 with EVB2, then EVA2 retained with EVB3 because `events_a` has no third row for subject 001
Subject 002 produces 1 row because both datasets have exactly one row for it
Total output rows: 4 — but the EVA2 value appearing twice in subject 001 is not intentional data and is a silent error
This is the core danger: wrong values appear without any error, only a WARNING in the log
When a true many-to-many relationship exists and all combinations are required, use PROC SQL with an explicit join on the key variable instead of DATA step MERGE
Dataset View
Safe approach - verify one-to-many assumption before merging
Before performing a one-to-many merge, it is good practice to verify that the header dataset truly has one row per BY key
PROC SORT with the `NODUPKEY` option can enforce uniqueness on the header side and will delete duplicate keys if any exist, printing a note to the log
An even safer approach is to check for duplicates first using PROC FREQ or a DATA step before merging
SAS Log
If duplicates are found, SAS prints a note stating how many observations were deleted
If no duplicates exist, the step runs silently and the data is unchanged
After confirming the header is unique, proceed with the one-to-many merge with confidence
Key points to remember
A one-to-many merge combines a unique-key header dataset with a detail dataset that has multiple rows per key
Sort both datasets by the BY variable before merging
SAS automatically carries header variable values across all matching detail rows, so you do not need RETAIN for this purpose
Subjects in the header but absent from the detail file produce one output row with missing values for detail variables unless filtered with IN= flags
If both datasets have repeated BY values, SAS pairs rows sequentially and retains the last value from the shorter side — this is not a cartesian product but produces wrong data silently; use PROC SQL for true many-to-many relationships
Use `proc sort nodupkey` or a pre-merge uniqueness check to confirm the one-to-many assumption before production runs