*Copyright @ www.mycsg.in;

What does the word 'unique' mean?

being the only one of its kind; unlike anything else.

When do we call a record unique in a dataset?

We call a record unique when the value of at least one variable differs from every other observation in the dataset.
When working with specific key variables, we call a record unique if it is the only record in the dataset with that combination of key values.

Alfred M 14 69 112.5
Alfred M 14 69 112.5
Carol F 14 62.8 102.5
Carol F 12 55 84
Carol F 11 51 79
Jane F 12 59.8 84.5

In the above example, neither row 1 nor row 2 is considered a unique record, as all the variable values on these two records are the same.
Rows 3, 4, 5 and 6 are considered unique, as at least one variable's value differs when compared to any other observation in the dataset.
If Name is considered the key variable, then only Jane's record is considered a unique record, as there is only one record with a value of Jane in the Name variable.
As there is more than one observation with a value of Alfred in the Name variable, Alfred is not considered to have a unique record. The same is the case with Carol.

Creating a sample dataset

data class;
   infile cards truncover;
   input Name$      Sex$   Age    Height    Weight;
   cards;
Alfred    M     14     69        112.5
Alfred    M     14     69        112.5
Carol     F     14     62.8      102.5
Carol     F     12     55        84
Carol     F     11     51        79
Jane      F     12     59.8      84.5
;
run;
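
As a quick check of the definitions above against the sample data, the sketch below counts how many records share each Name value. It assumes Name is the key variable; the column names n_records and key_status are illustrative only and are not part of the original program.

```sas
/* Count the records for each Name value (Name is the assumed key variable) */
proc sql;
   select Name,
          count(*) as n_records,
          /* a key value is non-unique when more than one record carries it */
          case when count(*) > 1 then 'non-unique' else 'unique' end as key_status
   from class
   group by Name;
quit;
```

With the sample data, Alfred should show 2 records, Carol 3 and Jane 1, so only Jane is unique on Name.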

When do we need to identify the non-unique records (and separate them)?

When more than one record exists for a key variable combination, we may want to examine those records to see which other variables differ between them.
If more than one record exists for a lab test on a given day, we may want to subset those records and see which variable values differ between them.
If more than one record exists for an event, we may want to subset those event records and see whether the date or other variable values differ.

Separate unique and non-unique records based on key variables

The NOUNIQUEKEYS option on the proc sort statement can be used to separate unique and non-unique observations.
As the option name nouniquekeys indicates, the dataset specified on the out= option will hold the non-unique observations.
To hold the unique records, we need to list a dataset name on the uniqueout= option.

Create a dataset named class_nouniq_names to hold the records with non-unique names

As we are interested in separating records on uniqueness in the name variable, we need to specify name in the by statement.
As only Jane has a unique record with name as the key variable, Jane's record will be written to the dataset specified in uniqueout=.
As more than one record exists for Alfred and for Carol with name as the key variable, their records are considered non-unique observations and are written to the dataset specified in the out= option.

proc sort data=class out=class_nouniq_names uniqueout=class_uniq_names nouniquekeys;
   by name;
run;
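
The same split can be cross-checked with a data step that uses the first. and last. BY-group flags. This is an alternative technique, not the NOUNIQUEKEYS method described above, and the dataset names class_sorted, chk_uniq_names and chk_nouniq_names are made up for this sketch.

```sas
/* Sort by the key so that BY-group processing is possible */
proc sort data=class out=class_sorted;
   by name;
run;

/* A record is unique on the key when it is both the first
   and the last record of its BY group */
data chk_uniq_names chk_nouniq_names;
   set class_sorted;
   by name;
   if first.name and last.name then output chk_uniq_names;
   else output chk_nouniq_names;
run;
```

With the sample data, chk_uniq_names should hold only Jane's record and chk_nouniq_names the five Alfred and Carol records, matching class_uniq_names and class_nouniq_names.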

Create a dataset named class_nouniq_name_age to hold the records with a non-unique name and age combination

As we are interested in separating records on uniqueness in the name and age variables, we need to specify name and age in the by statement.
As all of Carol's and Jane's records are unique in the name and age combination, these records will be written to the dataset specified in uniqueout=.
As more than one record exists for Alfred with the same values in the name and age variables, those records are considered non-unique observations and are written to the dataset specified in the out= option.

proc sort data=class out=class_nouniq_name_age uniqueout=class_uniq_name_age nouniquekeys;
   by name age;
run;
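
To separate records on uniqueness across all variables (the first definition given at the start), one possible sketch is to list every variable as a key with the _all_ keyword in the by statement. The dataset names class_nouniq_rows and class_uniq_rows are assumptions made for this example.

```sas
/* Treat every variable as part of the key, so only fully
   identical rows count as non-unique */
proc sort data=class out=class_nouniq_rows uniqueout=class_uniq_rows nouniquekeys;
   by _all_;
run;
```

With the sample data, the two identical Alfred rows should be written to class_nouniq_rows, and the Carol and Jane rows to class_uniq_rows.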