*Copyright @ www.mycsg.in;
What does the word 'unique' mean?
- being the only one of its kind; unlike anything else.
When do we call a record unique in a dataset?
- We call a record unique when at least one variable's value differs from every other observation in the dataset
- When dealing with specific key variables, we call a record unique if there exists only one record in the dataset with that combination of key values
  Name    Sex  Age  Height  Weight
- Alfred  M    14   69      112.5
- Alfred  M    14   69      112.5
- Carol   F    14   62.8    102.5
- Carol   F    12   55      84
- Carol   F    11   51      79
- Jane    F    12   59.8    84.5
- In the above example, neither row 1 nor row 2 is considered a unique record, as all the variable values on these two records are the same
- Rows 3, 4, 5 and 6 are considered unique, as at least one variable's value differs when compared to any other observation in the dataset
- If Name is considered the key variable, then only Jane's record is considered unique, as there is only one record with a value of Jane in the Name variable
- As there is more than one observation with a value of Alfred in the Name variable, Alfred is not considered to have a unique record. The same is the case with Carol
- If Name and Age are considered key variables, rows 3, 4, 5 and 6 are considered unique, as each has a Name and Age combination that occurs on only one record in the dataset. As rows 1 and 2 have the same Name and Age values, these records are not considered unique
Creating a sample dataset
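- A minimal sketch of the sample dataset, built from the six rows listed above. The dataset name class is an assumption (suggested by the output dataset names used later), and the character lengths are chosen only for illustration.

```sas
/* Sample data: the six rows from the table above.            */
/* The dataset name "class" is an assumption, not confirmed.  */
data class;
    length name $8 sex $1;
    input name $ sex $ age height weight;
    datalines;
Alfred M 14 69 112.5
Alfred M 14 69 112.5
Carol F 14 62.8 102.5
Carol F 12 55 84
Carol F 11 51 79
Jane F 12 59.8 84.5
;
run;
```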
Where do we need to identify the non-unique records (and separate them)?
- When there exists more than one record with the same key variable combination, we may want to examine those records to see which other variables differ on them
- If there exists more than one record for a lab test on a day, we may want to subset those records and see which variable values differ between them
- If there exists more than one record for an event, we may want to subset those event records and see whether the date or other variable values differ
Separate unique and non-unique records based on key variables
- The NOUNIQUEKEYS option on the proc sort statement can be used to separate unique and non-unique observations
- As the option name nouniquekeys indicates, the dataset specified on the out= option will hold the non-unique observations
- To hold the unique records, we need to list a dataset name on the uniqueout= option
Create a dataset named class_nouniq_names to hold the records with non-unique names
- As we are interested in separating records based on uniqueness of the name variable, we need to specify name in the by statement
- As only Jane has a unique record with name as the key variable, Jane's record would be written to the dataset specified in uniqueout=
- As there exists more than one record for Alfred and for Carol in the dataset with name as the key variable, their records will be considered non-unique observations and written to the dataset specified in the out= option
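- The step described above can be sketched as follows. This assumes the six-row sample data is held in a dataset named class; both that name and the uniqueout= dataset name class_uniq_names are hypothetical, chosen only for illustration.

```sas
/* Separate records by uniqueness of NAME.                              */
/* Assumed input: work.class (hypothetical name for the sample data).   */
/* out=       receives records whose name occurs more than once.        */
/* uniqueout= receives records whose name occurs exactly once.          */
proc sort data=class nouniquekeys
          out=class_nouniq_names
          uniqueout=class_uniq_names;   /* hypothetical dataset name */
    by name;
run;
```

- With the sample rows, class_nouniq_names should hold the two Alfred records and the three Carol records, and class_uniq_names should hold only Jane's record.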
Create a dataset named class_nouniq_name_age to hold the records with a non-unique name and age combination
- As we are interested in separating records based on uniqueness of the name and age variables, we need to specify name and age in the by statement
- As all of Carol's and Jane's records are unique in their Name and Age combination, these records would be written to the dataset specified in uniqueout=
- As there exists more than one record for Alfred in the dataset with the same values in the name and age variables, those records will be considered non-unique observations and written to the dataset specified in the out= option
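- A sketch of the same approach with two key variables. As before, the input dataset name class and the uniqueout= dataset name class_uniq_name_age are assumptions made for illustration.

```sas
/* Separate records by uniqueness of the NAME + AGE combination.        */
/* Assumed input: work.class (hypothetical name for the sample data).   */
proc sort data=class nouniquekeys
          out=class_nouniq_name_age
          uniqueout=class_uniq_name_age;   /* hypothetical dataset name */
    by name age;
run;
```

- With the sample rows, class_nouniq_name_age should hold the two Alfred records, and class_uniq_name_age should hold the three Carol records and Jane's record.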