*Copyright @ www.mycsg.in;


What does the word 'duplicate' mean?

  1. Looking exactly like something else
  2. Having two corresponding or identical parts

When do we call a record a duplicate?

     Name    Sex  Age  Height  Weight
  1. Alfred  M    14   69      112.5
  2. Alfred  M    14   69      112.5
  3. Carol   F    14   62.8    102.5
  4. Carol   F    12   55      84
  5. Carol   F    11   51      79
  6. Jane    F    12   59.8    84.5

Where do we need to identify the duplicate records (and separate or delete them)?

Creating a sample dataset
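
A minimal sketch of this step, assuming the sample data is the six records listed above and that the dataset is called class (the name class and the variable lengths are assumptions; the source only names the output datasets):

```sas
/* Sample data: the six records from the listing above.      */
/* The dataset name "class" and the lengths are assumptions. */
data class;
    length Name $8 Sex $1;
    input Name $ Sex $ Age Height Weight;
    datalines;
Alfred M 14 69 112.5
Alfred M 14 69 112.5
Carol F 14 62.8 102.5
Carol F 12 55 84
Carol F 11 51 79
Jane F 12 59.8 84.5
;
run;
```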

Identifying whether duplicate records exist based on one or more key variables
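
One possible way to check for duplicate key values is BY-group processing with the automatic FIRST. and LAST. variables. The sketch below assumes the sample records are stored in a dataset named class; class_sorted and dup_check are hypothetical names chosen for illustration.

```sas
/* Sort by the key variable so BY-group processing can be used */
proc sort data=class out=class_sorted;
    by Name;
run;

/* Keep every record whose key value occurs more than once:    */
/* a unique key is both the first and the last record of its   */
/* BY group, so anything else belongs to a duplicated key.     */
data dup_check;
    set class_sorted;
    by Name;
    if not (first.Name and last.Name);
run;
```

With the six sample records, dup_check should hold the two Alfred records and the three Carol records, while Jane is left out because her name occurs only once.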

Create a dataset named class_names by removing the duplicates based on the Name variable

Create a dataset named class_names_age by removing the duplicates based on the Name and Age variables
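
A sketch of this task with PROC SORT and the NODUPKEY option, assuming the input dataset is named class:

```sas
/* Keep the first record for each value of Name;             */
/* out= writes the result to class_names and leaves the      */
/* input dataset (assumed to be named class) unchanged.      */
proc sort data=class out=class_names nodupkey;
    by Name;
run;
```

For the six sample records this leaves one record each for Alfred, Carol, and Jane.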
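
The same approach works with more than one key variable. A sketch, again assuming the input dataset is named class:

```sas
/* A record is dropped only when both Name and Age match     */
/* a record already kept for that key combination.           */
proc sort data=class out=class_names_age nodupkey;
    by Name Age;
run;
```

For the six sample records only the second Alfred record is removed, because the three Carol records have different ages.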

Capture the deleted duplicate records in another dataset

Create a dataset named class_names_dup to hold the duplicate records based on the Name variable
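
PROC SORT provides the DUPOUT= option for this purpose. A sketch, assuming the input dataset is named class:

```sas
/* out= receives the first record of each Name;              */
/* dupout= receives the records that nodupkey removes.       */
proc sort data=class out=class_names nodupkey dupout=class_names_dup;
    by Name;
run;
```

For the six sample records, class_names_dup should contain the second Alfred record and the second and third Carol records.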

Question: What happens if the out= option is not used when nodupkey is specified?
Answer: The input dataset is overwritten with a dataset that keeps only the first record for each combination of key variable values.
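
The overwrite can be tried safely on a copy. In this sketch class_copy is a hypothetical name, and the input dataset is assumed to be named class:

```sas
/* Work on a copy so the original class dataset stays intact */
data class_copy;
    set class;
run;

/* No out= option: class_copy itself is replaced by the      */
/* de-duplicated result, and the removed records are lost.   */
proc sort data=class_copy nodupkey;
    by Name;
run;
```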

Identify and remove true duplicate records with NODUPKEY

Create a dataset named class_noduplicates by removing records that have the same values for all variables as another record
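
One way to do this with NODUPKEY is to treat every variable as a key by using the special variable list _all_ in the BY statement. A sketch, assuming the input dataset is named class:

```sas
/* by _all_ makes every variable part of the key, so only    */
/* records identical on all variables count as duplicates.   */
proc sort data=class out=class_noduplicates nodupkey;
    by _all_;
run;
```

For the six sample records only the second Alfred record is removed.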

Removing duplicate records with the NODUPREC option

Create a dataset named class_noduplicates2 by removing the duplicate records with the NODUPREC option
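
A sketch of this task, assuming the input dataset is named class. The code uses the spelling noduprecs, which is the form given in the PROC SORT documentation (with nodup as an alias); the choice of Name as the BY variable is an assumption.

```sas
/* noduprecs compares each record with the previous record   */
/* written to the output and drops it when all variable      */
/* values match.                                             */
proc sort data=class out=class_noduplicates2 noduprecs;
    by Name;
run;
```

For the six sample records the two identical Alfred records are adjacent after sorting, so the second one is removed.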

Create an input dataset
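
The source does not show the records of this input dataset, so the sketch below is purely illustrative: the contents of classx are an assumption, built from the sample values above so that two identical records are separated by a different record with the same Name.

```sas
/* Hypothetical contents for classx (not from the source):   */
/* the first and third records are identical but not         */
/* adjacent.                                                 */
data classx;
    length Name $8 Sex $1;
    input Name $ Sex $ Age Height Weight;
    datalines;
Carol F 14 62.8 102.5
Carol F 12 55 84
Carol F 14 62.8 102.5
Jane F 12 59.8 84.5
;
run;
```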

Create a dataset named class_noduplicates3 from the classx dataset by removing the duplicate records with the NODUPREC option
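
A sketch of this task, assuming a dataset named classx exists in the WORK library and contains a Name variable (the BY variable is an assumption):

```sas
/* noduprecs only compares a record with the previous one    */
/* written to the output, so identical records that are not  */
/* adjacent after sorting by Name can both survive.          */
proc sort data=classx out=class_noduplicates3 noduprecs;
    by Name;
run;
```

Because of this adjacency rule, the result depends on the order of records within each BY group. Sorting with by _all_ places identical records next to each other and avoids the problem.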