Why do we need SDTM?

This post is part of the 'SDTM | General' series

Let us assume that, as part of a clinical trial, we want to collect basic information such as the sex, age, and race of the trial participants (subjects).

Subject number
Sex
  • (1) Male
  • (2) Female
Age
Race
  • (1) Asian
  • (2) White
  • (3) African American

This information, along with other trial data, needs to be submitted in a tabular format to the regulatory authorities. 

Let us examine a few possible ways to structure the collected data for submission. Each representation below holds the data for an Asian male, aged 52, with the identification number 1001.

Subject  Gender  Collected_age  Race
1001     1       52             1

Subject  Sex  Collected_age  Race
1001     M    52             Asian

SUBNUM  Sex  age_years  Race
1001    1    52         Asian

What is the problem here?

  • The same collected data can be organized or presented in multiple ways
  • When this data is provided to someone for review, we also need to provide metadata (data about this data)
  • For the same data, we end up having different metadata based on the organizing structure we choose
  • Data reviewers will need to spend time understanding the metadata before they can review the actual data
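To make the reviewer's burden concrete, here is a minimal Python sketch that reads the three representations above. Each layout needs its own metadata (which column holds which concept, and how coded values decode) before the records can be compared. The names `interpret`, `metadata_a`, `metadata_b`, and `metadata_c` are hypothetical, and the code 'F' for female is an assumption, since the example only shows 'M'.

```python
# One record per hypothetical layout, copied from the three examples above.
layout_a = {"Subject": 1001, "Gender": 1, "Collected_age": 52, "Race": 1}
layout_b = {"Subject": 1001, "Sex": "M", "Collected_age": 52, "Race": "Asian"}
layout_c = {"SUBNUM": 1001, "Sex": 1, "age_years": 52, "Race": "Asian"}

# Code lists taken from the collection form.
SEX_CODES = {1: "Male", 2: "Female"}
RACE_CODES = {1: "Asian", 2: "White", 3: "African American"}

# Layout-specific metadata: where each concept lives and how to decode it.
# The "F" entry is an assumption; the example only shows "M".
metadata_a = {
    "columns": {"subject": "Subject", "sex": "Gender",
                "age": "Collected_age", "race": "Race"},
    "decode": {"sex": SEX_CODES, "race": RACE_CODES},
}
metadata_b = {
    "columns": {"subject": "Subject", "sex": "Sex",
                "age": "Collected_age", "race": "Race"},
    "decode": {"sex": {"M": "Male", "F": "Female"}},
}
metadata_c = {
    "columns": {"subject": "SUBNUM", "sex": "Sex",
                "age": "age_years", "race": "Race"},
    "decode": {"sex": SEX_CODES},
}


def interpret(record, metadata):
    """Translate one record into common concepts using its layout's metadata."""
    result = {}
    for concept, column in metadata["columns"].items():
        value = record[column]
        decoder = metadata["decode"].get(concept, {})
        # Values without a decode entry (e.g. age) pass through unchanged.
        result[concept] = decoder.get(value, value)
    return result


interpreted = [
    interpret(layout_a, metadata_a),
    interpret(layout_b, metadata_b),
    interpret(layout_c, metadata_c),
]
```

All three records describe the same subject, yet three separate metadata definitions had to be written and understood before that became visible.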


  • Some of the underlying data element concepts remain constant irrespective of the person who is collecting data
  • If we can standardize the structure for the commonly used data concepts and enforce it on the data submitters, the end data will always be structured in a predictable way
  • A predictable structure has several advantages: easy and efficient data pooling, little or no time spent understanding the metadata, and the ability to build reusable computer programs for data analysis
  • The Study Data Tabulation Model (SDTM) aims to provide this standardization
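As a rough illustration of the pooling and reuse benefits, the sketch below combines records from two studies and summarizes them with a single function. It assumes both studies already use the same variable names (SEX and AGE). The records are made up for illustration, and the identifier variable USUBJID and the function `summarize` are assumptions that do not come from the example above.

```python
from collections import Counter

# Hypothetical records from two studies that share one structure.
study_1 = [
    {"USUBJID": "S1-1001", "SEX": "M", "AGE": 52},
    {"USUBJID": "S1-1002", "SEX": "F", "AGE": 48},
]
study_2 = [
    {"USUBJID": "S2-2001", "SEX": "F", "AGE": 50},
]


def summarize(records):
    """Return the count of subjects per sex and the mean age."""
    sex_counts = Counter(record["SEX"] for record in records)
    mean_age = sum(record["AGE"] for record in records) / len(records)
    return sex_counts, mean_age


# Pooling is plain concatenation because the structure is identical.
pooled = study_1 + study_2
sex_counts, mean_age = summarize(pooled)
```

The same `summarize` function works on either study alone or on the pooled data, with no per-study column mapping.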

How does SDTM help in standardization of data structure and formatting?

  • Each collected data point has a 'focus' (purpose)
  • SDTM has standard dataset structures based on the focus of the observation
  • SDTM has standard variables to present the information based on the purpose of the collected data
  • SDTM has standard variable values for the commonly used variables

In other words, SDTM provides:

  • Standard dataset names
  • Standard variables in each dataset
  • Fixed list of allowed values in the variables where applicable
  • Flexibility to create custom datasets based on existing standard dataset structures
  • Flexibility to extend the allowed values in variables where applicable
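The idea of fixed and extensible value lists can be sketched in Python as follows. This is only an illustration: the `Codelist` class, the `check_value` function, and the contents of both lists are hypothetical and are not the actual SDTM terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    """A named list of allowed values that may or may not be extended."""
    name: str
    values: frozenset
    extensible: bool


def check_value(codelist, value, extensions=frozenset()):
    """Return True if the value is allowed for this codelist.

    Extra values are accepted only when the codelist is extensible.
    """
    if value in codelist.values:
        return True
    return codelist.extensible and value in extensions


# Illustrative lists only; "F" is an assumption, the post states only "M".
SEX_LIST = Codelist("SEX", frozenset({"M", "F"}), extensible=False)
# A hypothetical extensible list, used to show how extension works.
UNIT_LIST = Codelist("UNIT", frozenset({"YEARS"}), extensible=True)

ok_standard = check_value(SEX_LIST, "M")
rejected_raw_code = check_value(SEX_LIST, "1")
rejected_extension = check_value(SEX_LIST, "X", frozenset({"X"}))
ok_extension = check_value(UNIT_LIST, "DECADES", frozenset({"DECADES"}))
```

A fixed list rejects anything outside its values, while an extensible list accepts the additional values that the data submitter declares.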


The standard way of organizing the data collected in our example above is:

  • The focus of the observation is the 'demographics' of the subject - so, present it in the standard DM (Demographics) dataset (dataset level standardization)
  • Sex of the subject when collected must be presented in a variable named 'SEX' (variable level standardization)
  • Age of the subject when collected must be presented in a variable named 'AGE' (variable level standardization)
  • Provide the units in which age is collected in a variable named 'AGEU' (variable level standardization)
  • Race of the subject when collected must be presented in a variable named 'RACE' (variable level standardization)
  • Always use the value 'M' when the subject is male (value level standardization)

The resulting record (subject, sex, age, age units, race):

1001 M 52 YEARS Asian
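The rules above can be sketched as a small Python function that turns the collected form values into a standardized record. Treat it as a sketch under stated assumptions: the function name `to_dm`, the subject variable name SUBJID, and the code 'F' for female are not given in this post, and the race values are kept as written here, although the official terminology may differ in spelling or case.

```python
# Form codes from the collection form mapped to standardized values.
# "F" is an assumption; the post only states "M" for male.
SEX_MAP = {1: "M", 2: "F"}
# Race values are kept as written in this post.
RACE_MAP = {1: "Asian", 2: "White", 3: "African American"}


def to_dm(subject, sex_code, age, race_code):
    """Build one demographics record from the collected form values."""
    return {
        "SUBJID": subject,  # assumed variable name for the subject number
        "SEX": SEX_MAP[sex_code],
        "AGE": age,
        "AGEU": "YEARS",  # the age in the example is collected in years
        "RACE": RACE_MAP[race_code],
    }


record = to_dm(1001, 1, 52, 1)
row = " ".join(str(value) for value in record.values())
```

Here `row` reproduces the record shown above, and every subject passed through `to_dm` ends up with the same variables in the same order.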


So, the main advantage of the SDTM standard is that the data collected in clinical trials will have a standard, predictable, and consistent structure.