Why do we need SDTM?

This post is part of the 'SDTM | General' series.

Suppose that, as part of a clinical trial, we want to collect basic information such as the sex, age, and race of the trial participants (subjects).

Subject number
Sex
  • (1) Male
  • (2) Female
Age
Race
  • (1) Asian
  • (2) White
  • (3) African American

This information, along with other trial data, needs to be submitted in a tabular format to the regulatory authorities. 

Let us look at a few possible ways to structure the collected data for submission. Each representation below holds the data for a 52-year-old Asian male who was given the identification number 1001.

Subject  Gender  Collected_age  Race
1001     1       52             1

Subject  Sex  Collected_age  Race
1001     M    52             Asian

SUBNUM  Sex  age_years  Race
1001    1    52         Asian

What is the problem here?

  • There are multiple ways to organize or present the same collected data
  • When this data is provided to someone for review, we also need to provide metadata (data about this data)
  • For the same data, we end up having different metadata based on the organizing structure we choose
  • Data reviewers will need to spend time understanding the metadata before they can review the actual data
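
To make the reviewer's burden concrete, here is a minimal Python sketch of what reading the three layouts above involves. The layout names, the `LAYOUTS` metadata dictionary, and the `interpret` function are hypothetical and invented for this illustration. The point is only that each layout needs its own metadata before the same subject can be read the same way.

```python
# Hypothetical metadata for the three layouts shown earlier in this post.
# Each layout needs its own column mapping and its own decode tables.
LAYOUTS = {
    "layout_1": {
        "columns": {"Subject": "subject", "Gender": "sex",
                    "Collected_age": "age", "Race": "race"},
        "decode": {
            "sex": {"1": "Male", "2": "Female"},
            "race": {"1": "Asian", "2": "White", "3": "African American"},
        },
    },
    "layout_2": {
        "columns": {"Subject": "subject", "Sex": "sex",
                    "Collected_age": "age", "Race": "race"},
        # 'F' for female is an assumption; the post only shows 'M'.
        "decode": {"sex": {"M": "Male", "F": "Female"}},
    },
    "layout_3": {
        "columns": {"SUBNUM": "subject", "Sex": "sex",
                    "age_years": "age", "Race": "race"},
        "decode": {"sex": {"1": "Male", "2": "Female"}},
    },
}

# The same subject, as stored in each of the three layouts.
ROWS = {
    "layout_1": {"Subject": "1001", "Gender": "1", "Collected_age": "52", "Race": "1"},
    "layout_2": {"Subject": "1001", "Sex": "M", "Collected_age": "52", "Race": "Asian"},
    "layout_3": {"SUBNUM": "1001", "Sex": "1", "age_years": "52", "Race": "Asian"},
}


def interpret(row, layout_name):
    """Translate one raw row into a common form using that layout's metadata."""
    meta = LAYOUTS[layout_name]
    result = {}
    for raw_name, value in row.items():
        concept = meta["columns"][raw_name]
        decode_table = meta["decode"].get(concept, {})
        # Values without a decode entry are already in readable form.
        result[concept] = decode_table.get(value, value)
    return result


interpreted = {name: interpret(row, name) for name, row in ROWS.items()}
for name, record in interpreted.items():
    print(name, record)
```

All three rows describe the same subject, but none of them can be read correctly without first consulting the metadata for its own layout.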


  • Some of the underlying data element concepts remain constant irrespective of the person who is collecting data
  • If we can standardize the structure for the commonly used data concepts and enforce it on the data submitters, the end data will always be structured in a predictable way
  • A predictable structure has several advantages: data can be pooled easily and efficiently, reviewers need little or no preparation time to understand the metadata, and computer programs for data analysis can be reused
  • The Study Data Tabulation Model (SDTM) is aimed at this standardization

How does SDTM help in standardization of data structure and formatting?

  • Each collected data point has a 'focus' (purpose)
  • SDTM has standard dataset structures based on the focus of the observation
  • SDTM has standard variables to present the information based on the purpose of the collected data
  • SDTM has standard variable values for the commonly used variables

In other words, SDTM allows the users to organize data into:

  • Standard dataset names
  • Standard variables in each dataset
  • Fixed list of allowed values in the variables where applicable
  • Flexibility to create custom datasets based on existing standard dataset structures
  • Flexibility to extend the allowed values in variables where applicable
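
The idea of standard variables, fixed value lists, and extensible value lists can be sketched as a small check in Python. Everything here is an assumption made for illustration: the `Codelist` class, the `EXAMPLE_SPEC` dictionary, and especially the `extensible` flags, which are chosen only to exercise both cases. Which value lists are actually fixed or extensible is defined by the published standard and its controlled terminology, not by this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    """A list of allowed values; 'extensible' says whether new values may be added."""
    values: frozenset
    extensible: bool


# Hypothetical specification: variable name -> codelist, or None when the
# variable accepts any value. The flags are illustrative, not official.
EXAMPLE_SPEC = {
    "SEX": Codelist(frozenset({"M", "F"}), extensible=False),
    "AGE": None,
    "AGEU": Codelist(frozenset({"YEARS"}), extensible=True),
}


def check_record(record, spec):
    """Return a list of findings for one record checked against the spec."""
    findings = []
    # Every variable named in the spec is expected to be present.
    for name in spec:
        if name not in record:
            findings.append(f"{name}: missing standard variable")
    for name, value in record.items():
        if name not in spec:
            findings.append(f"{name}: not a standard variable")
            continue
        codelist = spec[name]
        if codelist is None or value in codelist.values:
            continue
        if codelist.extensible:
            findings.append(f"{name}: '{value}' extends the allowed values")
        else:
            findings.append(f"{name}: '{value}' is not an allowed value")
    return findings


good = check_record({"SEX": "M", "AGE": 52, "AGEU": "YEARS"}, EXAMPLE_SPEC)
bad = check_record({"Gender": "1", "AGE": 52, "AGEU": "DECADES"}, EXAMPLE_SPEC)
print(good)
print(bad)
```

The first record produces no findings. The second is reported for a missing standard variable, a non-standard variable name, and a value that extends an extensible list.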


The standard way of organizing data collected in our example above is:

  • The focus of the observation is the 'demographics' of the subject, so present it in the standard DM (Demographics) dataset (dataset level standardization)
  • Sex of the subject, when collected, must be presented in a variable named 'SEX' (variable level standardization)
  • Age of the subject, when collected, must be presented in a variable named 'AGE' (variable level standardization)
  • The units in which age is collected must be provided in a variable named 'AGEU' (variable level standardization)
  • Race of the subject, when collected, must be presented in a variable named 'RACE' (variable level standardization)
  • Always use the value 'M' when the subject is male (value level standardization)

The resulting record for our example subject (subject identifier, SEX, AGE, AGEU, RACE) is:

1001 M 52 YEARS Asian
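
As a rough illustration, the mapping from the collected form codes to this standard record can be sketched in Python. The function name `to_dm_record`, the `SUBJECT` key, and the value 'F' for female are assumptions made for this example. The race text follows the spelling used in this post rather than any published terminology list.

```python
# Codes as printed on the example form in this post.
# 'M' for male comes from the post; 'F' for female is an assumption.
CRF_SEX = {1: "M", 2: "F"}
CRF_RACE = {1: "Asian", 2: "White", 3: "African American"}


def to_dm_record(subject, sex_code, age, race_code):
    """Build one demographics record using the variable names from the post."""
    return {
        "SUBJECT": subject,  # placeholder key; the post does not name this variable
        "SEX": CRF_SEX[sex_code],
        "AGE": age,
        "AGEU": "YEARS",  # the example form collects age in years
        "RACE": CRF_RACE[race_code],
    }


record = to_dm_record("1001", 1, 52, 1)
print(" ".join(str(value) for value in record.values()))
```

For the example subject this prints `1001 M 52 YEARS Asian`, matching the record above.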


So, the main advantage of the SDTM standard is that data collected in clinical trials will have a standard, predictable, and consistent structure.
