Why do we need SDTM?
This post is part of the 'SDTM | General' series.
Let us assume that, as part of a clinical trial, we want to collect basic information such as the sex, age, and race of the trial participants (subjects).
| Field | Entry |
|---|---|
| Subject number | |
| Sex | |
| Age | |
| Race | (1) Asian; (2) White; (3) African American |
This information, along with other trial data, needs to be submitted in a tabular format to the regulatory authorities.
Let us examine some theoretical ways in which the collected data could be structured for submission. Each representation below holds the data for the same subject: an Asian male, aged 52, with identification number 1001.
DEMOG

| Subject | Gender | Collected_age | Race |
|---|---|---|---|
| 1001 | 1 | 52 | 1 |
DM

| Subject | Sex | Collected_age | Race |
|---|---|---|---|
| 1001 | M | 52 | Asian |
SUBJINFO

| SUBNUM | Sex | age_years | Race |
|---|---|---|---|
| 1001 | 1 | 52 | Asian |
What is the problem here?
- The same collected data can be organized or presented in multiple ways
- When this data is provided to someone for review, we also need to provide metadata (data about this data)
- For the same data, we end up having different metadata based on the organizing structure we choose
- Data reviewers will need to spend time understanding the metadata before they can review the actual data
- Some of the underlying data element concepts remain constant irrespective of the person who is collecting data
- If we standardize the structure for the commonly used data concepts and require data submitters to follow it, the final data will always be structured in a predictable way
- A predictable structure has several advantages: easy and efficient data pooling, little or no time spent understanding the metadata, reusable computer programs for data analysis, and so on
- The Study Data Tabulation Model (SDTM) aims at this standardization
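To make the metadata problem concrete, here is a minimal Python sketch that reads the subject's sex from each of the three layouts above. The `LAYOUT_METADATA` table and the `read_sex` helper are hypothetical names introduced only for illustration, and the code list `1` = male is an assumption, since the example only implies it.

```python
# The same subject in the three hypothetical layouts shown above.
demog = {"Subject": 1001, "Gender": 1, "Collected_age": 52, "Race": 1}
dm = {"Subject": 1001, "Sex": "M", "Collected_age": 52, "Race": "Asian"}
subjinfo = {"SUBNUM": 1001, "Sex": 1, "age_years": 52, "Race": "Asian"}

# Layout-specific metadata a reviewer needs before reading any value:
# which column holds the sex, and how its values are coded.
# Assumption: code 1 means male (the example only implies this).
LAYOUT_METADATA = {
    "DEMOG": {"sex_column": "Gender", "sex_codes": {1: "M"}},
    "DM": {"sex_column": "Sex", "sex_codes": {"M": "M"}},
    "SUBJINFO": {"sex_column": "Sex", "sex_codes": {1: "M"}},
}


def read_sex(layout_name, record):
    """Look up the sex of a subject using the metadata for that layout."""
    meta = LAYOUT_METADATA[layout_name]
    raw_value = record[meta["sex_column"]]
    return meta["sex_codes"][raw_value]


results = [
    read_sex("DEMOG", demog),
    read_sex("DM", dm),
    read_sex("SUBJINFO", subjinfo),
]
print(results)  # ['M', 'M', 'M']
```

All three calls return the same answer, but only because the program carries a separate metadata entry for every layout. With one agreed structure, that lookup table would not be needed.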
How does SDTM help in standardization of data structure and formatting?
- Each collected data point has a 'focus' (purpose)
- SDTM has standard dataset structures based on the focus of the observation
- SDTM has standard variables to present the information based on the purpose of the collected data
- SDTM has standard variable values for the commonly used variables
In other words, SDTM allows the users to organize data into:
- Standard dataset names
- Standard variables in each dataset
- Fixed list of allowed values in the variables where applicable
- Flexibility to create custom datasets based on existing standard dataset structures
- Flexibility to extend the allowed values in variables where applicable
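One way to picture these points is as a small specification that a dataset can be checked against. The sketch below is illustrative only: `VariableSpec`, `DM_SPEC`, and `check_record` are hypothetical names, the variable list covers only the demographics variables used in this post, and the value lists hold just the values mentioned here plus an assumed 'F', not the published controlled terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    name: str
    allowed_values: frozenset = frozenset()  # empty means no fixed value list
    extensible: bool = False                 # True means new values may be added


# Hypothetical, partial specification of a demographics dataset.
DM_SPEC = {
    "SUBJID": VariableSpec("SUBJID"),
    "SEX": VariableSpec("SEX", frozenset({"M", "F"})),  # 'F' is an assumption
    "AGE": VariableSpec("AGE"),
    "AGEU": VariableSpec("AGEU", frozenset({"YEARS"})),
    "RACE": VariableSpec("RACE"),
}


def check_record(record, spec):
    """Return a list of problems found when comparing a record to a spec."""
    problems = []
    for name in record:
        if name not in spec:
            problems.append(f"unexpected variable: {name}")
    for name, var in spec.items():
        if name not in record:
            problems.append(f"missing variable: {name}")
            continue
        value = record[name]
        outside_list = var.allowed_values and value not in var.allowed_values
        if outside_list and not var.extensible:
            problems.append(f"{name}: value {value!r} not in allowed list")
    return problems


good = {"SUBJID": 1001, "SEX": "M", "AGE": 52, "AGEU": "YEARS", "RACE": "Asian"}
bad = {"SUBJID": 1001, "Sex": 1, "AGE": 52, "AGEU": "YEARS", "RACE": "Asian"}

print(check_record(good, DM_SPEC))  # []
print(check_record(bad, DM_SPEC))   # ['unexpected variable: Sex', 'missing variable: SEX']
```

The `extensible` flag models the last bullet: a variable marked extensible would accept a value outside its list instead of reporting a problem. No variable in this sketch sets it.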
The standard way of organizing data collected in our example above is:
- The focus of the observation is the 'demographics' of the subject, so present it in the standard DM (Demographics) dataset (dataset level standardization)
- Sex of the subject, when collected, must be presented in a variable named 'SEX' (variable level standardization)
- Age of the subject, when collected, must be presented in a variable named 'AGE' (variable level standardization)
- Provide the units in which age is collected in a variable named 'AGEU' (variable level standardization)
- Race of the subject, when collected, must be presented in a variable named 'RACE' (variable level standardization)
- Always use the value 'M' when the subject is male (value level standardization)
DM

| SUBJID | SEX | AGE | AGEU | RACE |
|---|---|---|---|---|
| 1001 | M | 52 | YEARS | Asian |
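As a rough sketch of what this standardization buys, the Python below converts the three earlier layouts into the standard DM variable names and pools them. The helper names (`to_dm`, `from_demog`, `from_dm_draft`, `from_subjinfo`) are hypothetical. The race codes come from the collection form at the top of this post, the sex code `1` = male is an assumption, and the values mirror the example table rather than any published terminology.

```python
# Decode lists. RACE_CODES comes from the collection form in this post.
# SEX_CODES (1 = male) is an assumption; the post does not give that code list.
RACE_CODES = {1: "Asian", 2: "White", 3: "African American"}
SEX_CODES = {1: "M"}


def to_dm(subject, sex, age, race):
    """Build one row using the standard DM variable names."""
    return {
        "SUBJID": subject,
        "SEX": SEX_CODES.get(sex, sex),      # decode 1 to "M"; pass "M" through
        "AGE": age,
        "AGEU": "YEARS",                     # assumes age was collected in years
        "RACE": RACE_CODES.get(race, race),  # decode 1 to "Asian"; pass text through
    }


def from_demog(rec):
    return to_dm(rec["Subject"], rec["Gender"], rec["Collected_age"], rec["Race"])


def from_dm_draft(rec):
    return to_dm(rec["Subject"], rec["Sex"], rec["Collected_age"], rec["Race"])


def from_subjinfo(rec):
    return to_dm(rec["SUBNUM"], rec["Sex"], rec["age_years"], rec["Race"])


demog = {"Subject": 1001, "Gender": 1, "Collected_age": 52, "Race": 1}
dm_draft = {"Subject": 1001, "Sex": "M", "Collected_age": 52, "Race": "Asian"}
subjinfo = {"SUBNUM": 1001, "Sex": 1, "age_years": 52, "Race": "Asian"}

# Once every source is in the standard structure, pooling is a plain list.
pooled = [from_demog(demog), from_dm_draft(dm_draft), from_subjinfo(subjinfo)]
for row in pooled:
    print(row)  # each line shows the same standardized row
```

Each source still needs its own one-time mapping, but everything after that step (pooling, review, analysis programs) works with a single set of variable names and values.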
So, the main advantage of the SDTM standard is that data collected in clinical trials will have a predictable and consistent structure.