Dataset name
Dataset label
Unique keys
An analysis dataset will be called "analysis-ready" if it contains all the variables and records needed to develop/replicate an analysis result by performing the actual statistical test without first having to manipulate the data.
If AVAL/AVALC is imputed, or if the entire record is imputed, we populate the DTYPE variable. A non-null value in DTYPE therefore indicates that the record is imputed.
The PARAMTYP variable indicates whether the parameter on a particular observation is a derived parameter or a collected parameter. PARAMTYP is populated with 'DERIVED' for derived parameters and left null for collected parameters.
The DTYPE variable indicates whether a particular record is a collected record or a derived record. DTYPE is populated with an appropriate keyword (such as LOCF, AVERAGE, MINIMUM, or MAXIMUM) indicating the derivation algorithm used to create the derived record, and it is left null for collected records.
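The DTYPE convention described above can be illustrated with a minimal Python sketch of an LOCF derivation. The function name, the `scheduled_visits` input, and the dictionary-based record layout are assumptions made for this illustration only; a real implementation would follow the study's analysis plan.

```python
def add_locf_records(collected, scheduled_visits):
    """Return collected records plus LOCF records for missed scheduled visits.

    collected: list of dicts with AVISITN and AVAL for one subject and parameter.
    scheduled_visits: visit numbers in chronological order (hypothetical input).
    """
    by_visit = {rec["AVISITN"]: rec for rec in collected}
    result = []
    last_collected = None
    for visit in scheduled_visits:
        if visit in by_visit:
            # Collected record: DTYPE stays null.
            last_collected = dict(by_visit[visit], DTYPE=None)
            result.append(last_collected)
        elif last_collected is not None:
            # Derived record: carry the last collected value forward.
            result.append(dict(last_collected, AVISITN=visit, DTYPE="LOCF"))
    return result


rows = add_locf_records(
    [{"AVISITN": 1, "AVAL": 5.2}, {"AVISITN": 3, "AVAL": 4.8}],
    scheduled_visits=[1, 2, 3, 4],
)
for row in rows:
    print(row)
```

In this sketch, visits 2 and 4 have no collected result, so they receive derived records with DTYPE set to 'LOCF', while the collected records keep a null DTYPE.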
The BASETYPE variable indicates the baseline definition used to populate the BASE variable on a particular record. It is used only when there is more than one definition of baseline within a dataset or parameter. Examples of different definitions include: 1) Last known value prior to treatment start 2) Minimum value in the screening period 3) Maximum value in the screening period 4) Average result of the last two collections prior to treatment start 5) Average of all results available prior to treatment start
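As a rough sketch of how several baseline definitions can coexist, the Python below computes BASE under each requested definition and emits one set of post-baseline rows per BASETYPE. The BASETYPE labels, the `BASELINE_RULES` table, and the function name are hypothetical; the actual text stored in BASETYPE is defined by the study.

```python
from statistics import mean

# Hypothetical BASETYPE labels mapped to rules that operate on the
# pre-treatment values, listed in chronological order.
BASELINE_RULES = {
    "LAST": lambda values: values[-1],
    "MINIMUM": min,
    "MAXIMUM": max,
    "AVERAGE OF LAST TWO": lambda values: mean(values[-2:]),
    "AVERAGE OF ALL": mean,
}


def expand_by_basetype(pre_values, post_records, basetypes):
    """Return one copy of each post-baseline record per baseline definition."""
    result = []
    if not pre_values:
        return result  # no baseline can be derived without pre-treatment data
    for basetype in basetypes:
        base = BASELINE_RULES[basetype](pre_values)
        for rec in post_records:
            result.append(dict(rec, BASETYPE=basetype, BASE=base))
    return result


expanded = expand_by_basetype(
    pre_values=[10, 14, 13],
    post_records=[{"AVISITN": 2, "AVAL": 15}],
    basetypes=["LAST", "AVERAGE OF LAST TWO"],
)
for row in expanded:
    print(row)
```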
The term 'Last Observed Value' is generally used to indicate the last known result while the subject is on study treatment. In an ADaM implementation, there are two ways to indicate such a record: 1) Create a flag variable and populate it with 'Y' on the record holding the last observed value. 2) Create an additional record as a copy of the record corresponding to the last observed value. The second approach is preferable when the last observed value is presented as a conceptual timepoint in a by-visit analysis.
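The two approaches above might look as follows in a small Python sketch. The flag name LOVFL, the AVISIT text 'LAST OBSERVED VALUE', and the DTYPE value 'LOV' are assumptions chosen for illustration, and the on-treatment status is assumed to be available in ONTRTFL.

```python
def last_observed(records):
    """Pick the latest on-treatment record with a non-missing result."""
    candidates = [
        rec for rec in records
        if rec.get("ONTRTFL") == "Y" and rec.get("AVAL") is not None
    ]
    return max(candidates, key=lambda rec: rec["ADY"]) if candidates else None


def flag_approach(records):
    """Approach 1: flag the last observed value on the existing record."""
    last = last_observed(records)
    return [dict(rec, LOVFL="Y" if rec is last else None) for rec in records]


def copy_approach(records):
    """Approach 2: add a copy of the record as a conceptual timepoint."""
    last = last_observed(records)
    result = [dict(rec) for rec in records]
    if last is not None:
        result.append(dict(last, AVISIT="LAST OBSERVED VALUE", DTYPE="LOV"))
    return result


subject = [
    {"AVISIT": "WEEK 2", "ADY": 15, "AVAL": 7.1, "ONTRTFL": "Y"},
    {"AVISIT": "WEEK 4", "ADY": 29, "AVAL": 6.8, "ONTRTFL": "Y"},
    {"AVISIT": "FOLLOW-UP", "ADY": 60, "AVAL": 7.5, "ONTRTFL": None},
]
flagged = flag_approach(subject)
copied = copy_approach(subject)
```

With the copy approach, the extra record can be summarized alongside the scheduled visits without any special handling in the table program.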
In clinical trial data, we often encounter partial dates, where one or more components of a date (year, month, day) are not known. For analysis purposes such as calculating approximate durations, we fill the missing components with prespecified values. This process is called date imputation. In ADaM, if an imputed date value is stored in any analysis date variable (ADT, ASTDT, AENDT, etc.), we use the date imputation flag variable -DTF to indicate the level of imputation performed. -DTF will have a value of 'Y' when the year is imputed, 'M' when the month is imputed, and 'D' when the day is imputed.
Similarly, for time imputations, we use the -TMF variable to indicate the level of imputation. -TMF will have a value of 'H' when hours are imputed, 'M' when minutes are imputed, and 'S' when seconds are imputed.
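The date and time imputation flags can be sketched in Python as below. This is only an illustration under stated assumptions: inputs are right-truncated ISO 8601 strings (for example '2020' or '2020-05'), missing components default to the first month, first day, or zero, and the flag records the highest-level component that was imputed. The function names and default values are hypothetical; the actual imputation rules come from the statistical analysis plan.

```python
from datetime import date, time


def impute_partial_date(dtc, default_month=1, default_day=1):
    """Return (imputed date, DTF flag) for 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = dtc.split("-") if dtc else []
    if not parts:
        return None, None  # a fully missing date is not imputed in this sketch
    year = int(parts[0])
    if len(parts) == 1:
        # Month and day are both imputed; flag the highest level, 'M'.
        return date(year, default_month, default_day), "M"
    month = int(parts[1])
    if len(parts) == 2:
        return date(year, month, default_day), "D"
    return date(year, month, int(parts[2])), None  # complete date, no flag


def impute_partial_time(tmc):
    """Return (imputed time, TMF flag) for '', 'HH', 'HH:MM' or 'HH:MM:SS'."""
    parts = tmc.split(":") if tmc else []
    if not parts:
        return time(0, 0, 0), "H"
    hour = int(parts[0])
    if len(parts) == 1:
        return time(hour, 0, 0), "M"
    minute = int(parts[1])
    if len(parts) == 2:
        return time(hour, minute, 0), "S"
    return time(hour, minute, int(parts[2])), None  # complete time, no flag


print(impute_partial_date("2020-05"))
print(impute_partial_time("10"))
```

Dates with a gap in the middle (day known but month missing) are not handled here and would need their own rule.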
When multiple records exist at the lowest level of timepoint precision, we generally create a new record to hold the average of the results and use it in the analysis. If the results on those records are qualitative, we may choose either the best case or the worst case, based on the analysis requirement, by creating a flag variable that selects the appropriate record.
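For quantitative results, the averaging step might be sketched as follows. The grouping keys (PARAMCD and AVISITN) and the function name are assumptions for this example; the derived record is marked with DTYPE 'AVERAGE' so that it can be told apart from collected records.

```python
from collections import defaultdict
from statistics import mean


def add_average_records(records):
    """Append one DTYPE='AVERAGE' record for each visit with several results."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["PARAMCD"], rec["AVISITN"])].append(rec)

    result = [dict(rec) for rec in records]
    for (paramcd, avisitn), group in groups.items():
        if len(group) > 1:
            result.append({
                "PARAMCD": paramcd,
                "AVISITN": avisitn,
                "AVAL": mean(rec["AVAL"] for rec in group),
                "DTYPE": "AVERAGE",
            })
    return result


vitals = [
    {"PARAMCD": "SYSBP", "AVISITN": 1, "AVAL": 120},
    {"PARAMCD": "SYSBP", "AVISITN": 1, "AVAL": 124},
    {"PARAMCD": "SYSBP", "AVISITN": 2, "AVAL": 118},
]
with_averages = add_average_records(vitals)
```

Visits with a single result get no extra record, so downstream selection logic still needs a rule (for example an analysis flag) to pick exactly one record per visit.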
Based on analysis requirements, we may impute a partial death date, provided the imputation is prespecified in the statistical analysis plan.
CHG and PCHG cannot be populated when there is no baseline record for a subject.
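A minimal sketch of this rule, assuming the baseline record is identified by ABLFL = 'Y' and that CHG = AVAL - BASE and PCHG = 100 * CHG / BASE. The function name is hypothetical, and whether CHG is also populated on baseline or pre-baseline records is a study-level decision that this sketch does not address.

```python
def add_chg_pchg(records):
    """Add BASE, CHG and PCHG; all stay null when no baseline record exists."""
    baseline = next((rec for rec in records if rec.get("ABLFL") == "Y"), None)
    base = baseline["AVAL"] if baseline is not None else None
    result = []
    for rec in records:
        chg = pchg = None
        if base is not None and rec["AVAL"] is not None:
            chg = rec["AVAL"] - base
            if base != 0:
                # Percent change is undefined for a zero baseline.
                pchg = 100 * chg / base
        result.append(dict(rec, BASE=base, CHG=chg, PCHG=pchg))
    return result


with_baseline = add_chg_pchg([
    {"AVISITN": 0, "AVAL": 50, "ABLFL": "Y"},
    {"AVISITN": 1, "AVAL": 60, "ABLFL": None},
])
without_baseline = add_chg_pchg([{"AVISITN": 1, "AVAL": 60, "ABLFL": None}])
```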
Depending on the analysis requirement or the reason the retest was performed, the retest may be given preference as the baseline. When a retest is performed because of a suspected sample issue or a faulty device, the retest is used for baseline.
The treatment-emergent definition remains constant at the protocol level. So, treatment-emergent information derived at the SDTM level can be used for analysis, provided the treatment-emergent flag value does not change because of imputation of partial dates, imputation of treatment dates with a cutoff date in ongoing studies, and so on.
Yes, TRTEMFL can be missing on some records, because there can be adverse events that started before treatment start or that started after the treatment date (+ treatment cutoff based on the half-life of the study drug).
We will derive TRTEMFL after performing date imputations, using the imputed dates.
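The treatment-emergent logic described above could be sketched like this, using already-imputed dates. The `window_days` parameter stands in for the half-life based cutoff and its default of 30 days is purely hypothetical, as is the function name; the protocol defines the real window.

```python
from datetime import date, timedelta


def derive_trtemfl(astdt, trtsdt, trtedt=None, window_days=30):
    """Return 'Y' for a treatment-emergent event, otherwise None (missing)."""
    if astdt is None or trtsdt is None:
        return None
    if astdt < trtsdt:
        return None  # event started before treatment start
    if trtedt is not None and astdt > trtedt + timedelta(days=window_days):
        return None  # event started after the post-treatment window
    return "Y"


flag = derive_trtemfl(
    astdt=date(2021, 1, 10),
    trtsdt=date(2021, 1, 5),
    trtedt=date(2021, 2, 1),
)
print(flag)
```

A subject with no treatment end date (for example, still on treatment at the data cutoff) is treated here as having an open-ended window.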
SMQ stands for Standardized MedDRA Queries. As part of the evaluation of a drug's safety, reviewers may be interested in checking for the presence of a group of adverse events. (Incomplete)
If AGE is not collected on the CRF, it can be derived at the analysis dataset level as floor((reference date - birth date + 1) / 365.25). The reference date can be the informed consent date, screening date, or treatment start date, based on the analysis requirements.
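The age formula above can be written as a short Python function. The function and argument names are illustrative only, and the choice of reference date is left to the caller.

```python
import math
from datetime import date


def derive_age(brthdt, refdt):
    """Age in whole years: floor((reference date - birth date + 1) / 365.25)."""
    days = (refdt - brthdt).days + 1
    return math.floor(days / 365.25)


print(derive_age(date(1980, 6, 15), date(2020, 12, 31)))
```

Because 365.25 is an average year length, this approximation can differ from the calendar age by one year when the reference date falls within a day or so of a birthday.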
If weight or height value is missing, the individual missing parameter is imputed first and then the imputed component is used to derive BMI.
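A hedged sketch of this order of operations follows. The source does not say how the missing component is imputed, so the fallback arguments here are hypothetical placeholders for whatever imputed value the analysis plan prescribes. BMI is computed as weight in kg divided by the square of height in metres, and the rounding to one decimal is likewise an assumption.

```python
def derive_bmi(weight_kg, height_cm, fallback_weight_kg=None, fallback_height_cm=None):
    """Return (BMI, imputed_used); impute the missing component before deriving."""
    weight = weight_kg if weight_kg is not None else fallback_weight_kg
    height = height_cm if height_cm is not None else fallback_height_cm
    if weight is None or height is None or height <= 0:
        return None, False  # BMI cannot be derived
    imputed_used = weight_kg is None or height_cm is None
    bmi = weight / (height / 100) ** 2
    return round(bmi, 1), imputed_used


print(derive_bmi(70, 175))
print(derive_bmi(None, 175, fallback_weight_kg=70))
```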
Based on the precision of results required for a parameter and precision of results collected, rounding can be applied to results.
The subject-level analysis dataset (ADSL) has one record per subject, irrespective of the study design.
Identifier Variables
Subject Demographic Variables
Population Indicator Variables
Treatment Variables
Dose Variables
Treatment Timing Variables
Subject-Level Period, Subperiod, and Phase Timing Variables
Subject-Level Trial Experience Variables
Stratification Variables
  • EOTSTT and EOSSTT variables are used to capture the end of treatment and end of study status respectively of a subject as 'ONGOING', 'COMPLETED' and 'DISCONTINUED'.
  • These variables are derived based on the information present in SDTM disposition dataset.
  • We need DSDECOD, DSSCAT and DSCAT variables.
  • A record with DSCAT="DISPOSITION EVENT", DSSCAT="END OF TREATMENT" and DSDECOD="COMPLETED" indicates that the subject completed treatment; if DSDECOD on this record is any value other than 'COMPLETED', the subject is 'DISCONTINUED'.
  • Absence of a record indicates that the subject is 'ONGOING'.
  • EOSSTT can be derived similarly.
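The derivation in the bullets above can be sketched in Python as follows. The function name and the dictionary layout of the disposition records are assumptions, and the DSSCAT value 'END OF STUDY' used for EOSSTT is hypothetical, since the source only states that EOSSTT is derived similarly.

```python
def derive_status(ds_records, dsscat="END OF TREATMENT"):
    """Derive EOTSTT (or EOSSTT with a different DSSCAT) for one subject."""
    for rec in ds_records:
        if rec.get("DSCAT") == "DISPOSITION EVENT" and rec.get("DSSCAT") == dsscat:
            if rec.get("DSDECOD") == "COMPLETED":
                return "COMPLETED"
            return "DISCONTINUED"
    # No matching disposition record: the subject is still ongoing.
    return "ONGOING"


ds = [
    {"DSCAT": "PROTOCOL MILESTONE", "DSSCAT": None, "DSDECOD": "INFORMED CONSENT OBTAINED"},
    {"DSCAT": "DISPOSITION EVENT", "DSSCAT": "END OF TREATMENT", "DSDECOD": "ADVERSE EVENT"},
]
eotstt = derive_status(ds)
eosstt = derive_status(ds, dsscat="END OF STUDY")  # hypothetical DSSCAT value
print(eotstt, eosstt)
```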