mycsg QnA

What are the metadata items associated with ADaM datasets?

Dataset name
Dataset label
Class
Structure
Unique keys
Documentation

An analysis dataset will be called "analysis-ready" if it contains all the variables and records needed to develop/replicate an analysis result by performing the actual statistical test without first having to manipulate the data.

How would you know if the result (AVAL/AVALC) is imputed?

If AVAL/AVALC is imputed or if the entire record is imputed we will populate the DTYPE variable. So, a non-null value in DTYPE indicates that the record is imputed.
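As a minimal sketch of that check, assuming BDS records are held as plain Python dictionaries (the sample records and the helper name `is_imputed` are hypothetical):

```python
# Hypothetical BDS records; only the variables relevant to the check are shown.
records = [
    {"USUBJID": "001", "AVISIT": "WEEK 2", "AVAL": 5.1, "DTYPE": ""},
    {"USUBJID": "001", "AVISIT": "WEEK 4", "AVAL": 5.1, "DTYPE": "LOCF"},
]


def is_imputed(record):
    """Return True when DTYPE is populated (neither null nor blank)."""
    dtype = record.get("DTYPE")
    return dtype is not None and dtype.strip() != ""


# Visits whose records were imputed or derived.
imputed_visits = [r["AVISIT"] for r in records if is_imputed(r)]
```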

Explain the difference between PARAMTYP, DTYPE, BASETYPE

PARAMTYP variable is used for indicating whether the parameter on a particular observation is a derived parameter or a collected parameter. PARAMTYP variable will be populated with 'DERIVED' for derived parameters and will be left null for collected parameters.
DTYPE variable is used for indicating whether a particular record is a collected record or a derived record. DTYPE variable will be populated with appropriate keyword (like LOCF, AVERAGE, MINIMUM, MAXIMUM etc) indicating the derivation algorithm used for creating the derived record and it will be left null for collected records.
BASETYPE variable is used to indicate the baseline definition used for populating the BASE variable on a particular record. It is used only when there is more than one definition of baseline within a dataset or parameter. Examples of different definitions include: 1) Last known value prior to treatment start 2) Minimum value during the screening period 3) Maximum value during the screening period 4) Average result of last two collections prior to treatment start 5) Average of all results available prior to treatment start

Explain the concept of "Last Observed Value" used in ADaM. How do we implement it?

The term 'Last Observed Value' is generally used to indicate the last known result while the subject is on study treatment. In ADaM implementation, we can use two methods to indicate such a record. 1) Create a flag variable and populate it as 'Y' on the last observed value. 2) Create an additional record as a copy of the record corresponding to the last observed value. The second approach is preferable when the last observed value is presented as a conceptual timepoint in a by-visit analysis.
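The two methods can be sketched as follows. This is an illustration only: the flag name `LOVFL`, the visit label 'LAST OBSERVED VALUE' and the DTYPE keyword 'LOV' are assumptions chosen for the example, and the records are hypothetical.

```python
# Hypothetical on-treatment records for one subject and parameter.
records = [
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "WEEK 2", "ADY": 15, "AVAL": 128.0, "DTYPE": ""},
    {"USUBJID": "001", "PARAMCD": "SYSBP", "AVISIT": "WEEK 4", "ADY": 29, "AVAL": 124.0, "DTYPE": ""},
]

# The last observed value is the record with the latest analysis day.
last = max(records, key=lambda r: r["ADY"])

# Method 1: add a flag variable (LOVFL is a hypothetical name) set to 'Y' on that record.
flagged = [dict(r, LOVFL="Y" if r is last else "") for r in records]

# Method 2: add a copy of that record as a conceptual timepoint.
lov_record = dict(last)
lov_record["AVISIT"] = "LAST OBSERVED VALUE"  # assumed visit label
lov_record["DTYPE"] = "LOV"                   # assumed DTYPE keyword
with_lov = records + [lov_record]
```

With the second method, a by-visit summary can treat 'LAST OBSERVED VALUE' like any other AVISIT value without special handling.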

Explain the concept of date imputations. How will we know if a date value on a particular record is imputed?

In clinical trial data, we often encounter partial dates, where one or more components of a date (year, month, day) are not known. For analysis purposes such as calculating approximate durations, we fill the missing components with prespecified values. This process is called date imputation. In ADaM, if an imputed date value is stored in any analysis date variable (ADT, ASTDT, AENDT etc.), we use the date imputation flag variable -DTF to indicate the level of imputation performed. -DTF will have a value of Y when year is imputed, M when month is imputed, D when day is imputed.
Similarly, for time imputations, we use -TMF variable to indicate the level of imputation. -TMF will have a value of 'H' if hour is imputed, 'M' when minutes are imputed, 'S' when seconds are imputed.
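A minimal sketch of date imputation with its flag, assuming the collected value is an ISO 8601 partial date string and an imputation rule of "missing month becomes January, missing day becomes the 1st" (the rule and the function name `impute_partial_date` are assumptions; the actual rule comes from the analysis plan). When both month and day are missing, the sketch reports the highest level imputed, 'M'.

```python
from datetime import date


def impute_partial_date(dtc):
    """Impute a partial date given as YYYY, YYYY-MM or YYYY-MM-DD.

    Returns (imputed date, date imputation flag). The flag is '' when
    nothing was imputed.
    """
    # Keep only the date portion in case a time part is present.
    parts = dtc[:10].split("-") if dtc else []
    if not parts or not parts[0]:
        return None, ""  # year unknown: left unimputed in this sketch
    year = int(parts[0])
    if len(parts) == 1:
        return date(year, 1, 1), "M"  # month and day imputed
    month = int(parts[1])
    if len(parts) == 2:
        return date(year, month, 1), "D"  # day imputed
    return date(year, month, int(parts[2])), ""  # complete date
```

For example, `impute_partial_date("2021-03")` gives the 1st of March 2021 with a flag of 'D'.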

When you have multiple records for the same subject, same assessment, same visit, same date & time, which approach do you follow to select the records for analysis? Will you select all records or a specific record, and why?

When multiple records exist at the lowest level of time point precision, we generally create a new record to hold the average of the results and use it in analysis. If the results on those records are qualitative, we may choose either the best case or the worst case based on the analysis requirement, creating a flag variable to select the appropriate record.
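The averaging approach for numeric results might look like the sketch below. The records and the grouping keys are hypothetical; DTYPE is set to 'AVERAGE' on the new record to mark it as derived.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: two results share the same subject, parameter, visit and date/time.
records = [
    {"USUBJID": "001", "PARAMCD": "PULSE", "AVISIT": "WEEK 2", "ADTM": "2021-03-15T09:00", "AVAL": 70.0, "DTYPE": ""},
    {"USUBJID": "001", "PARAMCD": "PULSE", "AVISIT": "WEEK 2", "ADTM": "2021-03-15T09:00", "AVAL": 74.0, "DTYPE": ""},
    {"USUBJID": "001", "PARAMCD": "PULSE", "AVISIT": "WEEK 4", "ADTM": "2021-03-29T09:00", "AVAL": 68.0, "DTYPE": ""},
]

KEYS = ("USUBJID", "PARAMCD", "AVISIT", "ADTM")

# Group records by the lowest level of time point precision.
groups = defaultdict(list)
for r in records:
    groups[tuple(r[k] for k in KEYS)].append(r)

# Create one derived record per group that has more than one result.
averages = []
for key, rows in groups.items():
    if len(rows) > 1:
        derived = dict(zip(KEYS, key))
        derived["AVAL"] = mean(r["AVAL"] for r in rows)
        derived["DTYPE"] = "AVERAGE"
        averages.append(derived)

analysis_records = records + averages
```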

Do you impute death date, if the date collected is partial?

Based on analysis requirements, we may impute a partial death date, provided the imputation is prespecified in the statistical analysis plan.

How would you calculate CHG and PCHG when there is no baseline record for a subject?

CHG and PCHG cannot be populated when there is no baseline record for a subject.
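A small sketch of that rule, with missing values represented as None. The function name is hypothetical, and leaving PCHG missing when BASE is zero is an added assumption (the percentage is undefined in that case).

```python
def change_from_baseline(aval, base):
    """Return (CHG, PCHG); both are None when AVAL or BASE is missing."""
    if aval is None or base is None:
        return None, None
    chg = aval - base
    # Percent change is undefined for a zero baseline, so it is left missing.
    pchg = None if base == 0 else 100 * chg / base
    return chg, pchg
```

A subject with no baseline record has no BASE, so `change_from_baseline(110.0, None)` returns missing values for both variables.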

If a parameter result is collected twice (actual and retest) during the baseline period, which of these two records would be considered for baseline flagging?

Depending on the analysis requirement and the reason the retest was performed, the retest may be given preference as the baseline record. When a retest is performed because of a suspected sample issue or a faulty device, the retest will be used for baseline.

Can we use treatment-emergent information from SDTM dataset?

The treatment-emergent definition remains constant at the protocol level. So, treatment-emergent information derived at the SDTM level can be used for analysis, provided the treatment-emergent flag value does not change because of imputation of partial dates, imputation of treatment dates with a cutoff date in ongoing studies, etc.

Is it possible to have TRTEMFL as missing when a subject has SAFFL as "Y"?

Yes. TRTEMFL can be missing on records for adverse events that started before treatment start, or that started after the end of treatment plus a cutoff window based on the half-life of the study drug.

When do we derive TRTEMFL with respect to date imputation? Before imputing the dates or after imputing the dates?

We will derive TRTEMFL after performing date imputations, using the imputed dates.
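A minimal sketch of the flag derivation on already-imputed dates. The on-treatment window used here (treatment start through treatment end plus a number of follow-up days) and the default of 30 days are assumptions for illustration; the actual definition comes from the protocol and analysis plan.

```python
from datetime import date, timedelta


def derive_trtemfl(astdt, trtsdt, trtedt, window_days=30):
    """Return 'Y' when the imputed AE start date falls on or after TRTSDT
    and on or before TRTEDT + window_days; otherwise return ''.

    window_days is a hypothetical cutoff (for example, based on drug half-life).
    """
    if astdt is None or trtsdt is None:
        return ""
    if astdt < trtsdt:
        return ""  # started before treatment
    if trtedt is not None and astdt > trtedt + timedelta(days=window_days):
        return ""  # started after the follow-up window
    return "Y"
```

Because the comparison uses ASTDT after imputation, an event with a partial start date is classified only once its missing components have been filled in.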

What are SMQ variables in ADAE? What is the purpose of these variables?

SMQ stands for Standardized MedDRA queries. As part of evaluation of a drug's safety, reviewers may be interested in checking for the presence of a group of adverse events. (Incomplete)

If AGE is not collected on CRF, how would you derive it? Which date would you consider as the reference timepoint for calculating age? (Screening or Treatment start date)

If AGE is not collected on CRF, AGE can be derived at the analysis dataset level as Floor((reference date - birth date + 1) / 365.25). The reference date can be the informed consent date, screening date or treatment start date, based on the analysis requirements.
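The age formula above can be sketched as follows, reading the floor as applying to the whole quotient so that the result is age in completed years. The function and argument names are hypothetical.

```python
import math
from datetime import date


def derive_age(brthdt, refdt):
    """Age in years: floor((reference date - birth date + 1) / 365.25)."""
    days = (refdt - brthdt).days + 1
    return math.floor(days / 365.25)
```

The divisor 365.25 is an approximation of the average year length, so the result can differ by one year from a calendar-based age when the reference date is very close to a birthday.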

If a weight or height value is missing, will you impute that record first and then calculate BMI, or will you directly impute the derived BMI value from the previous record?

If weight or height value is missing, the individual missing parameter is imputed first and then the imputed component is used to derive BMI.
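A sketch of that order of operations, assuming last observation carried forward as the imputation method (an assumption for illustration; the method is defined by the analysis plan), weight in kg and height in cm. The sample values are hypothetical.

```python
def locf(values):
    """Carry the last non-missing value forward over None entries."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def derive_bmi(weight_kg, height_cm):
    """BMI = weight (kg) / height (m) squared; None when a component is missing."""
    if weight_kg is None or not height_cm:
        return None
    return weight_kg / (height_cm / 100) ** 2


# Step 1: impute each component separately (hypothetical by-visit values).
weights = locf([70.0, None, 72.0])
heights = locf([175.0, 175.0, None])

# Step 2: derive BMI from the imputed components.
bmis = [derive_bmi(w, h) for w, h in zip(weights, heights)]
```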

Can we apply rounding/format to results in dataset level?

Based on the precision of results required for a parameter and precision of results collected, rounding can be applied to results.

What is the record structure of ADSL dataset?

Subject-Level Analysis Dataset (ADSL) has one record per subject, irrespective of the study design.

Mention some of the variable groups seen in ADSL?

Identifier Variables
Subject Demographic Variables
Population Indicator Variables
Treatment Variables
Dose Variables
Treatment Timing Variables
Subject-Level Period, Subperiod, and Phase Timing Variables
Subject-Level Trial Experience Variables
Stratification Variables

What are EOTSTT and EOSSTT variables in ADSL? How do you derive them?

EOTSTT and EOSSTT variables are used to capture the end of treatment and end of study status respectively of a subject as 'ONGOING', 'COMPLETED' and 'DISCONTINUED'.
These variables are derived based on the information present in SDTM disposition dataset.
We need DSDECOD, DSSCAT and DSCAT variables.
A record with DSCAT="DISPOSITION EVENT", DSSCAT="END OF TREATMENT" and DSDECOD="COMPLETED" indicates that the subject completed treatment. If DSDECOD on this record has any value other than 'COMPLETED', the subject is 'DISCONTINUED'.
Absence of a record indicates that the subject is 'ONGOING'.
We can derive EOSSTT similarly.
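The derivation described above can be sketched as a lookup over one subject's disposition records. The function name and the sample records are hypothetical, and the DSSCAT value "END OF STUDY" used for EOSSTT is an assumption, since the actual value depends on the study's SDTM conventions.

```python
def derive_status(ds_records, dsscat):
    """Return 'COMPLETED', 'DISCONTINUED' or 'ONGOING' for one subject.

    ds_records: that subject's SDTM DS records as dictionaries.
    dsscat: the DSSCAT value identifying the milestone of interest.
    """
    for r in ds_records:
        if r.get("DSCAT") == "DISPOSITION EVENT" and r.get("DSSCAT") == dsscat:
            return "COMPLETED" if r.get("DSDECOD") == "COMPLETED" else "DISCONTINUED"
    # No matching disposition record: the subject has not reached the milestone.
    return "ONGOING"


# Hypothetical DS records for one subject.
ds = [
    {"DSCAT": "PROTOCOL MILESTONE", "DSSCAT": "", "DSDECOD": "INFORMED CONSENT OBTAINED"},
    {"DSCAT": "DISPOSITION EVENT", "DSSCAT": "END OF TREATMENT", "DSDECOD": "ADVERSE EVENT"},
]

eotstt = derive_status(ds, "END OF TREATMENT")
eosstt = derive_status(ds, "END OF STUDY")  # assumed DSSCAT value
```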