Overview
This vignette documents the causal assumptions underlying the CoMMpass survival and differential expression analyses. We use directed acyclic graphs (DAGs) to make these assumptions explicit, identify confounders, and check whether current models adjust for the right variables.
Note
DAGs encode domain knowledge, not statistical results. They represent our assumptions about how variables relate causally, which then determines what we need to adjust for in regression models.
Causal DAG
The CoMMpass DAG encodes relationships between cytogenetic risk, ISS stage, treatment, gene expression, and overall survival. Latent (unmeasured) variables — tumor biology and comorbidities — represent important confounders that cannot be directly adjusted for.
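As a rough illustration of how such a graph can be written down, the sketch below encodes a simplified DAG with the `dagitty` package. The node names and every edge are assumptions chosen to mirror the variables named above and the key assumptions listed in this section; this is not the actual CoMMpass DAG used to produce the tables in this vignette.

```r
library(dagitty)

# Hypothetical, simplified DAG. All edges are illustrative assumptions.
commpass_sketch <- dagitty("dag {
  tumor_biology [latent]
  comorbidities [latent]

  tumor_biology -> cytogenetic_risk
  tumor_biology -> gene_expression
  tumor_biology -> overall_survival

  cytogenetic_risk -> iss_stage
  cytogenetic_risk -> gene_expression
  cytogenetic_risk -> treatment
  cytogenetic_risk -> overall_survival

  iss_stage -> treatment
  iss_stage -> overall_survival

  age -> comorbidities
  age -> treatment
  age -> overall_survival

  comorbidities -> treatment
  comorbidities -> overall_survival

  gene_expression -> overall_survival
  treatment -> overall_survival
}")

# Nodes flagged as unmeasured
latent_nodes <- latents(commpass_sketch)

# Direct causes of treatment in this sketch
treatment_parents <- parents(commpass_sketch, "treatment")
```

Marking a node as `[latent]` tells `dagitty` that it may not appear in any adjustment set, which is how unmeasured confounding enters the identifiability results.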

Key Causal Assumptions
Cytogenetic risk → survival: High-risk cytogenetics (del(17p), t(4;14), gain(1q)) directly shorten survival via treatment resistance and aggressive disease biology.
ISS stage as mediator: Cytogenetic risk influences ISS stage (via tumor burden), which in turn affects treatment decisions and survival.
Treatment as collider: Treatment selection depends on both disease severity (ISS, cytogenetics) and patient fitness (age, comorbidities). Conditioning on treatment can induce collider bias.
Gene expression as downstream: Expression patterns are downstream of both cytogenetic alterations and tumor biology. They mediate part of the effect of cytogenetic risk on survival.
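The collider assumption can be made concrete with a small simulation. The sketch below uses base R only and entirely synthetic data: `severity` and `fitness` are hypothetical stand-ins for disease severity and patient fitness, and the selection coefficients are arbitrary. It shows that two independent causes of treatment become correlated once the analysis is restricted to one treatment group.

```r
set.seed(42)
n <- 10000

# Two independent causes of treatment selection (synthetic, hypothetical)
severity <- rnorm(n)
fitness  <- rnorm(n)

# Treatment is more likely with severe disease and with good fitness,
# so treatment is a common effect (collider) of both variables
p_treat   <- plogis(1.5 * severity + 1.5 * fitness)
treatment <- rbinom(n, size = 1, prob = p_treat)

# Marginal association: expected to be close to zero
cor_marginal <- cor(severity, fitness)

# Association within the treated group: expected to be negative,
# even though severity and fitness were generated independently
cor_treated <- cor(severity[treatment == 1], fitness[treatment == 1])

c(marginal = cor_marginal, treated = cor_treated)
```

The same mechanism applies when treatment is entered as a regression covariate: conditioning on it can open a path between disease severity and patient fitness that is closed in the unconditioned data.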
Adjustment Sets
For each causal question, the DAG identifies the minimal sufficient adjustment set — the smallest set of variables that blocks all backdoor paths between exposure and outcome.
DAG-implied minimal sufficient adjustment sets for each causal question. Each set blocks all backdoor paths between exposure and outcome. Source: dagitty::adjustmentSets() applied to the CoMMpass causal DAG.
| Analysis | Exposure | Outcome | Minimal Adjustment Sets |
|---|---|---|---|
| cyto_survival | cytogenetic_risk | overall_survival | Not identifiable |
| treatment_survival | treatment | overall_survival | Not identifiable |
| iss_survival | iss_stage | overall_survival | {age, cytogenetic_risk, gene_expression} |
| expression_survival | gene_expression | overall_survival | {age, cytogenetic_risk, iss_stage} |
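To show how `dagitty::adjustmentSets()` produces results like these, here is a minimal sketch on two toy graphs. Both graphs are assumptions made for illustration and are much smaller than the CoMMpass DAG; they only demonstrate the difference between a measured and a latent confounder.

```r
library(dagitty)

# Toy graph 1: age is a measured confounder of treatment and survival
measured_dag <- dagitty("dag {
  age -> treatment
  age -> overall_survival
  treatment -> overall_survival
}")

sets_measured <- adjustmentSets(
  measured_dag,
  exposure = "treatment",
  outcome  = "overall_survival",
  type     = "minimal"
)

# Toy graph 2: the confounder is latent, so it cannot be adjusted for
latent_dag <- dagitty("dag {
  comorbidities [latent]
  comorbidities -> treatment
  comorbidities -> overall_survival
  treatment -> overall_survival
}")

sets_latent <- adjustmentSets(
  latent_dag,
  exposure = "treatment",
  outcome  = "overall_survival",
  type     = "minimal"
)

length(sets_measured)  # one minimal set, containing age
length(sets_latent)    # zero sets: no valid adjustment set exists
```

An empty result corresponds to the "Not identifiable" entries in the table: the only variables that would close the backdoor path are unmeasured.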
Model Adequacy Check
We compare the covariates used in current Cox proportional hazards models against the DAG-implied adjustment sets.
Comparison of current Cox model covariates against DAG-implied adjustment sets. Sufficient = TRUE means the model's covariates include at least one complete adjustment set, so that all backdoor paths are blocked.
| Model | Covariates | Sufficient | Recommendation |
|---|---|---|---|
| cox_age_iss | age, iss_stage | FALSE | No adjustment set exists (effect may not be identifiable). |
| cox_full | age, gender, iss_stage | FALSE | No adjustment set exists (effect may not be identifiable). |
Interpretation
- If Sufficient = TRUE, the model’s covariates include at least one complete adjustment set, so the causal effect estimate is valid under the DAG assumptions.
- If Sufficient = FALSE, there may be unblocked backdoor paths (residual confounding). The recommendation column suggests which variables to add or, when no adjustment set exists, flags that the effect may not be identifiable from the measured variables.
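The sufficiency rule described above can be sketched in a few lines of base R. The helper `is_sufficient()` is hypothetical and is not the function used to build the table; it simply checks whether a model's covariates contain at least one of the supplied adjustment sets.

```r
# Hypothetical helper: TRUE if the covariates contain at least one
# complete adjustment set. An empty list of sets always gives FALSE.
is_sufficient <- function(covariates, adjustment_sets) {
  any(vapply(
    adjustment_sets,
    function(s) all(s %in% covariates),
    logical(1)
  ))
}

# Adjustment set reported above for iss_stage -> overall_survival
iss_sets <- list(c("age", "cytogenetic_risk", "gene_expression"))

check_small <- is_sufficient(c("age", "iss_stage"), iss_sets)
check_large <- is_sufficient(
  c("age", "gender", "cytogenetic_risk", "gene_expression"),
  iss_sets
)

# No adjustment set exists (for example, cytogenetic_risk -> overall_survival)
check_none <- is_sufficient(c("age", "iss_stage"), list())

c(small = check_small, large = check_large, none = check_none)
```

A containment check of this kind is only a first pass. Extra covariates that are colliders or mediators can invalidate an otherwise complete set, so a stricter check would pass the exact covariate set to `dagitty::isAdjustmentSet()` together with the DAG.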
Warning
Unmeasured confounders: The DAG includes latent variables (tumor biology, comorbidities) that cannot be adjusted for with available data. Causal estimates remain subject to unmeasured confounding even with perfect measured-variable adjustment.
Implications for Analysis
| Analysis | Causal question | Recommendation |
|---|---|---|
| DE analysis | Cytogenetic risk → expression | Adjust for age (confounder via comorbidities) |
| KM survival | Cytogenetic risk → OS | Stratify by ISS; do NOT condition on treatment |
| Cox regression | Cytogenetic risk → OS | Adjust for age, comorbidities; avoid treatment as covariate |
| Treatment effect | Treatment → OS | Adjust for age, ISS, cytogenetic risk |
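As one possible way to translate the survival rows into model calls, the sketch below fits a Kaplan-Meier estimate and a Cox model with the `survival` package on simulated data. The data frame `sim`, its column names, and the effect sizes are all assumptions made for illustration and do not come from CoMMpass. Comorbidities are omitted because they are treated as unmeasured in this vignette.

```r
library(survival)

set.seed(1)
n <- 300

# Simulated, hypothetical patient data
sim <- data.frame(
  age = round(rnorm(n, mean = 65, sd = 9)),
  cytogenetic_risk = factor(
    sample(c("standard", "high"), n, replace = TRUE, prob = c(0.7, 0.3)),
    levels = c("standard", "high")
  ),
  iss_stage = factor(sample(c("I", "II", "III"), n, replace = TRUE))
)

# Exponential event times with arbitrary effects of age and cytogenetic risk
hazard      <- 0.02 * exp(0.03 * (sim$age - 65) +
                          0.6 * (sim$cytogenetic_risk == "high"))
event_time  <- rexp(n, rate = hazard)
censor_time <- runif(n, min = 12, max = 96)

sim$os_months <- pmin(event_time, censor_time)
sim$os_event  <- as.integer(event_time <= censor_time)

# KM curves by cytogenetic risk within each ISS stage; treatment is not used
km <- survfit(Surv(os_months, os_event) ~ cytogenetic_risk + iss_stage,
              data = sim)

# Cox model adjusted for age; treatment is deliberately left out
fit <- coxph(Surv(os_months, os_event) ~ cytogenetic_risk + age, data = sim)
summary(fit)$coefficients
```

Leaving treatment out of both formulas follows the recommendation not to condition on it, since it is assumed to be a common effect of disease severity and patient fitness.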
References
- Textor J, van der Zander B, Gilthorpe MS, Liśkiewicz M, Ellison GT (2016). Robust causal inference using directed acyclic graphs: the R package dagitty. International Journal of Epidemiology, 45(6), 1887-1894.
- Hernán MA, Robins JM (2020). Causal Inference: What If. Chapman & Hall/CRC.
- Pearl J (2009). Causality. Cambridge University Press.
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] base64url_1.4 gtable_0.3.6 jsonlite_2.0.0
#> [4] dplyr_1.2.0 compiler_4.5.3 tidyselect_1.2.1
#> [7] callr_3.7.6 scales_1.4.0 yaml_2.3.12
#> [10] fastmap_1.2.0 ggplot2_4.0.2 R6_2.6.1
#> [13] generics_0.1.4 igraph_2.2.2 knitr_1.51
#> [16] backports_1.5.0 targets_1.12.0 tibble_3.3.1
#> [19] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.7
#> [22] xfun_0.57 S7_0.2.1 otel_0.2.0
#> [25] cli_3.6.5 withr_3.0.2 magrittr_2.0.4
#> [28] ps_1.9.1 digest_0.6.39 grid_4.5.3
#> [31] processx_3.8.6 secretbase_1.2.0 lifecycle_1.0.5
#> [34] prettyunits_1.2.0 vctrs_0.7.2 evaluate_1.0.5
#> [37] glue_1.8.0 data.table_1.18.2.1 farver_2.1.2
#> [40] codetools_0.2-20 rmarkdown_2.30 tools_4.5.3
#> [43] pkgconfig_2.0.3 htmltools_0.5.9