Overview

This vignette documents the causal assumptions underlying the CoMMpass survival and differential expression analyses. We use directed acyclic graphs (DAGs) to make these assumptions explicit, identify confounders, and check whether current models adjust for the right variables.

Note

DAGs encode domain knowledge, not statistical results. They represent our assumptions about how variables relate causally, and those assumptions determine which variables the regression models need to adjust for.

Causal DAG

The CoMMpass DAG encodes relationships between cytogenetic risk, ISS stage, treatment, gene expression, and overall survival. Latent (unmeasured) variables — tumor biology and comorbidities — represent important confounders that cannot be directly adjusted for.

Key Causal Assumptions

  1. Cytogenetic risk → survival: High-risk cytogenetics (del(17p), t(4;14), gain(1q)) directly shorten survival via treatment resistance and aggressive disease biology.

  2. ISS stage as mediator: Cytogenetic risk influences ISS stage (via tumor burden), which in turn affects treatment decisions and survival.

  3. Treatment as collider: Treatment selection depends on both disease severity (ISS, cytogenetics) and patient fitness (age, comorbidities). Conditioning on treatment opens a non-causal path between these two groups of causes and can induce collider bias.

  4. Gene expression as downstream: Expression patterns are downstream of both cytogenetic alterations and tumor biology. They mediate some of the effect of cytogenetic risk on survival.
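
The collider assumption (item 3) can be illustrated with a small simulation. This is a minimal sketch in base R on made-up data: the variable names, effect sizes, and the binary "intensive treatment" indicator are hypothetical and are not taken from CoMMpass. Severity and fitness are generated independently, yet they become negatively correlated once we look only at treated patients.

```r
# Simulated illustration only; names and effect sizes are hypothetical.
set.seed(42)
n <- 20000

# Disease severity and patient fitness are generated independently.
severity <- rnorm(n)
fitness  <- rnorm(n)

# Treatment (the collider) is more likely with higher severity and higher fitness.
p_treat   <- plogis(1.5 * severity + 1.5 * fitness)
intensive <- rbinom(n, size = 1, prob = p_treat)

# Marginal association: close to zero by construction.
cor_all <- cor(severity, fitness)

# Association within the treated stratum: conditioning on the collider
# induces a negative correlation between its two causes.
treated     <- intensive == 1
cor_treated <- cor(severity[treated], fitness[treated])

round(c(overall = cor_all, treated_only = cor_treated), 3)
```

The same mechanism applies when treatment enters a regression model as a covariate, since adjusting for a variable is a form of conditioning on it.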

Adjustment Sets

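For each causal question, the DAG identifies the minimal sufficient adjustment sets: sets of measured variables that block all backdoor paths between exposure and outcome and from which no variable can be dropped without reopening a path. A question can have more than one minimal set, or none.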

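DAG-implied minimal sufficient adjustment sets for each causal question. Each set blocks all backdoor paths between exposure and outcome. "Not identifiable" means that no valid set of measured variables exists, so the effect cannot be identified by covariate adjustment alone. Source: dagitty::adjustmentSets() applied to the CoMMpass causal DAG.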
| Analysis | Exposure | Outcome | Minimal Adjustment Sets |
|---|---|---|---|
| cyto_survival | cytogenetic_risk | overall_survival | Not identifiable |
| treatment_survival | treatment | overall_survival | Not identifiable |
| iss_survival | iss_stage | overall_survival | {age, cytogenetic_risk, gene_expression} |
| expression_survival | gene_expression | overall_survival | {age, cytogenetic_risk, iss_stage} |
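
As a rough sketch of how such sets can be queried, the snippet below encodes a small DAG in dagitty syntax and calls adjustmentSets() on it. The edge list is a hypothetical reconstruction from the assumptions listed earlier, and the edges attached to the latent nodes in particular are guesses. It is not the DAG used for the table above, so its output need not match.

```r
library(dagitty)

# Hypothetical reconstruction; NOT the DAG object behind the table above.
g <- dagitty("dag {
  age
  cytogenetic_risk
  iss_stage
  treatment
  gene_expression
  overall_survival
  tumor_biology [latent]
  comorbidities [latent]

  age -> comorbidities
  age -> treatment
  age -> overall_survival
  comorbidities -> treatment
  comorbidities -> overall_survival
  tumor_biology -> cytogenetic_risk
  tumor_biology -> gene_expression
  cytogenetic_risk -> iss_stage
  cytogenetic_risk -> treatment
  cytogenetic_risk -> gene_expression
  cytogenetic_risk -> overall_survival
  iss_stage -> treatment
  iss_stage -> overall_survival
  gene_expression -> overall_survival
  treatment -> overall_survival
}")

# Minimal sufficient adjustment sets for the total effect of each exposure.
# Latent nodes are never offered as adjustment variables.
sets_iss <- adjustmentSets(g, exposure = "iss_stage",
                           outcome = "overall_survival",
                           type = "minimal", effect = "total")
sets_treatment <- adjustmentSets(g, exposure = "treatment",
                                 outcome = "overall_survival",
                                 type = "minimal", effect = "total")

print(sets_iss)
length(sets_treatment)  # 0 means no valid set of measured variables exists
```

An empty result corresponds to a "Not identifiable" entry: every candidate set either leaves a backdoor path through a latent variable open or would require adjusting for a descendant of the exposure.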

Model Adequacy Check

We compare the covariates used in current Cox proportional hazards models against the DAG-implied adjustment sets.

Comparison of current Cox model covariates against DAG-implied adjustment sets. Sufficient = TRUE means the model's covariates include at least one complete adjustment set, that is, a set blocking all backdoor paths.
| Model | Covariates | Sufficient | Recommendation |
|---|---|---|---|
| cox_age_iss | age, iss_stage | FALSE | No adjustment set exists (effect may not be identifiable). |
| cox_full | age, gender, iss_stage | FALSE | No adjustment set exists (effect may not be identifiable). |

Interpretation

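  • If Sufficient = TRUE, the model’s covariates include at least one complete adjustment set, so the causal effect estimate is valid under the DAG assumptions, provided the remaining covariates are not colliders or descendants of the exposure.
  • If Sufficient = FALSE, there may be unblocked backdoor paths (residual confounding). The recommendation column suggests which variables to add or, when no adjustment set exists, flags that the effect may not be identifiable by adjustment.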
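
The containment check behind the Sufficient column can be sketched in a few lines of base R. The helper name is_sufficient() is hypothetical and is not claimed to be the function used in this pipeline. It only tests whether the covariates contain some adjustment set, and does not verify that the extra covariates are harmless.

```r
# Hypothetical helper: TRUE if the covariates contain every variable of at
# least one adjustment set. An empty list (no valid set) always gives FALSE.
is_sufficient <- function(covariates, adjustment_sets) {
  any(vapply(
    adjustment_sets,
    function(s) all(s %in% covariates),
    logical(1)
  ))
}

# Adjustment set copied from the iss_survival row of the adjustment-set table.
iss_sets <- list(c("age", "cytogenetic_risk", "gene_expression"))

# Covariates of cox_age_iss: cytogenetic_risk and gene_expression are missing.
is_sufficient(c("age", "iss_stage"), iss_sets)

# A covariate list that contains the whole set.
is_sufficient(c("age", "gender", "cytogenetic_risk", "gene_expression"), iss_sets)

# No adjustment set exists at all, so no covariate list can be sufficient.
is_sufficient(c("age", "gender", "iss_stage"), list())
```

With an empty list of adjustment sets the helper returns FALSE for any covariates, which is consistent with both rows of the model table reporting FALSE.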

Warning

Unmeasured confounders: The DAG includes latent variables (tumor biology, comorbidities) that cannot be adjusted for with available data. Causal estimates remain subject to unmeasured confounding even with perfect measured-variable adjustment.

Implications for Analysis

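| Analysis | Causal Question | DAG Recommendation |
|---|---|---|
| DE analysis | Cytogenetic risk → expression | Adjust for age (confounder via comorbidities) |
| KM survival | Cytogenetic risk → OS | Stratify by ISS (this removes the ISS-mediated part of the effect); do NOT condition on treatment |
| Cox regression | Cytogenetic risk → OS | Adjust for age (comorbidities are latent and cannot be adjusted for); avoid treatment as covariate |
| Treatment effect | Treatment → OS | Adjust for age, ISS, cytogenetic risk; confounding via latent comorbidities remains, so the effect is not fully identifiable |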
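
As a minimal sketch of what the Cox regression row could look like in code, the example below fits a model on simulated data with the survival package. All column names, effect sizes, and the data-generating process are hypothetical. The model adjusts for age, stratifies by ISS stage, and leaves treatment out.

```r
library(survival)

# Simulated data for illustration only; nothing here comes from CoMMpass.
set.seed(1)
n <- 2000
age       <- rnorm(n, mean = 65, sd = 10)
high_risk <- rbinom(n, size = 1, prob = 0.3)

# ISS stage (1 to 3) is made more advanced by high cytogenetic risk.
iss_stage <- factor(1 + rbinom(n, size = 2, prob = plogis(-0.5 + high_risk)))

# Exponential event times; the hazard rises with age, risk, and stage.
rate <- 0.05 * exp(0.03 * (age - 65) + 0.7 * high_risk +
                   0.3 * (as.integer(iss_stage) - 1))
event_time  <- rexp(n, rate = rate)
censor_time <- runif(n, min = 0, max = 60)

dat <- data.frame(
  time   = pmin(event_time, censor_time),
  status = as.integer(event_time <= censor_time),
  age, high_risk, iss_stage
)

# Cytogenetic risk -> OS: adjust for age, stratify by ISS,
# and deliberately omit treatment from the covariates.
fit <- coxph(Surv(time, status) ~ high_risk + age + strata(iss_stage),
             data = dat)
summary(fit)$coefficients
```

Because ISS stage sits on a path from cytogenetic risk to survival, the high_risk coefficient in this stratified model describes the effect not transmitted through ISS rather than the total effect.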

References

  • Textor J, van der Zander B, Gilthorpe MS, Liśkiewicz M, Ellison GT (2016). Robust causal inference using directed acyclic graphs: the R package dagitty. International Journal of Epidemiology, 45(6), 1887-1894.
  • Hernán MA, Robins JM (2020). Causal Inference: What If. Chapman & Hall/CRC.
  • Pearl J (2009). Causality. Cambridge University Press.
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] base64url_1.4       gtable_0.3.6        jsonlite_2.0.0     
#>  [4] dplyr_1.2.0         compiler_4.5.3      tidyselect_1.2.1   
#>  [7] callr_3.7.6         scales_1.4.0        yaml_2.3.12        
#> [10] fastmap_1.2.0       ggplot2_4.0.2       R6_2.6.1           
#> [13] generics_0.1.4      igraph_2.2.2        knitr_1.51         
#> [16] backports_1.5.0     targets_1.12.0      tibble_3.3.1       
#> [19] pillar_1.11.1       RColorBrewer_1.1-3  rlang_1.1.7        
#> [22] xfun_0.57           S7_0.2.1            otel_0.2.0         
#> [25] cli_3.6.5           withr_3.0.2         magrittr_2.0.4     
#> [28] ps_1.9.1            digest_0.6.39       grid_4.5.3         
#> [31] processx_3.8.6      secretbase_1.2.0    lifecycle_1.0.5    
#> [34] prettyunits_1.2.0   vctrs_0.7.2         evaluate_1.0.5     
#> [37] glue_1.8.0          data.table_1.18.2.1 farver_2.1.2       
#> [40] codetools_0.2-20    rmarkdown_2.30      tools_4.5.3        
#> [43] pkgconfig_2.0.3     htmltools_0.5.9