Introduction

This is an annotated copy of the .Rmd file that generated my slides for the talk at the Clinical Trials Conference at the University of Pennsylvania, April 17 2023. It is intended to be useful for anyone who wants to replicate the analyses done there.

Backdrop

  • “A single yes/no is the least amount of information that can be gleaned from a patient”. Charlie Odoroff
  • “A single yes/no p-value is the least amount that can be learned from a research study”.
  • Learn as much as you can

Multi-state models

  • Natural extension of the PH model
  • They help me to glean more understanding about the disease process under study
  • Disease processes are multi-factorial, the path is much more interesting than any single way station.
  • Easy to use

The colon cancer data set is found in the survival package. Each subject has 2 rows, one for time to recurrence, the other for time to death. Split this into two data sets, and then create the progression free survival time.

Colon cancer data

  • 929 patients: control, levamiole, levamisole + 5FU
  • time to recurrence and death
  • HR for progression free survival (PFS): 1, .97, .62 (.50-.78)

Make the multi-state data set. Each subject is described by a set of rows with variable of subject id, time1, time2, state, and covariates. The primary rule for such a data set is that each the rows for each subject describe a physically possible path through the states. - no overlapping intervals (can’t be two places at once) - no gaps in the time course (have to be somewhere) - if you enter a state, the time in state is >0 (no teleporting)

The tmerge routine is a tool to help create such data sets, but it is not necessary. Other than the state or “status” variable, the data has the same form as timed-dependent covariate data sets; many have their own process for creating those. The survcheck routine checks above conditions, and gives a table of the observed transitions.

Call:
survcheck(formula = Surv(time1, time2, state) ~ 1, data = cdata, 
    id = id)

Unique identifiers       Observations        Transitions 
               929               1390                915 

Transitions table:
       to
from    recur death (censored)
  (s0)    463    43        423
  recur     0   409         52
  death     0     0          0

Number of subjects with 0, 1, ... transitions to each state:
       count
state     0   1   2
  recur 466 463   0
  death 477 452   0
  (any) 423  97 409

The first table tells the observed transtions in the data, each row of the second how many subjects made 0, 1, … transitions into that state.

The statefig routine draws simple state space figures. Those who want prettier ones can employ one of the many routines for DAGs. (The annotation below is about as fancy as you can get with this routine.)

Fit the multistate model, and display the coefficients as a graph. (Fitting is easy, all the work was setting up the data set.) For those who know ggplot: feel free to make a much nicer figure.

Call:
coxph(formula = Surv(time1, time2, state) ~ lev5 + age10 + extent + 
    node4, data = cdata, id = id)

        
1:2          coef exp(coef) se(coef) robust se      z        p
  lev5   -0.52113   0.59385  0.10739   0.10935 -4.766 1.88e-06
  age10  -0.04278   0.95813  0.03915   0.04199 -1.019    0.308
  extent  0.53143   1.70137  0.11399   0.12263  4.334 1.47e-05
  node4   0.82599   2.28413  0.09671   0.10028  8.237  < 2e-16

        
1:3         coef exp(coef) se(coef) robust se     z        p
  lev5   0.08992   1.09409  0.31145   0.31975 0.281   0.7785
  age10  0.95268   2.59264  0.18940   0.18190 5.237 1.63e-07
  extent 0.47976   1.61568  0.36679   0.46724 1.027   0.3045
  node4  0.80634   2.23969  0.33753   0.35185 2.292   0.0219

        
2:3         coef exp(coef) se(coef) robust se     z        p
  lev5   0.23120   1.26011  0.11413   0.12111 1.909   0.0563
  age10  0.09618   1.10096  0.03989   0.04391 2.190   0.0285
  extent 0.09959   1.10472  0.10598   0.11439 0.871   0.3839
  node4  0.40838   1.50438  0.10455   0.10256 3.982 6.83e-05

 States:  1= (s0), 2= recur, 3= death 

Likelihood ratio test=188.9  on 12 df, p=< 2.2e-16
n= 1390, number of events= 915 

Absolute risk

  • Hazard ratios are not enough
  • \(p_k(t; x)\) = probability in state \(k\) at time \(t\)
  • \(E(N_k(t); x)\) = number of visits to state \(k\)
    • P(ever visit state \(k\))
  • E(sojourn time in each state) up to time \(t\)
  • E(sojourn, per vist)

The set of probability in state curves for a fitted multistate PH model can be generated for any target set of covariates. (Predictions are always for a hypothetical someone). Below, I dropped age to make things a bit simpler. Then there is a trick: if we exclude any observation time after recurrence (lines with td.recur==1), then the model is a competing risks model and the probability in state for recurrence will be “ever recurred”. It is an easy way to get the probability of ever recurring.

The objects csurv2 and csurv will each have the curves for 16 subjects, the number of rows in data set dummy; each curve has information on 3 states of (s0), recurrence and death. (If you don’t provide an explicit name for the initial state, and I didn’t, it uses (s0) as a generic label.) The intermediate printing below was not shown on the slides.

  data states 
    16      3 
             n nevent     rmean
1, (s0)   1390      0 6.5680067
2, (s0)   1390      0 7.0445011
3, (s0)   1390      0 5.7484689
4, (s0)   1390      0 6.4688788
5, (s0)   1390      0 4.6183596
6, (s0)   1390      0 5.6179260
7, (s0)   1390      0 3.2591642
8, (s0)   1390      0 4.4647544
9, (s0)   1390      0 5.1645521
10, (s0)  1390      0 6.0429096
11, (s0)  1390      0 3.8857981
12, (s0)  1390      0 5.0234830
13, (s0)  1390      0 2.5047500
14, (s0)  1390      0 3.7340445
15, (s0)  1390      0 1.3681868
16, (s0)  1390      0 2.3763783
1, recur  1390    463 0.4864469
2, recur  1390    463 0.2532152
3, recur  1390    463 0.6983250
4, recur  1390    463 0.3716350
5, recur  1390    463 0.9418232
6, recur  1390    463 0.5227841
7, recur  1390    463 1.1579902
8, recur  1390    463 0.6891639
9, recur  1390    463 0.6936849
10, recur 1390    463 0.3729134
11, recur 1390    463 0.8824530
12, recur 1390    463 0.5039921
13, recur 1390    463 1.0037567
14, recur 1390    463 0.6281135
15, recur 1390    463 0.9977164
16, recur 1390    463 0.6989545
1, death  1390    452 0.9455464
2, death  1390    452 0.7022837
3, death  1390    452 1.5532061
4, death  1390    452 1.1594862
5, death  1390    452 2.4398172
6, death  1390    452 1.8592899
7, death  1390    452 3.5828457
8, death  1390    452 2.8460817
9, death  1390    452 2.1417630
10, death 1390    452 1.5841771
11, death 1390    452 3.2317489
12, death 1390    452 2.4725250
13, death 1390    452 4.4914933
14, death 1390    452 3.6378420
15, death 1390    452 5.6340969
16, death 1390    452 4.9246672
[1] 0.02260495 0.11410118 0.81700753 0.04628633

Now draw the barplot without collapsing over extent of disease.

Make a matrix out of the sojourn times, and print it out.

                   Sojourn(initial) Sojourn(recur) P(recur) recur duration
Trt, nodes <=4                 5.69           0.51     0.33           1.55
Control, nodes <=4             4.73           0.91     0.49           1.88
Control, nodes >4              2.67           0.98     0.76           1.29
Trt, nodes >4                  3.87           0.61     0.58           1.05

For the curves below I got lazy. Over 80% of the subjects had extent=3, so I draw the curves for extent =3 rather than margining it out. The yates() routine is designed do the busywork for marginal estimates, but currently doesn’t work with mutltistate fits.

Myeloid Data

  • Analysis based on CALGB trial of patients with acute myeloid leukemia with FLT3 mutation
  • Chemo \(\pm\) midostaurin arms
  • Primary analysis: stratified log-rank test
  • Modified/blinded dataset

The person below with stem cell transplant on the same day as CR has to be modified, to prevent a “0 days in the CR state” error. Making CR 1 day earlier solves this, but is completely arbitrary. The decision not to count CR after SCT is a more nuanced one. One of the main discussions in the talk was CR and duration of CR, compraring arms A and B. If someone has had stem cell transplant, however, should that CR be credited to the earlier treatment, or is it entirely due to SCT? Which way to go is much more a medical question than a statistical one. This data set is also discussed in the main survival vigette as an example of Aalen-Johansen curves, and in that section I kept these 18 CR.

Call:
survcheck(formula = Surv(time1, time2, event) ~ 1, data = mdata, 
    id = id)

Unique identifiers       Observations        Transitions 
               646               1689               1353 

Transitions table:
         to
from       CR SCT relapse death (censored)
  (s0)    443 106      13    55         29
  CR        0 159     164    15        105
  SCT       0   0      49   151        163
  relapse   0  99       0    99         28
  death     0   0       0     0          0

Number of subjects with 0, 1, ... transitions to each state:
         count
state       0   1   2   3  4
  CR      203 443   0   0  0
  SCT     282 364   0   0  0
  relapse 420 226   0   0  0
  death   326 320   0   0  0
  (any)    29 206 173 151 87

The variable crstat is used for the analysis that treats CR and death as competing risks, txstat for treating transplant and death as competing risks, and tx2 for the second transplant analysis. One should run survcheck on each endpoint to verify it.

         to
from       CR SCT relapse death (censored)
  Entry   443 106      13    55         29
  CR        0 159     164    15        105
  SCT       0   0      49   151        163
  relapse   0  99       0    99         28
  death     0   0       0     0          0

The layout() command is an old way to create a montage of plots. I use it here to allow the state space figures to be on the same page as the survival plots.

Standard errors

  • pstate = \(p_k(t; x) =\) P(in state \(k\) at time \(t\), covariates \(x\))
  • \(J(t; x)\) = infinitesimal jackknife = effect of obs \(i\) on \(p(t)\)
    • (number of observations) x (number of time points) x (number of states)
    • std for everything else follows

Summary

  • This is a useful addition to your toolkit
  • Take the time to look deeper into your data
    • Resist the ``tyranny of the urgent’’
  • Hazards are important, hazards are not enough
  • Software
    • all done with the survival package in R
    • ongoing work to make it even easier