Adaptation of the weighted Kaplan-Meier method to time-dependent ROC curves

This study aimed to adapt the weighted Kaplan-Meier method to time-dependent ROC curve analysis. Two time-dependent ROC curve methods, one using the Kaplan-Meier estimator and one using the weighted Kaplan-Meier estimator, were compared. An application to pancreatic cancer patients is presented to evaluate the prognostic ability of the CA19-9 antigen. A simulation study was performed under different scenarios to assess the performance of the proposed method. In all scenarios, the AUC values obtained from the weighted time-dependent ROC (WTDR) curves approximated the true AUC values more closely than those from the classical time-dependent ROC (TDR) curve method, and had smaller mean squared errors.


INTRODUCTION
In classical ROC analysis, disease status is treated as fixed. In many studies, however, disease status can change over time, and individuals who are not diseased can develop the disease during the study period. There can also be a time lag between when the diagnostic test is conducted and the onset of the disease. The question that must be addressed in such situations is how well a diagnostic test result, measured at the beginning of the study, can discriminate between diseased and non-diseased individuals over a [0, t] follow-up period.
In the literature, there are several discussions of time-dependent ROC curves (Etzioni et al., 1999 [...]
Journal of the National Science Foundation of Sri Lanka 46 (1)
[...] the longitudinal biomarker data by focusing on fully Bayesian hierarchical models and latent disease process models. Cai et al. (2006) used generalised linear models to estimate the time-dependent ROC (TDR) curve with incident sensitivity in censored data. They modelled the dependence on time by using vectors of polynomial or spline basis functions.
The present study focused on the cumulative/dynamic definition, in which each individual plays the role of a control for times t < X and then contributes as a case for times t ≥ X, where X is the failure time. The term borrowing strength is typically used in Bayesian statistics and generally refers to an attempt to improve precision by using additional information from allied sources.
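As a minimal illustration of the cumulative/dynamic definition, the Python sketch below assigns the status D(t) = 1(X ≤ t) from right-censored follow-up (Z, δ), where Z = min(X, C) and δ = 1 when the failure is observed. The function name and the NaN encoding for an unknown status are illustrative assumptions, not part of the original study. Subjects censored before t have an unknown status, which is why Kaplan-Meier-type estimators are needed instead of a simple case/control split.

```python
import numpy as np

def cumulative_dynamic_status(z: np.ndarray, delta: np.ndarray, t: float) -> np.ndarray:
    """D(t) for each subject: 1.0 = case, 0.0 = control, NaN = unknown (censored before t).

    z is the follow-up time min(X, C); delta is 1 if the failure was observed (X <= C).
    """
    status = np.full(z.shape, np.nan)
    status[(delta == 1) & (z <= t)] = 1.0               # failure observed by t: X <= t, a case
    status[(z > t) | ((delta == 0) & (z == t))] = 0.0   # X > t is known: a control
    return status

# One subject who fails at X = 5: a control before t = 5, then a case from t = 5 onwards.
for t in (3.0, 5.0, 7.0):
    print(t, cumulative_dynamic_status(np.array([5.0]), np.array([1]), t))
```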
The weighted likelihood (WL) function was designed to incorporate information from populations that are relevant to, but not of prime inferential interest for, the study population (Hu, 1994). The WL function suggested by Hu and Zidek (2002) builds on the result of James and Stein (1961): in terms of the summed mean squared error of estimation, the sample averages can be improved upon by borrowing information from the other samples, the so-called Stein paradox (James & Stein, 1961; Hu & Zidek, 2002). Similar to the James-Stein estimator, the WL estimator suggested by Hu and Zidek (2002) facilitates drawing inferences on one sample by using additional information from different populations.
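The Stein paradox can be seen in a short Monte Carlo sketch in Python with NumPy; the true means, seed and replication count are hypothetical choices. For p ≥ 3 independent N(θ_i, 1) observations, the James-Stein estimator (1 - (p - 2)/||x||^2) x, which shrinks every coordinate using information from all of them, has a smaller summed mean squared error than the raw observations.

```python
import numpy as np

def james_stein(x: np.ndarray) -> np.ndarray:
    """Shrink a vector of p >= 3 independent N(theta_i, 1) observations toward 0."""
    p = x.size
    shrink = 1.0 - (p - 2) / np.sum(x ** 2)
    return shrink * x

def total_mse(theta: np.ndarray, n_rep: int = 5000, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo summed MSE of the raw observations and of the James-Stein estimator."""
    rng = np.random.default_rng(seed)
    se_raw = 0.0
    se_js = 0.0
    for _ in range(n_rep):
        x = theta + rng.standard_normal(theta.size)   # one observation per mean
        se_raw += np.sum((x - theta) ** 2)
        se_js += np.sum((james_stein(x) - theta) ** 2)
    return float(se_raw / n_rep), float(se_js / n_rep)

theta = np.linspace(-1.0, 1.0, 10)   # hypothetical true means (p = 10)
mse_raw, mse_js = total_mse(theta)
print(f"raw: {mse_raw:.2f}  James-Stein: {mse_js:.2f}")
```

The raw summed MSE should be close to p = 10, while the shrunken estimator is noticeably smaller.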
Wang (2001), Wang et al. (2004), and Wang and Zidek (2005) used a cross-validation procedure for adaptively choosing the weights and gave the analytical forms of the adaptive weights when the WL estimator is a linear combination of maximum likelihood estimators. They regarded the data as samples from m populations and proposed adaptive weights that are allowed to depend on the data. If $F_1$ is the distribution of interest, the weighted empirical distribution function used to estimate it can be given as follows:

$$\hat{G}_{\lambda}(x) = \sum_{i=1}^{m} \lambda_i \hat{F}_i(x) \qquad (1)$$

with $\lambda_i \ge 0$ and $\sum_{i=1}^{m} \lambda_i = 1$, where $\hat{F}_i(x) = n_i^{-1} \sum_{j=1}^{n_i} 1(X_{ij} \le x)$ is the empirical distribution function related to the $i$th population and $X_{i1}, \dots, X_{in_i}$ is the sample of $n_i$ individuals drawn from the $i$th population. Plante (2008a) showed that the WL can be derived from the entropy maximisation principle using the weighted empirical distribution function given above, and suggested minimum averaged mean squared error (MAMSE) weights. Plante (2008b; 2009) used MAMSE weights for right-censored data and proposed adaptively weighted Kaplan-Meier estimates as nonparametric estimators for lifetime data; these borrow strength from m populations to draw inferences about a single population of interest whose distribution is similar to those of the other $m-1$ populations. This article aims to use a weighted Kaplan-Meier estimator to obtain time-dependent ROC curves that can handle right-censored data when determining the discrimination ability of a marker.
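A minimal Python sketch of equation (1) and of a MAMSE-type weight choice for m = 2 is given below. It assumes the uncensored form of the criterion, in which {F̂1(x) - Ĝλ(x)}^2 + Σ λi^2 F̂i(x){1 - F̂i(x)}/ni is integrated against dF̂1, i.e. averaged over the first sample. For m = 2 this is a one-dimensional quadratic in λ2 with a closed-form minimiser, so Plante's general algorithm is not reproduced; all function names are hypothetical.

```python
import numpy as np

def ecdf(sample: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Empirical distribution function of `sample` evaluated at the points `x`."""
    s = np.sort(sample)
    return np.searchsorted(s, x, side="right") / s.size

def weighted_ecdf(samples: list[np.ndarray], weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Equation (1): G_lambda(x) = sum_i lambda_i * F_i(x)."""
    return sum(w * ecdf(s, x) for w, s in zip(weights, samples))

def mamse_weights_two_samples(y1: np.ndarray, y2: np.ndarray) -> np.ndarray:
    """Minimise a MAMSE-type criterion for m = 2 (ASSUMED form, see lead-in).

    With lambda_1 = 1 - lambda_2 the criterion is lambda_2^2 * a + lambda_1^2 * b,
    whose minimiser is lambda_2 = b / (a + b).
    """
    x = y1                                   # integrating against dF_1 = averaging over sample 1
    f1, f2 = ecdf(y1, x), ecdf(y2, x)
    v1 = f1 * (1 - f1) / y1.size             # plug-in variance of F_1(x)
    v2 = f2 * (1 - f2) / y2.size             # plug-in variance of F_2(x)
    a = np.mean((f2 - f1) ** 2 + v2)         # coefficient of lambda_2^2
    b = np.mean(v1)                          # coefficient of lambda_1^2
    lam2 = b / (a + b) if a + b > 0 else 0.0
    return np.array([1.0 - lam2, lam2])

rng = np.random.default_rng(0)
y1 = rng.normal(0.0, 1.0, 30)                # small sample from the population of interest
y2 = rng.normal(0.1, 1.0, 300)               # larger sample from a similar population
lam = mamse_weights_two_samples(y1, y2)
grid = np.linspace(-10.0, 10.0, 41)
g = weighted_ecdf([y1, y2], lam, grid)
```

When the second sample is an exact copy of the first, the two terms balance and the sketch returns equal weights of 0.5.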

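Let $T_{ij}$ be the death time for the $j$th individual in the $i$th population, $C_{ij}$ be the censoring time for the $j$th individual of the $i$th population, and $Z_{ij} = \min(T_{ij}, C_{ij})$ be the follow-up time; thus, if $\delta_{ij} = 1(T_{ij} \le C_{ij})$, then $(Z_{ij}, \delta_{ij})$ is observed for $j = 1, \dots, n_i$. The Kaplan-Meier estimate of the probability of survival beyond time $t$, which is a non-parametric estimator of the survival function $S_i(t)$, can be written as below for the $i$th population (Kaplan & Meier, 1958):

$$\hat{S}_i(t) = \prod_{s \le t} \left\{ 1 - \frac{d_i(s)}{R_i(s)} \right\}, \qquad (2)$$

where the product runs over the distinct observed death times $s$, $d_i(s) = \sum_{j=1}^{n_i} 1(Z_{ij} = s, \delta_{ij} = 1)$ is the number of deaths at $s$, and $R_i(s) = \sum_{j=1}^{n_i} 1(Z_{ij} \ge s)$ is the number at risk at $s$. The optimal weights are obtained from an optimisation problem: they minimise the objective function

$$P(\lambda) = \int_0^U \left[ \left\{ \hat{F}_1(t) - \hat{G}_{\lambda}(t) \right\}^2 + \sum_{i=1}^{m} \lambda_i^2 \, \widehat{\operatorname{var}}\{\hat{F}_i(t)\} \right] d\hat{F}_1(t) \qquad (3)$$

under the constraints $\lambda_i \ge 0$ and $\sum_{i=1}^{m} \lambda_i = 1$, where $\hat{F}_i(t) = 1 - \hat{S}_i(t)$ and $\hat{G}_{\lambda}(t) = \sum_{i=1}^{m} \lambda_i \hat{F}_i(t)$. In the objective function, the squared difference is minimised so that weights making $\hat{G}_{\lambda}$ close to $\hat{F}_1$ are selected, while the variance term favours spreading the weight over the populations. $U$ is the upper limit of integration, which is set smaller than the largest follow-up time. An algorithm for obtaining these optimal weights was also proposed (Plante, 2008b; 2009). The weighted Kaplan-Meier estimate of the probability of survival beyond time $t$ can therefore be given as

$$\hat{S}_{\lambda}(t) = \sum_{i=1}^{m} \lambda_i \hat{S}_i(t). \qquad (4)$$

Various approaches have been proposed for the case where the outcome of interest is an event that can occur at any time after the diagnostic test has been administered. Heagerty [...]. For large values of $c$, the sample size for $\{Y > c\}$ may be too small to obtain the conditional Kaplan-Meier estimate. In this paper, it is proposed to obtain the conditional estimate with Plante's weighted Kaplan-Meier method (2009). The weighted conditional Kaplan-Meier estimate will typically be smoother, since steps can occur at the failure times from all the populations. By using the weighted Kaplan-Meier estimator in place of the survival function, and the sample distribution function of the marker $Y$, the estimates of the sensitivity $P\{Y_1 > c \mid D_1(t) = 1\}$ and of the false positive rate $P\{Y_1 > c \mid D_1(t) = 0\}$ can be written as in equations (5) and (6), respectively:

$$\widehat{TP}_{\lambda}(c, t) = \frac{\{1 - \hat{F}_{Y_1}(c)\}\{1 - \hat{S}_{\lambda}(t \mid Y > c)\}}{1 - \hat{S}_{\lambda}(t)}, \qquad (5)$$

$$\widehat{FP}_{\lambda}(c, t) = \frac{\{1 - \hat{F}_{Y_1}(c)\}\, \hat{S}_{\lambda}(t \mid Y > c)}{\hat{S}_{\lambda}(t)}, \qquad (6)$$

where $D_1(t)$ denotes failure status at time $t$ for the 1st population, with $D_1(t) = 1$ indicating that the subject has had an event by time $t$; $\hat{S}_{\lambda}(t)$ is the weighted Kaplan-Meier estimator for the 1st population, calculated by using data from the $m$ populations as $\sum_{i=1}^{m} \lambda_i \hat{S}_i(t)$; $\hat{S}_{\lambda}(t \mid Y > c)$ is the weighted conditional Kaplan-Meier estimator for the 1st population, calculated by using the subsets of the $m$ populations with $Y_{ij} > c$ as $\sum_{i=1}^{m} \lambda_i \hat{S}_i(t \mid Y_i > c)$; and $\hat{F}_{Y_1}(c) = n_1^{-1} \sum_{j=1}^{n_1} 1(Y_{1j} \le c)$ is the empirical distribution function of the marker in the 1st population. Here $\hat{S}_{\lambda}(t)$ is the proposed estimator of the 1st survival function, $S_1(t)$, which is the parameter of interest, so the weights are chosen to minimise the difference between $\hat{S}_{\lambda}(t)$ and $\hat{S}_1(t)$. Likewise, [...] distribution function. The steps of the algorithm proposed by Plante (2008a; 2009) for calculating the weights were implemented so that the 1st population, being the population of interest, is never given a weight of 0.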
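The estimators in equations (2), (4), (5) and (6) can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the weights are supplied by the caller (in practice they would come from the MAMSE algorithm), the same weights are reused for the conditional estimator and renormalised over the populations that still have subjects with Y > c, and the simulated data (bivariate normal marker and log time, exponential censoring) are hypothetical. Passing a single population with weight 1 reduces the sketch to the classical Kaplan-Meier-based TDR estimator.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t), as in equation (2)."""
    s = 1.0
    for u in np.unique(time[(event == 1) & (time <= t)]):   # distinct death times <= t
        at_risk = np.sum(time >= u)                           # R(u): number at risk at u
        deaths = np.sum((time == u) & (event == 1))           # d(u): deaths at u
        s *= 1.0 - deaths / at_risk
    return float(s)

def weighted_km(groups, weights, t):
    """Equation (4): sum_i lambda_i * S_i(t); each group is (time, event, marker)."""
    return sum(w * km_survival(tm, ev, t) for w, (tm, ev, _) in zip(weights, groups))

def weighted_conditional_km(groups, weights, t, c):
    """sum_i lambda_i * S_i(t | Y_i > c).

    ASSUMPTION: the same weights are reused and renormalised over populations
    that still contain subjects with Y > c.
    """
    num = den = 0.0
    for w, (tm, ev, y) in zip(weights, groups):
        keep = y > c
        if w > 0 and keep.any():
            num += w * km_survival(tm[keep], ev[keep], t)
            den += w
    return num / den

def weighted_td_roc(groups, weights, t):
    """Cumulative/dynamic (FP, TP) pairs for population 1, equations (5) and (6)."""
    y1 = groups[0][2]
    s_t = weighted_km(groups, weights, t)                  # must satisfy 0 < s_t < 1
    fp, tp = [], []
    for c in np.concatenate(([-np.inf], np.unique(y1))):
        p_above = np.mean(y1 > c)                           # 1 - F_{Y_1}(c)
        if p_above == 0:                                    # nobody above the largest cut-off
            fp.append(0.0)
            tp.append(0.0)
            continue
        s_tc = weighted_conditional_km(groups, weights, t, c)
        tp.append(p_above * (1.0 - s_tc) / (1.0 - s_t))    # equation (5)
        fp.append(p_above * s_tc / s_t)                     # equation (6)
    return np.array(fp), np.array(tp)

def auc_trapezoid(fp, tp):
    """Trapezoidal area under the (FP, TP) points after sorting by FP."""
    order = np.argsort(fp, kind="stable")
    x, y = fp[order], tp[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical data: (Y, log T) bivariate normal with correlation rho, exponential censoring.
rng = np.random.default_rng(7)

def simulate(n, rho=-0.7):
    y = rng.standard_normal(n)
    log_t = rho * y + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    cens = rng.exponential(3.0, n)
    return np.minimum(np.exp(log_t), cens), (np.exp(log_t) <= cens).astype(int), y

groups = [simulate(40), simulate(200)]     # population of interest first
auc_wtdr = auc_trapezoid(*weighted_td_roc(groups, [0.4, 0.6], t=1.0))  # illustrative weights
auc_tdr = auc_trapezoid(*weighted_td_roc(groups[:1], [1.0], t=1.0))     # classical KM-based TDR
print(round(auc_wtdr, 3), round(auc_tdr, 3))
```

Because the weighted estimate pools the failure times of both populations, its conditional survival curves have more steps than those computed from the 40 subjects of the first population alone.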

Simulation methodology and scenarios
The aim was to compare the AUC values obtained from the time-dependent ROC curve using the Kaplan-Meier function with those obtained using the weighted Kaplan-Meier function.
For m = 2 populations and a variety of sample sizes (n1-n2 [...]), continuous diagnostic test results were generated from [...] by taking the correlation ρ between the marker and log(time) as [...] and [...]. By convention, ρ was taken to be negative so that a higher marker value indicates a shorter event time. Independent censoring times were generated from a [...] normal distribution as [...] censoring rates. Censoring rates (c1-c2) were taken as [...]. The ROC curve that uses the weighted Kaplan-Meier function was calculated for the 1st group using the measurements of both groups, and the ROC curve that uses the Kaplan-Meier function was calculated for the 1st group using the measurements of the 1st group only. The simulation strategy used by Heagerty and Zheng (2005) [...], and true positive (TP) values were estimated at these FP rates. The (FP, TP) pairs were estimated, and the TP rate corresponding to each given FP rate was then interpolated for each simulation. AUC values were calculated by the trapezoidal rule from these TP and FP pairs and then averaged over the simulations to obtain an estimate of the true AUC. TP, FP and AUC values for the method that uses the Kaplan-Meier estimator were calculated using the survivalROC 1.0.3 package (Heagerty & Saha-Chaudhuri, 2013). ROC curves and AUC values were calculated using the Kaplan-Meier and weighted Kaplan-Meier estimators for the 1st sample. One thousand repetitions were performed for each scenario.

Ampullary cancer (i.e., carcinoma of the ampulla of Vater) is a fairly rare pancreatic cancer that starts where the bile duct and the pancreatic duct meet and empty into the duodenum (the ampulla of Vater). The aim was to assess the value of the preoperative plasma CA 19-9 level in predicting the mortality of ampullary cancer patients, and to examine appropriate cut-off points for the CA 19-9 level, by using weighted time-dependent ROC analysis. AUC values and cut-off points of CA 19-9 for the ampullary cancer dataset (which has the smaller sample size) were estimated by borrowing strength from a second dataset of pancreatic-head cancer patients.
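The interpolation and averaging steps of the simulation methodology can be sketched in Python as below; the FP grid spacing and all function names are assumptions. Each replicate's (FP, TP) pairs are sorted by FP, TP is linearly interpolated on a fixed FP grid, the AUC is computed with the trapezoidal rule, and the replicate AUCs are averaged; the MSE is taken about a supplied true AUC.

```python
import numpy as np

FP_GRID = np.linspace(0.0, 1.0, 101)   # hypothetical grid of fixed FP rates

def tp_at_fp(fp, tp, grid=FP_GRID):
    """Linearly interpolate TP at fixed FP rates (np.interp needs increasing x values)."""
    order = np.argsort(fp, kind="stable")
    return np.interp(grid, fp[order], tp[order])

def trapezoid_auc(x, y):
    """Trapezoidal rule for the area under y(x)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def summarise_replicates(curves, true_auc):
    """Per-replicate AUCs, their mean (the simulation estimate) and the MSE about true_auc."""
    aucs = np.array([trapezoid_auc(FP_GRID, tp_at_fp(fp, tp)) for fp, tp in curves])
    return aucs, float(aucs.mean()), float(np.mean((aucs - true_auc) ** 2))
```

Interpolating on a common FP grid makes the replicates comparable point by point, so mean ROC curves as well as mean AUCs can be reported.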

Data analysis
To assess the performance of CA 19-9 across the study period, AUC(t) values were calculated for different values of t with the weighted time-dependent ROC (WTDR) and time-dependent ROC (TDR) methods. Bootstrapped variances and 95% CIs were calculated for the AUC values from 500 bootstrap repetitions of the dataset. The null hypothesis that the AUC does not differ from 0.5 was tested. Cut-off values were determined by means [...] AUC(t) values. Analyses were performed using R 3.3.0 software (R Core Team, 2013).
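The bootstrap step can be sketched as below. This is a minimal Python sketch under assumptions not stated in the original: rows (patients) are resampled with replacement, the 95% CI is the percentile interval, and H0: AUC = 0.5 is tested with a normal-approximation z statistic that uses the bootstrap standard error. The `stat` argument would be the WTDR or TDR AUC(t) estimator; the Mann-Whitney AUC on a fully observed binary status used in the usage lines is only a hypothetical stand-in.

```python
import math
from typing import Callable

import numpy as np

def bootstrap_auc_test(data: np.ndarray, stat: Callable[[np.ndarray], float],
                       n_boot: int = 500, seed: int = 0):
    """Estimate, bootstrap variance, percentile 95% CI, z and two-sided p for H0: AUC = 0.5."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    est = stat(data)
    reps = np.array([stat(data[rng.integers(0, n, n)]) for _ in range(n_boot)])  # resample rows
    var = float(reps.var(ddof=1))
    lo, hi = np.percentile(reps, [2.5, 97.5])
    z = (est - 0.5) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return est, var, (float(lo), float(hi)), z, p

def mann_whitney_auc(d: np.ndarray) -> float:
    """Hypothetical stand-in: AUC for column 0 (marker) vs column 1 (binary status, no censoring)."""
    y, s = d[:, 0], d[:, 1]
    diff = y[s == 1][:, None] - y[s == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

rng = np.random.default_rng(3)
marker = rng.standard_normal(120)
status = (marker + 0.5 * rng.standard_normal(120) > 0).astype(float)
est, var, ci, z, p = bootstrap_auc_test(np.column_stack([marker, status]), mann_whitney_auc)
```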

RESULTS AND DISCUSSION
By using the WTDR method, CA 19-9 was found to be [...]st and 35th months. At early times, the cut-off for the CA 19-9 level was 192, but after one and a half years this cut-off moved to 138. However, by using the TDR method, CA 19-9 was found [...]th to the 26th months.
[...] variations according to different time points for CA 19-9 (Tables 3 and 4).
In the present study, weighted time-dependent ROC curves that integrate additional data from different populations by using a weighted Kaplan-Meier estimator were presented. The simulation studies showed that ROC curves obtained using the weighted Kaplan-Meier method were closer to the true ROC curves, and that the AUC values obtained using WTDR had smaller MSE, SEM and SD values than those of the TDR curves calculated from the Kaplan-Meier function, for all sample sizes and censoring rates.
As expected, MSE values decreased as sample sizes increased; MSE values also decreased as the correlation between the marker value and the survival time increased. Better results were obtained for both methods in the situation where [...] than where [...]. Additionally, as the censoring rates increased, the MSE, SEM and SD values also increased, both for the AUC values obtained using the weighted Kaplan-Meier function and for those obtained using the classical Kaplan-Meier function. For the time-dependent ROC curves obtained using the weighted Kaplan-Meier function, the censoring rate of the first group (c1) had a much larger effect on the increase in MSE than the censoring rate of the second group (c2). However, for the same sample sizes and the same censoring rate for group 1, the weighted time-dependent ROC curves always yielded smaller MSE, SEM and SD values, regardless of the value of c2. Moreover, the differences between the MSE, SEM and SD values of the WTDR and TDR methods became more apparent as c1 increased from 0.40 to 0.60. For large sample sizes, the performance of both methods improved; however, the MSE, SEM and SD values were still smaller for the weighted time-dependent ROC curves.
[...]; Heagerty et al., 2000; Slate & Turnbull, 2000; Cai et al., 2003; 2006; Heagerty & Zheng, 2005; Chambless & Diao, 2006; Uno et al., 2007; Hung & Chiang, 2010a; Martínez-Camblor et al., 2016). Cumulative/dynamic, incident/static and incident/dynamic estimators for [...] and [...] were also discussed by Cai et al. (2006) and Pepe et al. (2008). Sensitivity can be estimated using cases that [...] up to time t [...] estimated using all individuals who are not cases at time t [...] et al., 1999; Slate & Turnbull, 2000; Heagerty & Zheng, 2005; Cai et al., 2006), and two methods were proposed by Etzioni et al. [...] ROC curve given estimates of the longitudinal model parameters, utilising random-effects models to capture the correlation between within-subject measurements. The second is based on estimating the ROC curve at any time of interest by setting the time covariate to a [...]
March 2018

Table 1:
AUC values obtained for [...] for different censoring rates and sample sizes

Table 2:
AUC values obtained for [...] for different censoring rates and sample sizes

Figure 1: Cumulative/dynamic ROC curves for different censoring rates and different rho values (WKME: time-dependent ROC curves which use weighted Kaplan-Meier estimation; KME: time-dependent ROC curves which use Kaplan-Meier estimation; c1: censoring rate for group 1; c2: censoring rate for group 2).