Directly using information diffusion cascade data, our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities. Connections between parameter learning and optimal control are also established, leading to a rigorous and implementable algorithm for training NMF. Moreover, we show that the projected gradient descent method can be employed to solve the challenging influence maximization problem, where the gradient is computed very quickly by integrating NMF forward in time just once per iteration. Extensive empirical studies show that our approach is versatile and robust to variations in the underlying diffusion network model, and that it significantly outperforms existing approaches in accuracy and efficiency on both synthetic and real-world data.

]]>selective pressure from their microenvironment. Such pressure promotes the diversification of both tumor cells and the tumor microenvironment, resulting in increased intratumoral heterogeneity (ITH) that enables aggressive disease progression, leading to metastasis and resistance to treatment. Metastasis and the emergence of chemo-resistance are the two main reasons for cancer treatment failure. In this work we develop mathematical models to understand the cancer evolution that leads to metastasis and chemo-resistance, with a special focus on the role of ITH. Our central goal is to understand the evolution of phenotypic heterogeneity as tumor cells

adaptation to various environments. We use a multiscale model to systematically study cancer metastasis and to draw connections to potential clinical implications for optimizing screening and treatment schedules. At the cell level, we use a cell-based model (the Cellular Potts Model, or CPM) to simulate collective cancer invasion. At the population level, we use continuous replicator dynamics to analyze the adaptation strategies of the tumor. This work reveals how pairwise interactions between phenotypes within the tumor, together with the microenvironment, alter the dynamics of tumor progression and change the tumor's response to chemotherapy. The study offers potential clinical prognosis information and treatment strategies for patients.
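As a minimal sketch of the population-level component, the continuous replicator equation dx_i/dt = x_i((Ax)_i - x·Ax) can be integrated with a forward Euler step. The payoff matrix `PAYOFF`, the step size, and the function names are illustrative assumptions and are not taken from this work; the model studied here also couples the dynamics to the microenvironment, which this sketch omits.

```python
def replicator_step(x, payoff, dt):
    """One forward-Euler step of dx_i/dt = x_i * (f_i - mean fitness)."""
    n = len(x)
    # Fitness of phenotype i is its average payoff against the current population.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    new_x = [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]
    # Clip and renormalize so the frequencies stay on the simplex despite Euler error.
    new_x = [max(v, 0.0) for v in new_x]
    total = sum(new_x)
    return [v / total for v in new_x]


def simulate(x0, payoff, dt=0.01, steps=5000):
    """Integrate the replicator equation from initial frequencies x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoff matrix for two phenotypes (an assumption for illustration):
# each phenotype does better when rare, so the two coexist at equilibrium.
PAYOFF = [[0.0, 3.0],
          [1.0, 2.0]]

frequencies = simulate([0.1, 0.9], PAYOFF)
```

With this particular payoff matrix the two fitnesses are equal only when both phenotypes have the same frequency, so the trajectory settles at a mixed population rather than at a single dominant phenotype.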

]]>We also apply the empirical likelihood method to two different kinds of survival data. First, we consider panel count data. In panel count data, each study subject can be observed only at discrete time points rather than continuously. The total number of events between two consecutive observation times is known, but the exact event times are unknown. Furthermore, the observation times can differ among subjects and carry important information about the underlying recurrent process. The second kind of data comes from cohort studies. Collecting covariate information on all study subjects makes cohort studies very expensive. One way to reduce the cost while retaining sufficient covariate information is to use a case-cohort study design. We use case-cohort data to make inferences about the regression parameters of semiparametric transformation models. For both kinds of data, an empirical likelihood ratio is formulated and Wilks' theorem is established.
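To make the idea of an empirical likelihood ratio concrete, the following sketch computes the EL statistic for the simplest possible target, a population mean with fully observed data, and calibrates it with the chi-square limit given by Wilks' theorem. It is not the panel count or case-cohort statistic described above; the function names and the bisection solver are assumptions chosen for illustration.

```python
import math

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_95 = 3.841458820694124


def el_statistic(data, mu, iterations=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean."""
    z = [x - mu for x in data]
    z_min, z_max = min(z), max(z)
    # mu must lie strictly inside the range of the data; otherwise R(mu) = 0.
    if z_min >= 0.0 or z_max <= 0.0:
        return math.inf
    # The Lagrange multiplier lam solves sum z_i / (1 + lam * z_i) = 0.
    # The left-hand side is decreasing in lam on (-1/z_max, -1/z_min).
    shrink = 1.0 - 1e-10
    lo = (-1.0 / z_max) * shrink
    hi = (-1.0 / z_min) * shrink
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        score = sum(zi / (1.0 + mid * zi) for zi in z)
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def in_confidence_region(data, mu, critical_value=CHI2_1DF_95):
    """True if mu lies in the EL confidence region at the given critical value."""
    return el_statistic(data, mu) <= critical_value
```

A confidence interval is then the set of values `mu` for which `in_confidence_region` returns true, which can be found by scanning or by root-finding on `el_statistic`.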

Extensive simulation studies are carried out to assess all of the methods mentioned above in various data settings. We compare the confidence intervals produced by the NA and EL methods in terms of coverage probability and average length. The applicability of the methods is also illustrated with real datasets.

]]>We propose a new plug-in JEL approach to reduce the computational cost of comparing two Gini indices for paired data. One of the main results for the EL is the nonparametric extension of Wilks' theorem for parametric likelihood ratios. However, this result no longer holds when the data are censored. To circumvent this issue for some specific parameters, we combine the EL method with influence functions (IID EL) to construct a confidence interval for the mean residual life (MRL) function in the presence of length bias. Further, we extend the IID EL to the two-sample mean difference, where both samples are right-censored. Last, we consider the weighted empirical likelihood (WEL) for comparing two correlated areas under the ROC curve (AUCs).

For the first three essays, we prove that Wilks' theorem holds: the log-likelihood ratio statistic is asymptotically chi-square distributed. The WEL statistic, by contrast, has a scaled chi-square distribution. Extensive simulations demonstrate that, for finite samples, all the proposed methods outperform the existing EL methods in coverage accuracy and average confidence interval length. Finally, applications to real data demonstrate that the proposed methods are of practical value.

]]>For studies with two diagnostic classes and with three diagnostic classes (the normal healthy stage, the early stage of the disease, and the stage of full development of the disease), we propose a new influence function-based empirical likelihood method and Bayesian empirical likelihood methods. The proposed methods are shown to perform better than the existing methods in terms of both coverage probability and interval length in simulation studies. A real data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is analyzed using the newly proposed methods.

In two-phase diagnostic studies with both a screening test and a gold standard test, verification of the true disease status may be partially missing, depending on the results of the diagnostic tests and on other characteristics of the subjects. Because estimators of the AUC based only on partially validated subjects are usually biased, bias-corrected methods are usually needed to estimate the AUC. We propose direct estimators of the AUC for verification-biased data with a continuous test result, based on hybrid imputation (FI and MSI), inverse probability weighting (IPW), and the semi-parametric efficient (SPE) approach, under the assumption that the true disease status, if missing, is missing at random (MAR). Simulation results show that the proposed estimators are accurate under biased sampling. We illustrate the proposed methods with a real data set from a neonatal hearing screening study.
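The inverse probability weighting idea can be sketched as follows: each verified subject is weighted by the reciprocal of its verification probability, and the AUC is estimated as a weighted proportion of correctly ordered diseased/non-diseased pairs. This is a generic IPW form written for illustration, not necessarily the exact estimator proposed here. It assumes the verification probabilities are already known or estimated elsewhere, and counting ties as one half is also an assumption.

```python
def ipw_auc(scores, verified, diseased, verify_prob):
    """IPW estimate of AUC = P(score of diseased > score of non-diseased).

    scores      -- continuous test results for all subjects
    verified    -- 1 if the true disease status was verified, else 0
    diseased    -- 1/0 disease status; ignored (may be None) when not verified
    verify_prob -- probability of verification for each subject (assumed given)
    """
    cases, controls = [], []
    for t, v, d, p in zip(scores, verified, diseased, verify_prob):
        if not v:
            continue  # unverified subjects contribute only through the weights of others
        (cases if d == 1 else controls).append((t, 1.0 / p))

    numerator = 0.0
    denominator = 0.0
    for t_case, w_case in cases:
        for t_control, w_control in controls:
            weight = w_case * w_control
            denominator += weight
            if t_case > t_control:
                numerator += weight
            elif t_case == t_control:
                numerator += 0.5 * weight  # ties count one half (assumption)
    if denominator == 0.0:
        raise ValueError("need at least one verified case and one verified control")
    return numerator / denominator
```

When every subject is verified with probability one, the weights are all equal and the estimate reduces to the usual empirical AUC.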

]]>The latter half of this dissertation focuses on empirical likelihood (EL) based interval estimation methods for the correlation coefficient (CC) and the coefficient of variation (CV). Under normal distribution assumptions, there are many types of confidence intervals for the CC or CV, such as the GPQ-based ‘exact’ interval, the Z transformation-based interval, and maximum likelihood-based intervals. However, the exact method is computationally cumbersome, and the approximation methods cannot be applied when the underlying distribution is unknown. Therefore, we propose influence function-based empirical likelihood intervals for the CC and CV. Extensive simulation studies are conducted to evaluate the finite-sample performance of the proposed EL-based intervals in terms of coverage probability. Finally, we illustrate the proposed methods with real examples.

]]>For the diagnostic medicine portion of this dissertation, we propose empirical likelihood (EL) inference procedures for two motivating statistical measures in biomedical research: the two-way partial AUC (tpAUC) and the sensitivity to the early disease stage. The EL procedure does not require distributional assumptions and is well suited to drawing inferences about parameters. The area under the curve (AUC) is a summary measure for the receiver operating characteristic (ROC) curve. No satisfactory confidence intervals have been proposed for the tpAUC. Motivated by this gap, we propose EL confidence intervals for the tpAUC and for the difference between two tpAUCs. We also propose EL confidence intervals for the sensitivity to the early disease stage and for the difference between two such sensitivities. The early disease stage can be vital for therapeutic intervention and prevention. A better inference procedure for the sensitivity can help identify better-performing biomarkers. Our extensive simulation studies suggest that the proposed procedures perform well compared with the existing methods. Finally, real data sets are analyzed to illustrate the proposed methods.

]]>In the real world, a large proportion of patients have zero costs. In the second part, we propose fiducial quantity-based and EL-based inference for the mean of zero-inflated censored medical costs, applying the method of variance estimates recovery (MOVER). We also provide EL-based confidence intervals for the upper quantiles of censored medical costs with many zero observations. Simulation studies are conducted to compare the proposed EL-based methods with the existing normal approximation-based methods in terms of coverage probability. The novel EL-based methods are observed to have better finite-sample performance than the existing methods, especially when the censoring proportion is high.
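The combination step of MOVER can be sketched in a few lines: given point estimates and separate confidence limits for two independent parameters, it recovers limits for their sum, and working on the log scale gives limits for a product such as (proportion of non-zero costs) × (mean of positive costs). This is a generic illustration that assumes independent estimates; the function names are hypothetical, and the way the component intervals are obtained in this dissertation (fiducial quantities or EL) is not reproduced here.

```python
import math


def mover_sum(est1, ci1, est2, ci2):
    """MOVER confidence limits for theta1 + theta2 from separate limits.

    ci1 and ci2 are (lower, upper) limits for two independently estimated parameters.
    """
    l1, u1 = ci1
    l2, u2 = ci2
    total = est1 + est2
    # Each limit borrows the distance from the estimate to the matching component limit.
    lower = total - math.sqrt((est1 - l1) ** 2 + (est2 - l2) ** 2)
    upper = total + math.sqrt((u1 - est1) ** 2 + (u2 - est2) ** 2)
    return lower, upper


def mover_product(est1, ci1, est2, ci2):
    """Limits for theta1 * theta2 (both positive) via MOVER on the log scale."""
    log_lower, log_upper = mover_sum(
        math.log(est1), (math.log(ci1[0]), math.log(ci1[1])),
        math.log(est2), (math.log(ci2[0]), math.log(ci2[1])),
    )
    return math.exp(log_lower), math.exp(log_upper)
```

For example, `mover_product(p_nonzero, ci_p, mean_positive, ci_mean)` would give limits for the overall mean cost, where all four arguments are placeholders for quantities estimated from the data.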

In the third part of this dissertation, we focus on evaluating breast cancer recurrence risk. In studies of early-stage tumor recurrence, existing methods lack strong overall power to predict survival. Preliminary studies show that centrosome amplification has a strong latent correlation with tumor progression. We therefore propose a novel quantitative centrosome amplification score to stratify patients by cancer recurrence risk. We show that patients with a higher centrosome amplification score have a significantly higher probability of cancer recurrence after adjusting for all demographic conditions, which could provide a useful reference for predicting the course of early-stage breast cancer.

]]>