{"title":"Homogeneity Tests for High-dimensional Mean Vectors and Covariance Matrices","authors":"Wenwen Guo, Xinyuan Song, H. Cui","doi":"10.5705/ss.202022.0048","DOIUrl":"https://doi.org/10.5705/ss.202022.0048","url":null,"abstract":"Homogeneity Tests","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEMIPARAMETRIC REVERSED MEAN MODEL FOR RECURRENT EVENT PROCESS WITH INFORMATIVE TERMINAL EVENT.","authors":"Wen Su, Li Liu, Guosheng Yin, Xingqiu Zhao, Ying Zhang","doi":"10.5705/ss.202021.0353","DOIUrl":"10.5705/ss.202021.0353","url":null,"abstract":"<p><p>We study semiparametric regression for a recurrent event process with an informative terminal event, where observations are taken only at discrete time points, rather than continuously over time. To account for the effect of a terminal event on the recurrent event process, we propose a semiparametric reversed mean model, for which we develop a two-stage sieve likelihood-based method to estimate the baseline mean function and the covariate effects. Our approach overcomes the computational difficulties arising from the nuisance functional parameter in the assumption that the likelihood is based on a Poisson process. We establish the consistency, convergence rate, and asymptotic normality of the proposed two-stage estimator, which is robust against the assumption of an underlying Poisson process. The proposed method is evaluated using extensive simulation studies, and demonstrated using panel count data from a longitudinal healthy longevity study and data from a bladder tumor study.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":"1843-1862"},"PeriodicalIF":1.2,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12291165/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier Detection via a Minimum Ridge Covariance Determinant Estimator","authors":"Chikun Li, B. Jin, Yuehua Wu","doi":"10.5705/ss.202022.0142","DOIUrl":"https://doi.org/10.5705/ss.202022.0142","url":null,"abstract":"In this paper, we propose an outlier detection procedure based on a high-breakdown minimum ridge covariance determinant estimator that is especially useful in the large p/n scenario. The estimator is obtained from the subset of observations, after excluding potential outliers, by applying so-called concentration steps. We derive the asymptotic distribution of the modified Mahalanobis distance associated with the proposed estimator under certain moment conditions, and obtain a theoretical cutoff value for outlier identification. We also improve the outlier detection power by adding a one-step reweighting procedure. Lastly, we investigate the performance of the proposed methods using simulations and a real-data analysis.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
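The ridge-regularized Mahalanobis distance at the core of such a procedure can be sketched in a few lines. This is a minimal illustration only: it omits the concentration steps, the high-breakdown subset search, and the one-step reweighting described in the abstract, and the function name `ridge_mahalanobis` and the choice `alpha=0.1` are assumptions of this sketch, not the authors' estimator.

```python
import numpy as np

def ridge_mahalanobis(X, alpha=0.1):
    """Squared Mahalanobis distances computed from a ridge-regularized
    covariance estimate S + alpha*I, which stays invertible even when
    the dimension p is close to the sample size n."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + alpha * np.eye(X.shape[1])
    # quadratic form diff_i^T cov^{-1} diff_i for every row i
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
X[0] += 6.0                   # plant one shifted outlier
d2 = ridge_mahalanobis(X)     # row 0 receives a conspicuously large distance
```

In the paper's procedure these distances would be recomputed on outlier-free subsets and compared against a theoretical cutoff; here they only illustrate the regularized distance itself.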
{"title":"On the Efficiency of Composite Likelihood Estimation for Gaussian Spatial Processes","authors":"N. Chua, Francis K. C. Hui, A. Welsh","doi":"10.5705/ss.202020.0311","DOIUrl":"https://doi.org/10.5705/ss.202020.0311","url":null,"abstract":"the Efficiency of Composite Likelihood","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70936712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Randomization via Mahalanobis Distance","authors":"Yichen Qin, Y. Li, Wei Ma, Haoyu Yang, F. Hu","doi":"10.5705/ss.202020.0440","DOIUrl":"https://doi.org/10.5705/ss.202020.0440","url":null,"abstract":"In comparative studies, researchers often seek an optimal covariate balance. However, chance imbalance still exists in randomized experiments, and becomes more serious as the number of covariates increases. To address this issue, we introduce a new randomization procedure, called adaptive randomization via the Mahalanobis distance (ARM). The proposed method allocates units sequentially and adaptively, using information on the current level of imbalance and the incoming unit’s covariates. Theoretical results and numerical comparisons show that with a large number of covariates or a large number of units, the proposed method shows substantial advantages over traditional methods in terms of covariate balance, estimation accuracy, hypothesis testing power, and computational time. The proposed method attains the optimal covariate balance, in the sense that the estimated treatment effect attains its minimum variance asymptotically, and can be applied in both causal inference and clinical trials.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70936861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
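The covariate-imbalance criterion that drives ARM-type allocation can be written down directly. This is a sketch, not the authors' procedure: the paper's ARM assigns units sequentially with a biased coin, whereas the snippet below only evaluates the Mahalanobis imbalance of a completed assignment, and the function name is illustrative.

```python
import numpy as np

def mahalanobis_imbalance(X, assign):
    """Mahalanobis distance between the covariate means of the two
    arms -- the imbalance measure that ARM-type rules try to keep
    small (illustrative implementation, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    treat, ctrl = X[assign == 1], X[assign == 0]
    diff = treat.mean(axis=0) - ctrl.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    n1, n0 = len(treat), len(ctrl)
    # scaled quadratic form (n1*n0/n) * diff^T cov^{-1} diff
    return (n1 * n0 / (n1 + n0)) * diff @ np.linalg.solve(cov, diff)

# two identical cohorts split across the arms are perfectly balanced
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 4))
X = np.vstack([Z, Z])
assign = np.r_[np.ones(20, int), np.zeros(20, int)]
balanced = mahalanobis_imbalance(X, assign)   # essentially zero
```

A sequential rule would recompute this quantity for each hypothetical assignment of the incoming unit and favor the arm that keeps it smaller.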
{"title":"Regression Analysis of Spatially Correlated Event Durations With Missing Origins Annotated by Longitudinal Measures","authors":"Y. Xiong, W. J. Braun, T. Duchesne, X. J. Hu","doi":"10.5705/ss.202021.0118","DOIUrl":"https://doi.org/10.5705/ss.202021.0118","url":null,"abstract":"This paper is concerned with event durations in situations where the study units may be spatially correlated and the time origins of the events are missing. We develop regression models based on the partly observed durations with the aid of available longitudinal information. The first-hitting-time model (e.g. Lee and Whitmore, 2006) is employed to link the data of event durations and the associated longitudinal measures with shared random effects. We present procedures for estimating the model parameters and an induced estimator of the conditional distribution of the event duration. We apply the EM algorithm and Monte Carlo methods to compute the proposed estimators. We establish consistency and asymptotic normality of the estimators, and present their variance estimation. The proposed approach is illustrated with a collection of wildfire records from Alberta, Canada. Its performance is examined numerically and compared with two competitors via simulation.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"24 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous Functional Quantile Regression","authors":"Boyi Hu, Xixi Hu, Hua Liu, Jinhong You, Jiguo Cao","doi":"10.5705/ss.202021.0248","DOIUrl":"https://doi.org/10.5705/ss.202021.0248","url":null,"abstract":"The conventional method for functional quantile regression (FQR) is to fit the regression model for each quantile of interest separately. Therefore, the slope function of the regression, as a bivariate function of time and quantile, is estimated as a univariate function of time for each fixed quantile. However, there are several limitations to this conventional strategy. For example, it cannot guarantee the monotonicity of the conditional quantiles, nor can it control the smoothness of the slope estimator as a bivariate function. In this paper, we propose a new framework for FQR, in which we simultaneously fit the FQR model for multiple quantiles, with the help of a bivariate basis under some constraints, such that the estimated quantiles satisfy the monotonicity conditions and the smoothness of the slope estimator is controlled. The proposed estimator for the slope function is shown to be asymptotically consistent, and we establish its asymptotic normality. We use simulation to evaluate the finite-sample performance of the proposed method and compare it with that of the conventional method.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70937519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Odds Rate Frailty Models for Current Status Data with Informative Censoring","authors":"Yang Xu, Shishun Zhao, T. Hu, Jianguo Sun","doi":"10.5705/ss.202021.0411","DOIUrl":"https://doi.org/10.5705/ss.202021.0411","url":null,"abstract":"Current-status data occur in many areas, and their analysis has attracted much attention. In this study, we consider a regression analysis of current-status data in the presence of informative censoring, for which most existing methods either apply only to limited situations or are computationally unstable. Here, we propose a new sieve maximum likelihood estimation procedure under the class of semiparametric generalized odds rate frailty models. The proposed method uses a latent variable to describe the informative censoring, that is, the relationship between the failure time of interest and the censoring time. We develop a novel expectation-maximization algorithm for determining the proposed estimators, and establish their asymptotic consistency and normality. The results of a simulation study show that the proposed method performs well in practice.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for Mean Function of Longitudinal Imaging Data over Complicated Domains","authors":"Qirui Hu, Jie Li","doi":"10.5705/ss.202021.0415","DOIUrl":"https://doi.org/10.5705/ss.202021.0415","url":null,"abstract":"We propose a novel procedure for estimating the mean function of longitudinal imaging data with inherent spatial and temporal correlation. We depict the dependence between temporally ordered images using a functional moving average, and use flexible bivariate splines over triangulations to handle the irregular domain of images which is common in imaging studies. We establish both the global and the local asymptotic properties of the bivariate spline estimator for the mean function, with simultaneous confidence corridors (SCCs) as a theoretical byproduct. Under some mild conditions, the proposed estimator and its accompanying SCCs are shown to be consistent and oracle efficient, as though all images were entirely observed without errors. We use Monte Carlo simulation experiments to demonstrate the finite-sample performance of the proposed method, the results of which strongly corroborate the asymptotic theory. The proposed method is further illustrated by analyzing two seawater potential temperature data sets.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Testing Regression Analysis with Missing Data and Imperfect Tests","authors":"A. Delaigle, Ruoxu Tan","doi":"10.5705/ss.202021.0382","DOIUrl":"https://doi.org/10.5705/ss.202021.0382","url":null,"abstract":"Estimating the prevalence of an infectious disease in a large population typically requires testing a specimen (e.g., blood, urine, or swab) for the disease. When the disease spreads quickly, time constraints and limited resources often restrict the number of tests that can be performed. In such cases, if the prevalence is not too high, the group testing procedure can be employed to save time, money, and resources. The procedure tests pooled specimens of groups of individuals, rather than testing each individual for the disease. This technique is also used in other contexts, for example, to detect abnormalities or contamination in animals, plants, food, or water. Although methods exist for estimating prevalence conditional on explanatory variables from group testing data, they require the specimen to be available for all individuals, which is not always possible. Therefore, we construct new nonparametric estimators that are consistent when some of the specimens are missing. We demonstrate the numerical performance of our methods using simulations and a hepatitis B example.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70938201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
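For intuition on why pooling works, the classical prevalence estimator for perfect pooled tests follows from P(pool positive) = 1 - (1 - p)^k for pools of size k. This is the textbook estimator under perfect tests, complete specimens, and no covariates, so it is not the paper's nonparametric regression estimator; the function name is illustrative.

```python
def pooled_prevalence(n_pools, n_positive, pool_size):
    """MLE of individual prevalence p from perfect group tests:
    a pool of size k is positive iff at least one member is infected,
    so the pool-level positive rate is 1 - (1 - p)^k; invert that
    relation at the observed rate."""
    pool_rate = n_positive / n_pools
    return 1.0 - (1.0 - pool_rate) ** (1.0 / pool_size)

# 100 pools of 5 specimens each, 10 pools test positive
p_hat = pooled_prevalence(100, 10, 5)   # about 0.021
```

With 10% of pools positive, only 100 tests bound the prevalence of 500 individuals at roughly 2%, which is the resource saving the abstract describes; handling imperfect tests and missing specimens is what the paper's estimators add.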