Research

Research Funding:

NSERC Discovery Grant (April 2022 - March 2027)
Project Title: Robust and efficient methods for analyzing complex longitudinal and survival data

NSERC Discovery Grant (April 2016 - March 2022) [completed]
Project Title: Statistical methods for complex clinical and survey data

NSERC Discovery Grant (April 2011 - March 2016) [completed]
Project Title: Statistical methods for clinical research

Research Projects:

Missing data analysis

Missing data are common in many clinical experiments. For example, in a longitudinal study, an individual's outcome can be missing at one follow-up time and measured at the next, resulting in a class of nonmonotone missingness patterns. If the probability that an outcome is missing at a given time depends on the value of the (possibly unobserved) outcome at that time, then the missing-data mechanism is nonignorable. Currently, I am involved in a project on AIDS data analysis with Stuart Lipsitz (Harvard Medical School) and Garrett Fitzmaurice (Harvard Medical School), in which the goal is to develop a suitable method for analyzing longitudinal data on CD4 counts from HIV-infected patients. A full likelihood approach is algebraically complicated and requires extensive computation, since there are many follow-up times. We explore a pseudo-likelihood approach based on a bivariate binary regression model for analyzing the data. I visited Nan Laird (Harvard School of Public Health) for two months in the summer of 2009 and worked on a joint project with Nan Laird and Garrett Fitzmaurice. The project involves the analysis of multiple binary outcomes using a multivariate logistic regression model with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data.
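
To make the distinction between monotone and nonmonotone missingness concrete, here is a minimal Python sketch. The function name and the example visit records are hypothetical and are not taken from the CD4 study. A pattern is treated as monotone (dropout) when no outcome is observed after the first missing one.

```python
from typing import Sequence


def is_monotone(observed: Sequence[bool]) -> bool:
    """Return True if no observed value follows a missing one."""
    seen_missing = False
    for obs in observed:
        if not obs:
            seen_missing = True
        elif seen_missing:
            # An observed value after a missing one: intermittent missingness.
            return False
    return True


# Hypothetical follow-up records: True = outcome measured at that visit.
dropout = [True, True, False, False]        # subject leaves the study
intermittent = [True, False, True, True]    # subject misses one visit, then returns
```

Under this definition, the pattern described above (missing at one follow-up, measured at the next) is classified as nonmonotone.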

Proteomics data analysis

Recently, I visited Wing Wong (Department of Statistics, Stanford University) during August-November 2009, and became involved in a project on proteomics data analysis. In this project, we analyze longitudinal data on peptide (a component of a protein) intensities obtained from a group of patients. Data were also collected from a control group of healthy individuals. The goal is to determine which proteins are differentially expressed over time, and also to identify proteins that are differentially expressed between the patient and control groups. The longitudinal data obtained from the patient group contained many nonmonotone missing responses. These missing data are assumed to be nonignorable, since the missingness may be due to low peptide abundance. We analyze the data using a linear mixed model with a block-diagonal covariance structure. Since the large dataset involves thousands of peptides, and the full likelihood approach is computationally intensive, we use an approximation to the likelihood approach. To identify differentially expressed proteins, we use q-values (rather than standard p-values) for the multiple tests of hypotheses. The proposed method may also be useful in microarray analysis, where longitudinal data on gene expression can be obtained from a group of individuals. There, data may be missing due to low gene expression levels, and one needs to incorporate a nonignorable missing-data mechanism when analyzing such data.
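
As an illustration of the q-value step, the following Python sketch converts a list of p-values into q-values. The project description does not say which q-value procedure is used. This sketch assumes a Storey-type q-value with the proportion of true null hypotheses, `pi0`, fixed conservatively at 1, in which case the q-values coincide with Benjamini-Hochberg adjusted p-values. The function name and the default value are illustrative.

```python
from typing import List, Sequence


def q_values(p_values: Sequence[float], pi0: float = 1.0) -> List[float]:
    """Compute q-values from p-values (pi0 = 1 gives BH-adjusted p-values)."""
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is the minimum estimated false discovery rate over all thresholds
    # at least as large as that p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pi0 * p_values[i] * m / rank
        running_min = min(running_min, candidate)
        q[i] = running_min
    return q
```

A protein would then be flagged as differentially expressed when its q-value falls below a chosen false discovery rate, for example `[q <= 0.05 for q in q_values(p)]`.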

Robust small area estimation

This is a continuation of the project on small area estimation with Jon Rao (Carleton University). Recently, we wrote a joint paper (Sinha and Rao, 2009, The Canadian Journal of Statistics) in which we proposed and explored a robust method for small area estimation. The robust method, developed in the framework of maximum likelihood estimation in linear mixed effects models, downweights potential outliers in the data when estimating the model parameters. In the current project, we allow for the possibility that the mean response function is nonlinear, and we use a spline regression equation to model the nonlinear structure. We also allow for outliers in the random effects as well as in the random errors, and we investigate a robust approach for analyzing such data.
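
The idea of downweighting outliers can be sketched in Python with Huber-type weights, a common choice in robust estimation. This is a deliberately simplified illustration for a single location parameter with a known scale, not the robust mixed-model estimating equations of the project. The function names, the tuning constant 1.345, and the example data are assumptions made for the sketch.

```python
from typing import Sequence


def huber_weight(residual: float, c: float = 1.345) -> float:
    """Weight 1 for small standardized residuals, c/|r| for large ones."""
    a = abs(residual)
    return 1.0 if a <= c else c / a


def huber_location(y: Sequence[float], c: float = 1.345,
                   scale: float = 1.0, n_iter: int = 50) -> float:
    """Robust location estimate by iteratively reweighted averaging."""
    mu = sorted(y)[len(y) // 2]  # start from a median-like value
    for _ in range(n_iter):
        weights = [huber_weight((v - mu) / scale, c) for v in y]
        mu = sum(w * v for w, v in zip(weights, y)) / sum(weights)
    return mu


# Hypothetical data with one gross outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
robust_center = huber_location(sample)
plain_mean = sum(sample) / len(sample)
```

In this example the outlying value receives a weight close to zero, so the robust estimate stays near the bulk of the data while the ordinary mean is pulled toward the outlier.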

Optimal designs for GLMMs

In this collaborative project with Xiaojian Xu (Department of Mathematics, Brock University), we study sequential optimal design methodologies for efficient estimation in generalized linear mixed models (GLMMs). GLMMs are commonly used in the analysis of clustered, correlated discrete data, such as binary and count data arising from longitudinal studies or repeated measurements. We study the properties of the maximum likelihood (ML) estimators obtained under the sequential design schemes.
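
For readers unfamiliar with GLMMs, the following Python sketch simulates clustered binary data from a random-intercept logistic model, one of the simplest GLMMs. It only illustrates the type of data involved and does not implement any design methodology. The function name, the parameter names and the covariate range are hypothetical.

```python
import math
import random
from typing import List, Tuple


def simulate_clustered_binary(n_clusters: int, cluster_size: int,
                              beta0: float, beta1: float, sigma_u: float,
                              seed: int = 0) -> List[Tuple[int, float, int]]:
    """Simulate (cluster, x, y) triples from a random-intercept logistic model."""
    rng = random.Random(seed)
    data = []
    for cluster in range(n_clusters):
        # The shared random intercept induces correlation within a cluster.
        u = rng.gauss(0.0, sigma_u)
        for _ in range(cluster_size):
            x = rng.uniform(-1.0, 1.0)          # design point (covariate value)
            eta = beta0 + beta1 * x + u          # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))     # inverse logit link
            y = 1 if rng.random() < p else 0
            data.append((cluster, x, y))
    return data
```

In a design setting, the covariate values `x` would be chosen by the experimenter rather than drawn at random as they are here.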