Research Funding:
NSERC Discovery Grant (April 2022 - March 2027)
Project Title:
Robust and efficient methods for analyzing complex longitudinal and survival data
NSERC Discovery Grant (April 2016 - March 2022) [completed]
Project Title:
Statistical methods for complex clinical and survey data
NSERC Discovery Grant (April 2011 - March 2016) [completed]
Project Title:
Statistical methods for clinical research
Research Projects:
Missing data analysis
Missing data are common in many clinical studies. For example, in a longitudinal study, an individual's outcome can be
missing at one follow-up time and measured at the next, resulting in nonmonotone missingness patterns. If the probability
that an outcome is missing at a given time depends on the value of the (possibly unobserved) outcome at that time, then
the missing-data mechanism is nonignorable.
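To make these definitions concrete, here is a small simulation sketch. It assumes hypothetical standardized outcomes at five follow-up times and a logistic missingness model whose probability depends on the current (possibly unobserved) outcome. The coefficients `alpha0` and `alpha1` and the helper names are illustrative choices, not estimates from any study. Because each visit can be missed independently, some subjects miss a visit and return later (nonmonotone patterns). Because low outcomes are more likely to be missing, the mean of the observed values is biased upward relative to the mean of all values.

```python
import numpy as np

def missing_prob(y, alpha0=-1.0, alpha1=-1.5):
    """Hypothetical P(outcome missing | y), logistic in the current outcome y.
    With alpha1 != 0 the probability depends on a value that may be unobserved,
    so the missing-data mechanism is nonignorable."""
    return 1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * y)))

def is_monotone(r):
    """r: 0/1 indicators (1 = observed) over follow-up times.
    A pattern is monotone if, once missing, the outcome is never observed again."""
    r = np.asarray(r)
    if not (r == 0).any():
        return True
    first_missing = int(np.argmax(r == 0))
    return bool((r[first_missing:] == 0).all())

rng = np.random.default_rng(2009)
n, T = 2000, 5
y = rng.normal(size=(n, T))                                     # hypothetical full outcomes
r = (rng.uniform(size=(n, T)) >= missing_prob(y)).astype(int)   # 1 = observed

share_nonmonotone = np.mean([not is_monotone(row) for row in r])
print(f"share of nonmonotone patterns: {share_nonmonotone:.2f}")
print(f"mean of all outcomes:      {y.mean():.3f}")
print(f"mean of observed outcomes: {y[r == 1].mean():.3f}")
```

In this setup, averaging only the observed values overstates the mean. This illustrates why a nonignorable mechanism is modeled explicitly rather than ignored.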
Currently, I am involved in a project on AIDS data analysis with Stuart Lipsitz (Harvard Medical School) and Garrett
Fitzmaurice (Harvard Medical School), in which the goal is to develop a suitable method for analyzing longitudinal data
on CD4 counts from HIV-infected patients. A full likelihood approach is algebraically complicated and requires extensive
computation because there are many follow-up times. We therefore explore a pseudo-likelihood approach based on a
bivariate binary regression model for analyzing the data.
I visited Nan Laird (Harvard School of Public Health) for two months in the summer of 2009 and worked on a project
jointly with Nan Laird and Garrett Fitzmaurice. The project involves the analysis of multiple binary outcomes using a
multivariate logistic regression model with incomplete covariate data, where auxiliary information is available. The
auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data.
Proteomics data analysis
Recently, I visited Wing Wong (Department of Statistics, Stanford University) during August-November 2009 and became
involved in a project on proteomics data analysis. In this project, we consider longitudinal data on the intensities of
peptides (components of proteins) obtained from a group of patients. Data were also collected from a control group of
healthy individuals. The goals are to identify proteins that are differentially expressed over time and proteins that
are differentially expressed between the patient and control groups. The longitudinal data from the patient group
contain many nonmonotone missing responses. These missing data are assumed to be nonignorable, since a peptide
intensity may be missing because of its low abundance. We consider analyzing the data using a linear mixed model with a
block-diagonal covariance structure. Because the dataset involves thousands of peptides and the full likelihood
approach is computationally intensive, we consider using an approximation to the likelihood for analyzing the data. To
identify differentially expressed proteins, we use q-values (rather than standard p-values) to account for multiple
hypothesis testing. The proposed method may also be useful in microarray analysis, where longitudinal data on gene
expression can be obtained from a group of individuals. There, data may be missing because of low gene expression
levels, and a nonignorable missing-data mechanism must be incorporated into the analysis.
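The q-value step can be sketched as follows. This is a minimal version of Storey's q-value procedure. It assumes a single tuning value `lam` for estimating the proportion of true null hypotheses `pi0`; the R `qvalue` package offers more refined estimators. The function name `qvalues` and the example p-values are hypothetical. With `pi0 = 1`, the q-values reduce to Benjamini-Hochberg adjusted p-values.

```python
import numpy as np

def qvalues(p, lam=0.5, pi0=None):
    """Storey-type q-values for a vector of p-values.
    pi0 (proportion of true nulls) is estimated as #{p > lam} / (m * (1 - lam)),
    capped at 1, unless it is supplied."""
    p = np.asarray(p, dtype=float)
    m = p.size
    if pi0 is None:
        pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    q_sorted = pi0 * m * p[order] / ranks
    # q_(i) = min over j >= i, so take a running minimum from the largest p-value down
    q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q, pi0

# Hypothetical p-values from testing five proteins
p = [0.012, 0.041, 0.028, 0.004, 0.62]
q, pi0 = qvalues(p, pi0=1.0)
print(np.round(q, 4))             # q-values in the original protein order
print(np.flatnonzero(q < 0.05))   # proteins declared differentially expressed at q < 0.05
```

A q-value threshold such as 0.05 targets the expected proportion of false discoveries among the proteins that are called differentially expressed, rather than the error rate of each individual test. This distinction matters when thousands of peptides are tested at once.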
Robust small area estimation
This is a continuation of the project on small area estimation with Jon Rao (Carleton University). In a recent joint
paper (Sinha and Rao, 2009, The Canadian Journal of Statistics), we proposed and explored a robust method for small area
estimation. The method, developed within the framework of maximum likelihood estimation in linear mixed effects models,
downweights potential outliers in the data when estimating the model parameters. In the current project, we allow for
the possibility that the mean response function is nonlinear, and we consider using a spline regression model to capture
the nonlinear structure. We also allow for outliers in the random effects as well as in the random errors, and we
investigate a robust approach for analyzing such data.
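The two ingredients of the current project, a spline for a nonlinear mean and downweighting of outliers, can be illustrated together. This sketch is a simplification, not the method of Sinha and Rao (2009). It omits the random effects entirely and fits an unpenalized truncated-linear spline by iteratively reweighted least squares with Huber weights. The knot location, the tuning constant `c = 1.345`, the MAD-based residual scale, and all function names are assumptions made for illustration.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix with columns 1, x and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def huber_weights(u, c=1.345):
    """Huber weights psi(u)/u: 1 inside [-c, c], c/|u| outside, so large residuals shrink toward 0."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def robust_spline_fit(x, y, knots, c=1.345, max_iter=100, tol=1e-10):
    """Huber-type M-estimate of spline coefficients via iteratively reweighted least squares."""
    X = truncated_linear_basis(x, knots)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # ordinary least-squares start
    for _ in range(max_iter):
        resid = y - X @ beta
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust MAD scale
        w = huber_weights(resid / scale, c)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w

# Hypothetical data: a kink at x = 0.5, small smooth noise, and one gross outlier
x = np.linspace(0.0, 1.0, 41)
y = 1.0 + 2.0 * x + 3.0 * np.maximum(x - 0.5, 0.0) + 0.01 * np.sin(37.0 * x)
y[10] += 5.0

ols = np.linalg.lstsq(truncated_linear_basis(x, [0.5]), y, rcond=None)[0]
beta, w = robust_spline_fit(x, y, knots=[0.5])
print("least squares:", np.round(ols, 2))
print("Huber IRLS:   ", np.round(beta, 2))
print("weight of the outlier:", round(float(w[10]), 4))
```

In the setting described above, the same weighting idea would apply to both the random effects and the random errors. This sketch shows only the error part.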
Optimal designs for GLMMs
In this collaborative project with Xiaojian Xu (Department of Mathematics, Brock University), we study sequential
optimal design methodologies for efficient estimation in generalized linear mixed models (GLMMs). GLMMs are commonly used
to analyze clustered, correlated discrete data, such as binary and count data, including longitudinal data and repeated
measurements. We study the properties of the maximum likelihood (ML) estimators obtained under the sequential design
schemes.
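As a rough illustration of the sequential idea, this sketch greedily adds design points that maximize the log-determinant of the Fisher information (a locally D-optimal criterion) at a current parameter guess. It deliberately simplifies the problem in two ways. First, it uses a fixed-effects logistic model, so the random effects of a GLMM are ignored; their information matrix generally has no closed form. Second, it omits the step of re-estimating the parameters by ML between stages. The function names, the candidate grid, and the parameter guess are hypothetical.

```python
import numpy as np

def info_matrix(design, beta):
    """Fisher information of a logistic model logit(p) = b0 + b1*x at the design points."""
    x = np.asarray(design, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    w = p * (1.0 - p)                       # binomial variance weights
    return X.T @ (w[:, None] * X)

def greedy_d_optimal(design, candidates, beta, n_new):
    """Add n_new points one at a time, each maximizing log det of the updated information."""
    design = list(design)
    for _ in range(n_new):
        scores = [np.linalg.slogdet(info_matrix(design + [c], beta))[1] for c in candidates]
        design.append(float(candidates[int(np.argmax(scores))]))
    return design

# Hypothetical stage: start from covariate values -1 and 1, current guess beta = (0, 1)
candidates = np.linspace(-3.0, 3.0, 13)
beta_hat = (0.0, 1.0)
design = greedy_d_optimal([-1.0, 1.0], candidates, beta_hat, n_new=4)
print("design:", design)
print("log det before/after:",
      round(np.linalg.slogdet(info_matrix([-1.0, 1.0], beta_hat))[1], 3),
      round(np.linalg.slogdet(info_matrix(design, beta_hat))[1], 3))
```

Points far out in the tails carry little information because p(1 - p) is small there, so the greedy step favors intermediate covariate values. In an actual sequential scheme, the parameter guess would be updated from the data collected after each stage.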