September 3, 2024:
Some papers are listed below, ordered by year of publication; others may be added to the list during the year. Subject to availability of time and scheduling constraints, I am happy to supervise any of these papers.
If one of the listed papers interests you, contact me to arrange a meeting to discuss what you would like to do in connection with that paper. In general terms, I will require a written report as well as an oral presentation of your work on the paper. However, exactly what that work might be is open to negotiation. You should have specific ideas about what you want to do for any paper that interests you.
I am also willing to consider supervising other papers. If you have a paper you would like to study for STAT 548 and you think it might interest me, feel free to ask whether I would be willing to supervise your study of it. As with the listed papers, you should have specific ideas about what you would like to do in connection with that paper.
This discussion paper develops an approach to assessing the sensitivity of inferences to various types of selection bias, for example bias arising from the way that subjects were allocated to treatments. The approach is to extend simple statistical models by including an additional parameter that models the degree of non-randomness in the mechanism generating the data, and to study inference conditional on a range of different values of that parameter. Several problems are investigated, including non-ignorable nonresponse. The paper requires no advanced background, so it is suitable for any student. Note: Almost half of the paper consists of comments by multiple discussants. You will not be required to review these, although they may provide useful perspectives on the content of the paper.
This paper considers parametric regression models for an outcome Y on a vector of covariates X = (W,Z), when W is always observed but Z is unobserved for some of the units. If Z can be assumed to be missing at random (MAR), knowledge of the conditional distribution of Z given W would make maximum likelihood estimation feasible. The paper presents a semiparametric maximum likelihood approach in which that conditional distribution is left unspecified, and proposes an EM algorithm to carry out the corresponding inferences. The paper considers one of the simplest problems involving missing data, so it can serve as an entry point to the literature on missing data problems for any student with an interest in this general area. Previous exposure to the basic notions of missing data mechanisms would be useful, though not essential.
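To make the setup concrete, the observed-data likelihood for this kind of problem can be written as below. The notation is an assumption introduced here for illustration and is not taken from the paper: R_i = 1 indicates that Z_i is observed, f_\theta is the density of the parametric regression model for Y given (W,Z), and G(z | w) is the conditional distribution of Z given W. Under MAR the missingness mechanism factors out of the likelihood and is omitted.

```latex
L(\theta, G) \;=\;
\prod_{i:\,R_i = 1} f_\theta(y_i \mid w_i, z_i)\, dG(z_i \mid w_i)
\;\times\;
\prod_{i:\,R_i = 0} \int f_\theta(y_i \mid w_i, z)\, dG(z \mid w_i)
```

If G were known, this could be maximized over \theta alone. A semiparametric approach of the kind described treats G as an unknown alongside \theta, and the integral over the unobserved Z is what makes an EM-type algorithm a natural choice.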
This paper investigates the efficiency, relative to a fixed design without adaptation, of response-adaptive two-stage designs, in which the first-stage data are used to determine a locally optimum design for the second stage. The comparisons are based on an explicit expression for the (asymptotic) Fisher information of such designs. This paper is suitable for a student with an interest in designs for clinical experiments, particularly a student wanting to gain some familiarity with the current rage for adaptive designs.
This paper presents a framework for valid statistical inference when an experimental dataset (of features paired with outcomes) is supplemented with predictions from a machine-learning system for a second, feature-only dataset. The paper presents a three-step protocol that leads to algorithms for obtaining valid confidence intervals for estimands such as means, quantiles, and linear and logistic regression coefficients, and that require no assumptions about the machine-learning algorithm that supplies the predictions. Multiple examples illustrating the properties of the approach are provided. Derivations and additional results appear in the Supplement. As the development is from basic principles, no specialized background is required to understand this paper, and it is suitable for any student.
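As a rough illustration of the idea for the simplest estimand, a mean, the sketch below combines the average prediction on the feature-only data with a correction estimated from the labeled data. This is one common construction of this kind and may differ from the paper's own algorithm. The function name, argument names, and the large-sample normal approximation are assumptions made here, and the sketch assumes that both datasets are independent random samples from the same population.

```python
import math
from statistics import NormalDist, fmean, variance


def prediction_corrected_mean_ci(y_labeled, pred_labeled, pred_unlabeled, alpha=0.05):
    """Hypothetical helper: interval for E[Y] using predictions plus a correction.

    y_labeled      -- observed outcomes on the labeled (experimental) data
    pred_labeled   -- ML predictions for the same labeled units
    pred_unlabeled -- ML predictions for the feature-only data
    """
    n = len(y_labeled)
    big_n = len(pred_unlabeled)

    # Prediction errors on the labeled data estimate the bias of the predictions.
    errors = [p - y for p, y in zip(pred_labeled, y_labeled)]

    # Mean prediction on the feature-only data, minus the estimated bias.
    estimate = fmean(pred_unlabeled) - fmean(errors)

    # The two samples are independent, so their variances add.
    std_err = math.sqrt(variance(pred_unlabeled) / big_n + variance(errors) / n)

    # Normal-approximation interval (an assumption; valid only for large samples).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err)
```

Nothing in the calculation depends on how the predictions were produced. Poor predictions make the error term more variable and so widen the interval, but they do not by themselves invalidate it.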