Graduate Student Seminar
2010 Term 1
Thursdays 12-1pm
LSK 301
16 Sept, 2010
Kevin Ushey
Introduction to Reading Data and Exploratory Data Analysis in R
23 Sept, 2010
Introduction to unsupervised learning, Clustering
30 Sept, 2010
07 Oct, 2010
Multivariate analysis of variance with fewer observations than the dimension
One of the challenges in multivariate data analysis is the curse of dimensionality. In particular, we may have only a small sample of individuals or objects but want to investigate a very large number of measurements on them simultaneously. For example, in microarray studies, thousands of genes may be observed while only a few individuals are available for collecting gene information. This type of data set is often referred to as multivariate data with fewer observations than the dimension, or high dimensional data. Since the 1950s, there has been a surge of effort to cope with the difficulties arising from high dimensional data. Much of the literature focuses on tests of hypotheses on mean vectors and covariance matrices, such as the high dimensional two sample significance test proposed by Dempster (1958, 1960), test criteria for the covariance matrix (Srivastava, 2005), and asymptotic results for a high dimensional MANOVA test (Fujikoshi et al., 2004). In this week's student seminar, I would like to provide a review of the methodologies proposed by Srivastava and Fujikoshi in 2006. [Slides]
14 Oct, 2010
Variable selection via penalized likelihood methods (PLM)
Variable selection plays an important role in the analysis of survey data, where one common issue of interest is to identify the influential factors that are associated with certain behavioural, social, or economic indices. Traditional selection procedures, such as best subset selection or stepwise regression, can be computationally expensive or unstable in the selection process. Penalized likelihood methods (PLM) are now being used instead as computationally feasible alternatives for variable selection. In this talk, we study the use of PLM in the analysis of complex survey data and discuss the corresponding asymptotic properties.
21 Oct, 2010
Getting started with Sweave
Sweave is a tool that allows the R code for complete data analyses to be embedded in LaTeX documents. The purpose is to create dynamic reports, which can be updated automatically if the data or analysis change. In my talk, I will give a tutorial on Sweave. [Slides and link of xtable gallery from CRAN]
28 Oct, 2010
A few things about Linux
04 Nov, 2010
11 Nov, 2010
18 Nov, 2010
Better research software through lightweight software engineering practices
In this workshop, I will introduce a handful of low-pain/high-payoff software engineering practices that will make developing software for your research easier and less error-prone. I plan to cover the following topics: version control, automation, and documentation.
25 Nov, 2010
Model based inference of high-throughput sequencing data
The famous microarray technology created a huge number of opportunities and challenges for statisticians in the past. It is now being replaced by a new technology, "high-throughput sequencing", which has created new challenges and opportunities. I will discuss statistical methods for whole-genome profiling of transcription factor binding regions and positions of nucleosomes using high-throughput sequencing data. I will try my best to avoid formulas to make my talk easier to understand.
-
2009 TERM 2
Tuesdays 12:15-1:45pm
LSK 301
09-Mar-2010 12:30-1:30pm
Eugene Barsky
Mastering Google for Science and Engineering
Abstract:
This 1.5-hour workshop, presented by the UBC Science and Engineering librarians, will: review the major commands you can use with the Google search engine to make your searches more precise and relevant; review Google Scholar, its potential, and its use; and compare Google/Google Scholar with Compendex (a major engineering database).
09-Mar-2010 12:30-1:30pm
Statistical classification is a procedure in which individuals are placed into classes/groups based on information on one or more characteristics inherent in the items and on a training dataset of previously labelled items. A training dataset is used to fit a classifier, which is then used to predict a 'response value' from one or more features on the test dataset. This talk will be mainly on the supervised learning technique, where the classes/groups are determined in advance. Many classification methods encounter limitations such as: low accuracy when there are too many input variables (NN, LDA), poor ability to deal with irrelevant inputs (SVM, KNN), and poor ability to deal with multiple mechanisms within a single class (any single classifier). However, ensembles of classifiers are able to handle those limitations and can improve classification accuracy. Several algorithms for constructing ensembles will be briefly discussed, such as bagging, boosting, random forest, and systematic subset. They will be applied to a highly unbalanced dataset. Finally, a novel method to evaluate classifiers in a highly unbalanced classification situation will be introduced.
- 02-Mar-2010 12:30-1:30pm
-
Introduction to spatial analysis
In many cases, data contain not only information about the response being studied but also variables that indicate the geographic location where the responses are observed. A data set collected from an agricultural field, for example, has observations on the yield of a particular crop and a variable that records the location in the field where the yield is observed. We may be interested in comparing the average crop yield in different areas of the field. The geographic location of the response is important since it carries information relevant to the analysis of the data. This type of data is often called spatial data. There is one key feature of spatial data: autocorrelation between observations tends to be non-zero. Observations that are spatially closer to each other tend to share more similarities than observations that are farther apart. Thus we are looking for statistical methodologies that are able to deal with the specific challenges arising from spatial data. In recent decades, there has been an increasing demand for statistical analysis of spatial data, especially in agricultural and environmental areas. In this talk, a background and introduction with examples will be given to illustrate approaches in spatial data analysis.
- 9-Feb-2010 12:30-1:30pm
- Yan (Lucy) Cheng
-
Introduction to Survival Analysis in R
Survival analysis is the name for a collection of statistical techniques used to describe and quantify time-to-event data. It is used in a number of applied fields, such as medicine, public health, social science, and engineering. This is an introductory talk that gives a general overview of the most commonly used R functions in survival analysis. Many theoretical details have been intentionally omitted for brevity; the talk is intended to assist someone who is familiar with R and with the theory of the topics presented and wants to brush up on the relevant R applications. Topics include Survival objects, the KM estimate, Confidence bands, the Cumulative hazard function, Tests for two or more samples, Cox Proportional Hazards models with either constant or time-dependent covariates, as well as Accelerated Failure Time models. [Slide]
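As a rough illustration of the KM (Kaplan-Meier) estimate listed among the topics, here is a minimal sketch in Python rather than R. The function name `kaplan_meier` and the toy data are assumptions made for this example and are not taken from the talk; it only shows the product-limit idea, without confidence bands.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times  -- observed follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: six subjects, two of them censored (event = 0).
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is the point of the estimator.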
- 2-Feb-2010 12:30-1:30pm
- Pavel Krupskii
-
Families of distributions which are distinguished by having both a parameter that is itself a distribution function (the so-called underlying distribution) and a real-valued parameter are said to be semiparametric. I will introduce some semiparametric families and describe underlying distributions that lead to the coincidence of two different semiparametric families.
- 26-Jan-2010 12:30-1:30pm
- Lei (Larry) Hua
-
A Brief Introduction to Extreme Value Theory
In this talk, I will introduce some basic concepts and important results from extreme value theory (EVT). Univariate EVT has been well developed. It has many applications in finance, environmetrics, econometrics, internet data traffic, and so on. If time permits, I will proceed to talk about multivariate EVT, which is relevant to studying the dependence properties between extremal events.
-
2009 TERM 1
Tuesdays 12:15-1:45pm
LSK 301
- 24-Nov-09 12:30-1:30pm
- Libo Lu
-
Introduction to Numerical Optimization
This talk focuses on algorithms for unconstrained and constrained optimization, including the Newton-Raphson method, Quasi-Newton method, Conjugate Gradient method, Trust Region method, Penalty method, Sequential Quadratic Programming, Interior Point method, etc. The optimality conditions are briefly reviewed. Some optimization functions available in R are also listed at the end.
- 17-Nov-09 12:30-1:30pm
- Xu (Steven) Wang
-
Joint inference for longitudinal and survival data with application to the AIDS studies
In many longitudinal studies, the individual characteristics associated with the repeated measures may be covariates for the time to an event of interest. Thus, it is desirable to model the time-to-event process and the longitudinal process jointly. Statistical analysis may be complicated by missing data or measurement errors in the time-dependent covariates. This article considers a nonlinear mixed-effects model for the longitudinal process and the Cox proportional hazards model for the time-to-event process. We provide a method for simultaneous likelihood inference for nonignorable missing data and extend the method to time-dependent covariates. We adapt a Monte Carlo EM algorithm to estimate the model parameters. We compare the method with a naive two-step method and a bootstrap method, with some interesting findings. An AIDS dataset is used as an illustration. [slides]
- 10-Nov-09 12:30-1:30pm
- Jing Dong
-
It is well known that if a random vector with given marginal distributions is comonotonic, it has the largest sum with respect to convex order. However, comonotonicity sometimes cannot reflect reality well. For instance, in an insurance context, we may have partial information about the dependence structure of different risks in the lower tail. In this paper, we extend the aforementioned result, using the concept of upper comonotonicity, to the case where the dependence structure of a random vector in the lower tail is already known. Since upper comonotonic random vectors have comonotonic behavior in the upper tail, we are also able to extend several well-known results of comonotonicity to upper comonotonicity. As an application, we construct different increasing convex upper bounds for sums of random variables and compare these bounds in terms of increasing convex order.
- 03-Nov-09 12:30-1:30pm
- Song Cai
-
A stochastic processes based regression model for phenological data with time dependent covariates
We aim to study the dependence of the bloom dates of six high-valued crops on local daily average temperature in the Okanagan Valley, as well as to forecast the future bloom dates of these crops. Classic models such as the proportional hazards model and the accelerated failure time model have difficulty modeling, and especially forecasting, bloom-date data, since the daily average temperature is a special case of a time-dependent covariate. We developed a stochastic-process-based regression model for these data to handle the time-dependent covariate problem in both parameter estimation and forecasting. This model can be easily extended to incorporate sequential multiple responses. It will also be useful for a wide range of survival data in medical research.
- 20-Oct-09 12:30-1:30pm
- Camila P. Estevam de Souza
-
Multivariate kernel regression
Kernel regression is a well-known nonparametric technique for estimating the relationship between a response variable and a covariate. In the case of many covariates, the method can be extended to so-called multivariate kernel regression (MKR). In my presentation I will talk about MKR when we have both continuous and discrete covariates. The R package np will be introduced as well.
- 13-Oct-09 12:30-1:30pm
- Eric Fu
-
Logistic regression is the most popular regression model for binary data. However, only for rare events is the odds ratio a good approximation to the relative risk, which is often the actual target of inference in prospective and cross-sectional studies in epidemiology. Various approaches to estimate the relative risk have been proposed in the literature. This talk will focus on two regression models: 1) log-binomial regression and 2) Poisson regression with sandwich estimator for variance. Specifically, we will discuss the computational issues of log-binomial regression and the efficiency of Poisson regression.
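To make the rare-event remark concrete, the following small Python sketch compares the odds ratio with the relative risk for a rare and a common outcome. The risk values are hypothetical numbers chosen only for illustration; nothing here reproduces the regression models discussed in the talk.

```python
def relative_risk(p_exposed, p_unexposed):
    """Ratio of the outcome probabilities in the two groups."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the outcome odds p / (1 - p) in the two groups."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks: the exposed group has twice the risk in both scenarios.
rr_rare = relative_risk(0.02, 0.01)
or_rare = odds_ratio(0.02, 0.01)
rr_common = relative_risk(0.60, 0.30)
or_common = odds_ratio(0.60, 0.30)

print(f"rare outcome:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"common outcome: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

With the rare outcome the two measures nearly coincide, while with the common outcome the odds ratio is far from the relative risk of 2, which is why models that target the relative risk directly are of interest.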
-
06-Oct-09 12:30-1:30pm
-
Yi Huang
-
A Brief Introduction to Empirical Likelihood
Empirical likelihood is a nonparametric method of statistical inference. It shares several celebrated properties with ordinary parametric likelihood but does not require us to specify a family of distributions for the data. The idea of empirical likelihood has very general application. However, empirical likelihood has its own limitations. In this short introduction, the advantages, disadvantages, and possible adjustments and modifications of empirical likelihood will be described and illustrated, and the results of a small simulation will be presented.
-
29-Sep-09 12:30-1:30pm
-
Kaida Ning
-
ChIP-Seq is a technology for detecting in vivo transcription factor binding sites or histone modification sites on a genome-wide scale. How to utilize the large-scale data and find biological insights is a challenging question for us. Here, we analyzed three ChIP-Seq data sets for the human HeLa cell, including data for a transcription factor, signal transducer and activator of transcription 1 (STAT1), RNA polymerase II (Pol2), and histone monomethylation (Me1). With these data sets, we looked into the spatial relationship between STAT1, Pol2, Me1 and the gene transcription start site; we checked the intersection of STAT1, Pol2 and Me1; we did de novo motif discovery for the sequences around the STAT1 binding sites, and predicted several motifs which may form a cis-regulatory module with the STAT1 GAS motif; we also analyzed the ChIP-Seq data along with gene expression data, and found that STAT1 binding is related to differential expression of genes. We suggest that further ChIP-Seq experiments be carried out for TFs corresponding to the de novo predicted motifs, and that gene expression be characterized for the IFN-gamma stimulated HeLa cell.
- 22-Sep-09 12:15-1:30pm
-
Luke Bornn
Monte Carlo Approximation Methods
- Standard statistical models are often inadequate to describe complex real-world phenomena such as signal processing, ballistics, and population genetics. However, more accurate models often preclude the computation of explicit answers, thus approximation methods must be developed. As the dimension of the problem grows, deterministic algorithms become computationally infeasible. In contrast, Monte Carlo methods are stochastic algorithms that rely on random processes to simulate the behaviour of the model of interest. I will begin the talk by explaining simple Monte Carlo methods such as the accept-reject algorithm, then proceed to work towards Markov chain Monte Carlo (MCMC), highlighting relevant examples along the way.
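As an illustration of the accept-reject algorithm mentioned above, here is a minimal Python sketch. The target density (a Beta(2, 2) density on [0, 1]), the uniform proposal, the bound of 1.5, and the seed are all assumptions picked for this example, not details from the talk.

```python
import random


def target_density(x):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)


def accept_reject(n_samples, rng, bound=1.5):
    """Draw n_samples from target_density using a Uniform(0, 1) proposal.

    bound must satisfy target_density(x) <= bound * 1 for all x in [0, 1],
    where 1 is the proposal density.
    """
    samples = []
    n_proposed = 0
    while len(samples) < n_samples:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform draw used for the accept/reject decision
        n_proposed += 1
        if u * bound <= target_density(x):
            samples.append(x)
    return samples, n_proposed


rng = random.Random(2009)  # fixed seed so the run is reproducible
samples, n_proposed = accept_reject(20000, rng)
print("sample mean:", sum(samples) / len(samples))
print("acceptance rate:", len(samples) / n_proposed)
```

The expected acceptance rate is 1 / bound, so a tighter bound wastes fewer proposals; in higher dimensions a good bound becomes hard to find, which is one motivation for moving on to MCMC.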
-
2008 TERM 2
Tuesdays 12:45-1:45pm
LSK 301
- 27-JAN-09
-
Kenneth Chi Ho Lo
Empirical Bayes models for differential gene expression (slides)
-
Inference about differential expression is a typical objective when analyzing gene expression data. The application of Bayesian hierarchical models is a popular approach for this type of problem. In this talk, we will cover the two most common modeling choices, the Gamma-Gamma and Lognormal-Normal models. We will also propose an extension of these models in order to relax some unrealistic assumptions, while parameter estimation can still be accomplished conveniently via EM-based algorithms.
- 03-FEB-09
-
Tianji Shi
Tax incidence when Quality matters: Study of the Beer industry (slides)
Standard economic theory suggests that, given an industry-wide tax increase, the tax would be passed on to the consumer, resulting in higher prices and lower demand. Also, higher-quality or higher-priced products would be made comparatively cheaper and would take market share away from cheaper products. We used market demand estimations and regressions to study whether the standard theory holds true for a more complex market such as the U.S. beer industry.
- 10-FEB-09
-
Aline Tabet
Latent Variable Modeling
Introduction to latent variable modeling. We will explore the analysis of such models using path analysis and structural equation modeling. We will discuss different issues such as identifiability and interpretation of such models.
- 17-FEB-09
-
Michael Regier
Strategies for Developing a Health Research Career
Once you have completed your degree, there are a number of ways you can choose to develop your career. The presenter will briefly and informally outline several paths within health research. Additionally, the presenter will present some strategies for making the best "next step" decisions within health research in order to reach your long-term goals. Here are some related links you might find useful:
Common CV: http://www.commoncv.net/
Research NET: https://www.researchnet-recherchenet.ca
Michael Smith Foundation for Health Research (Funding): http://www.msfhr.org/sub-funding.htm
CIHR: http://www.cihr-irsc.gc.ca
- 24-FEB-09
-
Lei (Larry) Hua
A brief introduction to Copula (slides)
The word copula is a Latin noun and means "a link, tie or bond". In statistics, roughly speaking, a copula plays a role like a rope that ties marginal distributions together. So we usually believe that a copula characterizes some properties of the dependence structure between marginal random variables. It provides a very flexible platform for constructing infinitely many multivariate distributions, and is especially useful in non-Gaussian fields, such as financial and actuarial modeling, where the traditional Gaussian distributions are not suitable. The copula is not a new concept, but its usefulness in dependence modeling has only been widely recognized since the 1990s. In this talk, the presenter will introduce some basic concepts of copulas and their applications in dependence modeling.
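The "rope" idea can be sketched in a few lines of Python: draw dependent uniforms from a copula, then push each one through the inverse CDF of whatever margin is wanted. The choice of a Gaussian copula with correlation 0.8, the exponential and uniform margins, and all function names below are assumptions made for this illustration only.

```python
import math
import random


def standard_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_copula_pairs(n, rho, rng):
    """Draw n pairs (u1, u2) with uniform margins and Gaussian-copula dependence."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((standard_normal_cdf(z1), standard_normal_cdf(z2)))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2009)  # fixed seed so the run is reproducible
uniform_pairs = gaussian_copula_pairs(5000, 0.8, rng)

# Tie two non-Gaussian margins together through their inverse CDFs:
# an Exponential(1) margin and a Uniform(0, 10) margin.
linked = [(-math.log(1.0 - u1), 10.0 * u2) for u1, u2 in uniform_pairs]

u1s = [u1 for u1, _ in uniform_pairs]
u2s = [u2 for _, u2 in uniform_pairs]
print("correlation of the uniforms:", pearson(u1s, u2s))
```

The margins can be swapped freely without touching the dependence step, which is the flexibility the abstract refers to.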
- 03-MAR-09
-
Chi Wai Yu
-
Introduction to Statistical Decision Theory (slides)
Unlike classical statistics, which is directed only towards the use of sampling information in making inferences about unknown numerical quantities, decision theory attempts to combine the sampling information with knowledge of the consequences of the decisions. This knowledge is often quantified by a so-called loss function that is incurred for each possible decision and for different possible values of the unknown quantities. In this presentation, the presenter will briefly introduce some decision-theoretic techniques for making inference, especially for parameter estimation. The optimality of estimators under several different criteria will also be discussed.
- 10-MAR-09
- Chaoxiong (Michelle) Xia
-
The Basics of SAS programming and data management
In this talk, we will introduce the basics of SAS programming and data management. In particular, SAS procedures and steps such as table joining, appending, use of macros, SQL, and the matrix language will be presented.
- 24-MAR-09
-
Camila P. Estevam de Souza
-
A kth Nearest Neighbor Clustering Procedure
Due to the lack of development in the probabilistic and statistical aspects of clustering methodology, clustering procedures are often regarded as heuristics generating artificial clusters from a given data set. Clearly, there is a need for clustering procedures that are useful for drawing statistical inferences about the underlying population from a sample. Because of this need, Wong and Lane (1983) proposed a clustering procedure that is based on the uniformly consistent kth nearest neighbor density estimate. It can be shown that the proposed clustering procedure is asymptotically consistent for high-density clusters in several dimensions. This seminar is based on the paper "A kth Nearest Neighbor Clustering Procedure" (Wong and Lane, 1983). See http://www.jstor.org/stable/2345405 for details.
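For readers unfamiliar with the kth nearest neighbor density estimate that the procedure builds on, here is a one-dimensional Python sketch of the textbook form k / (n * 2 * r_k(x)), where r_k(x) is the distance from x to its kth nearest sample point. This is only the density estimate, not the clustering procedure itself, and whether the paper uses exactly this normalization is an assumption; the function name and toy sample are hypothetical.

```python
def knn_density(x, sample, k):
    """One-dimensional kth nearest neighbor density estimate at x.

    Uses k / (n * 2 * r), where r is the distance from x to its kth nearest
    sample point and 2 * r is the length of the interval [x - r, x + r].
    Assumes r > 0, i.e. fewer than k sample points coincide with x.
    """
    distances = sorted(abs(x - s) for s in sample)
    radius = distances[k - 1]
    return k / (len(sample) * 2.0 * radius)


# Hypothetical toy sample: the estimate is high inside the data, low far away.
toy_sample = [0.0, 1.0, 2.0, 3.0, 4.0]
density_inside = knn_density(2.0, toy_sample, k=3)
density_outside = knn_density(10.0, toy_sample, k=3)
print("estimate at x = 2: ", density_inside)
print("estimate at x = 10:", density_outside)
```

The window width adapts to the data: it shrinks where points are dense and grows where they are sparse, which is what makes the estimate a natural basis for finding high-density clusters.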
-
2008 TERM 1
Tuesdays 12:45-1:45pm
LSK 301
- 09-Sep-08
-
Mike Danilov
Statistical Consulting (slides) -
The main purpose of the talk is to introduce the new graduate students to the consulting activities in the department.
http://www.stat.ubc.ca/SCARL/STCS/
- 16-Sep-08
-
Liangliang Wang
Academic/Technical Writing
In this student seminar, Liangliang will talk about technical writing, based mainly on the MITACS Skill Enhancement Technical Writing Workshop (http://www.mitacs.ca/conferences/TWW/). Topics include audience/purpose analysis, the writing process, understanding how readers read, the components of a report, editing, and proofreading.
- 23-Sep-08
-
Ehsan Karim
R Workshop (slides and data files) -
- Introductions (R/alternatives)
- Objects (vector/matrix/dataframe)
- Importing Data (CSV/Tab Delimited/Fixed Formats)
- Statistical Functions (random number/summary/regression/anova)
- Plotting (basic/add/other)
- Functions (syntax, arguments)
- Logical Constructs & Looping (step/return/if/for/bootstrap)
- Reference (help in R/web/book)
- 30-Sep-08
-
Luke Bornn
Filtering
Imagine trying to track a military target over time using nothing but noisy radar measurements. For a fixed time length, we can use batch inference to estimate the target's position. However, when data appears online, we must use recursive formulas to estimate the target's position and its evolution over time. In this introductory talk I'll discuss how this problem may be handled analytically when the movement is linear and the noise is Gaussian using the traditional Kalman filter. I'll then move towards techniques for estimating the system under more complicated scenarios using the extended Kalman filter and/or sequential Monte Carlo.
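The recursive estimation described above can be sketched for the simplest linear-Gaussian case. The Python code below assumes a scalar random-walk state observed with additive Gaussian noise; the model, the parameter names, and the two toy observations are assumptions made for this illustration rather than the setup used in the talk.

```python
def kalman_filter_1d(observations, prior_mean, prior_var, process_var, obs_var):
    """Scalar Kalman filter for the model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
        y_t = x_t + v_t,      v_t ~ N(0, obs_var)
    Returns the filtered means and variances after each observation.
    """
    mean, var = prior_mean, prior_var
    means, variances = [], []
    for y in observations:
        # Predict: the state drifts, so uncertainty grows by process_var.
        pred_mean = mean
        pred_var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = pred_var / (pred_var + obs_var)
        mean = pred_mean + gain * (y - pred_mean)
        var = (1.0 - gain) * pred_var
        means.append(mean)
        variances.append(var)
    return means, variances


# Hypothetical example: a static target (no process noise) observed twice.
filtered_means, filtered_vars = kalman_filter_1d(
    [1.0, 1.0], prior_mean=0.0, prior_var=1.0, process_var=0.0, obs_var=1.0
)
print("filtered means:    ", filtered_means)
print("filtered variances:", filtered_vars)
```

Each new observation only needs the previous mean and variance, so the estimate is updated online without reprocessing the whole batch; the extended Kalman filter and sequential Monte Carlo replace the exact predict and update steps when the model is nonlinear or non-Gaussian.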
- 07-OCT-08
-
Aline Tabet
Latex Workshop (slides)
- 1- Introduction to Latex :
- - What it is,
- - where to download it and
- - how to use it
- 2- How to create a simple document using latex:
- - Scientific notation
- - Including Tables
- - Including Figures
- 3- How to create presentations using beamer.
- 14-OCT-08
-
Derrick Lee
Unix Workshop
http://www.stat.ubc.ca/Computing/FAQ/UNIXhelp/
- 21-OCT-08
-
Reza Hosseini
An introduction to stochastic processes
- 1. Gaussian fields,
- 2. Gaussian inequalities,
- 3. Orthogonal expansions,
- 4. Excursion probabilities,
- 5. Stationary fields,
- 6. Stochastic calculus and
- 7. Ito's formula.
- 4-Nov-08
-
Chen Xu
A Maximum likelihood Methodology for Clusterwise Linear Regression
Chen will briefly introduce a conditional mixture, maximum likelihood methodology for performing clusterwise linear regression. This methodology can simultaneously estimate the separate regression functions and membership in K clusters or groups. In addition, it has a strong connection with the well-known EM algorithm.
- 18-Nov-08
-
Derrick Lee
SLATE: UBC Statistics Teaching Website
This topic is a bit technical and is aimed at those who will be TAs, head TAs, or instructors and want to edit the SLATE system for academic purposes.
https://slate.stat.ubc.ca/slate/Slate/Help
2007 TERM 2
Tuesdays 12-1pm
LSK 301
- 22-Jan-08
-
Wei Wang
An introduction to functional principal components analysis.
- 29-Jan-08
- Dept. Review Meeting (6 or so graduate students needed!)
- 5-Feb-08
-
Michael Regier
An Introduction to Missing Data
- 12-Feb-08
- Mohua Podder
- Genotype classification models for APEX microarray data
- 19-Feb-08
-
Bela Nagy
A Multidisciplinary Introduction to the Design and Analysis of Computer Experiments
- 26-Feb-08
-
Alexandra Romann
Introduction to Measurement Error (slides)
- 4-Mar-08
-
Kenneth Chi Ho Lo
Generalized Estimating Equations and Credibility (slides)
- 11-Mar-08
-
Mike Danilov
Robust Statistics (slides)
- 18-Mar-08
-
Doug Nychka
Student meetings
- 25-Mar-08
-
Student Seminar Series: Talk Back
Short feedback presentation assessing the inaugural Student Seminar Series followed by open constructive feedback
