It is difficult to anticipate the myriad challenges that a predictive model will encounter once deployed. Common practice entails a reactive, cyclical approach: model deployment, data mining, and retraining. We instead develop a proactive longtail discovery process by imagining additional data during training. In particular, we develop general model-based longtail signals, including a differentiable, single-forward-pass formulation of epistemic uncertainty that does not affect model parameters or predictive performance but can flag rare or hard inputs. We use these signals as guidance to generate additional training data from a latent diffusion model in a process we call Longtail Guidance (LTG). Crucially, we can perform LTG without retraining the diffusion model or the predictive model, and we do not need to expose the predictive model to intermediate diffusion states. Data generated by LTG exhibit semantically meaningful variation, yield significant generalization improvements on numerous image classification benchmarks, and can be analyzed by a vision-language model (VLM) to proactively discover, textually explain, and address conceptual gaps in a deployed predictive model.
Bio
David Hayden leads Perception AI Research at Cruise, where he focuses on generative and world models, foundation model alignment and guidance, longtail robustness, uncertainty quantification, and synthetic data. He has consulted on machine learning and computer vision for diverse industries, including pharmaceuticals, retail, and competitive sports. His work has shipped to hundreds of driverless cars, run live in stadiums of 40,000 people, supported seed and Series A rounds, and been published in top conferences and journals, including ICML, CVPR, NeurIPS, and Nature. He previously founded Essistive Technologies, where he developed and licensed discreet note-taking technology for individuals with limited vision. David received his PhD from MIT, where he worked on interpretable machine learning and computer vision, with emphasis on behavior analysis, multi-object tracking, Bayesian nonparametrics for time series, distributions on manifolds, and uncertainty to guide decision-making.
To join this seminar virtually, please request Zoom connection details from ea@stat.ubc.ca.
Speaker's page: http://www.dshayden.com/
Location: ESB 4192 / Zoom
Event date: -
Speaker: David S. Hayden, Research Scientist, GM Cruise