Winnie Xu.

Hey, I'm Winnie

Let's reflect on the latest scientific research, challenge our assumptions when designing implementations, and use technology to enable far-reaching reform.

My research lies at the intersection of generative models, information retrieval, and natural language understanding. My goal is to build interpretable and scalable generative systems through the incorporation of useful inductive biases, compression, and powerful learning / inference schemes. Towards this, I explore ways to inject classical algorithms with differentiable components, build efficient methods exploiting intermediate representations, and improve reasoning capabilities of large language models via modularity.

I'm fortunate to have learned from many influential mentors early in my career. Most recently, I was a Student Researcher at Facebook AI Research (Meta AI) and collaborated closely with Stefano Ermon and friends at Stanford StasML. During my undergrad, I collaborated on various projects at Google Brain with Igor Mordatch (Brain Robotics), David Dohan (Generative Models), and Durk Kingma.

I studied Artificial Intelligence, Computer Science, Statistics, and Math at the University of Toronto. My AI research journey began with scaling methods in latent variable modeling and probabilistic inference with David Duvenaud (Vector Institute). I hope one day to have a similar influence through my own ideas at the cutting edge.

/ / / / /

My recent work focuses on improving the robustness of neural differential equation paradigms, developing data-efficient mini-batching methods for training LLMs, and designing deep generative models that leverage compressible representations of periodic features.

As a biologist-turned deep learning scientist, I hope to democratize the fruits of machine learning to natural sciences and engineering. Whether in computational genetics or ML, much of my work is about inferring properties of the physical world, exploring the anomalies and uncertainties of observed phenomena, and reasoning about the underpinnings of causality.

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela

Alignment methods like RLHF and DPO expect feedback in the form of preferences (e.g., output A is better than output B for input X). Collecting this feedback through human annotation quickly becomes very expensive and can also yield conflicting data. Kahneman-Tversky Optimization (KTO) matches or exceeds the performance of DPO, the state of the art, without using preference data: it learns from a binary signal of whether an output is desirable or undesirable. This makes KTO far easier to use in the real world, where preference data is scarce and expensive to collect.

Preprint, 2024.
paper | leaderboard | blog | code | model checkpoints
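The KTO objective can be sketched in a few lines. This is a minimal, non-authoritative reading of the loss as described in the paper, not the released code. The implied reward is the log-ratio r = log pi_theta(y|x) - log pi_ref(y|x). Desirable outputs earn a value lambda_d * sigmoid(beta * (r - z0)), undesirable ones earn lambda_u * sigmoid(beta * (z0 - r)), and the loss averages lambda_y minus that value. The reference point z0 is a KL estimate that the paper computes from the batch; here it is simply passed in as a constant. The function name and default values are assumptions.

```python
import math

def kto_loss(policy_logps, ref_logps, desirable, z0, beta=0.1,
             lambda_d=1.0, lambda_u=1.0):
    """Hypothetical sketch of a KTO-style loss over a batch of (x, y) examples.

    policy_logps / ref_logps: log pi(y|x) under the policy and the frozen reference model.
    desirable: True where y was labeled desirable, False where undesirable.
    z0: reference point (an estimate of KL(pi_theta || pi_ref)), treated as a constant.
    """
    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    total = 0.0
    for lp, ref_lp, good in zip(policy_logps, ref_logps, desirable):
        r = lp - ref_lp  # implied reward: log-ratio of policy to reference
        if good:
            # Gains: reward above the reference point is valued (saturating).
            lam, value = lambda_d, lambda_d * sigmoid(beta * (r - z0))
        else:
            # Losses: reward below the reference point is valued for undesirable outputs.
            lam, value = lambda_u, lambda_u * sigmoid(beta * (z0 - r))
        total += lam - value
    return total / len(policy_logps)
```

Each example needs only a desirable/undesirable label rather than a paired comparison, so a batch can freely mix unpaired examples of either kind.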

Revisiting Associative Compression: I Can’t Believe It’s Not Better
Winnie Xu, Matthew J Muckley, Yann Dubois, Karen Ullrich

An interesting problem in representation learning is learning an optimal index (ordering) that is helpful for the learning task. We explore the impact of sequentially compressing an unordered set of assets, as opposed to assigning priorities at random. Unfortunately, general set compression methods show minimal gains in the rate-distortion trade-off when used with our proposed associative compression architectures.

ICLR Neural Compression Workshop, 2023.
paper
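For context on how much room set compression has: a sequence codec also spends bits describing the order of the items. For n distinct, exchangeable (e.g. i.i.d.) items, that order carries log2(n!) bits, which is the most an ideal set codec can save over coding the same items as a sequence. This is a standard information-theoretic bound, not a result from the paper, and the function names below are illustrative.

```python
import math

def order_bits(n: int) -> float:
    """Bits needed to single out one ordering of n distinct items: log2(n!)."""
    # math.lgamma(n + 1) equals ln(n!) and avoids building the huge factorial.
    return math.lgamma(n + 1) / math.log(2)

def ideal_set_rate(sequence_bits: float, n: int) -> float:
    """Best-case rate for a set codec, given the rate of a sequence codec that
    encodes the same n distinct, exchangeable items in an arbitrary order."""
    return sequence_bits - order_bits(n)
```

Per item the saving is log2(n!)/n, roughly log2(n) - 1.44 bits for large n, so it is small relative to assets that each cost many bits to encode.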

Multi-Game Decision Transformers
Kuang-Huei Lee*, Ofir Nachum*, Mengjiao Yang, Lisa Lee, Winnie Xu, Daniel Freeman, Sergio Guadarrama, Eric Jang, Henryk Michalewski, Ian Fischer, Igor Mordatch

A longstanding goal in the field of AI is to find a strategy for compiling diverse experience into a highly capable, generalist agent. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we train a single generalist control agent, the Multi-Game Decision Transformer (MGDT), on a diverse set of offline data to play 46 Atari games. We show that performance scales with model size and that the model adapts rapidly to novel games with fine-tuning. Compared with existing methods in behavioural cloning, online RL, and offline RL, MGDT offers the best scalability and performance.

Neural Information Processing Systems (NeurIPS), 2022 [Oral].
web | paper | blog | code

Self-Similarity Priors: Neural Collages as Differentiable Fractal Representations
Winnie Xu*, Michael Poli*, Stefano Masseroli*, Chenlin Meng, Kuno Kim, Stefano Ermon

We develop a new class of differentiable operators that leverage the underlying regularity of natural and artificial objects at various scales. Neural Collages are implicit operators that discover self-similar representations of data. They are efficient neural compressors and powerful decoders in deep generative models, and they can be used for a variety of creative and artistic generations.

Neural Information Processing Systems (NeurIPS), 2022.
paper | slides | code | blog | demo
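The self-similarity idea behind collage operators is easiest to see in a classical, non-neural 1D form. The sketch below illustrates fractal (collage) coding under hypothetical assumptions and is not the Neural Collages implementation. Each range block of a signal is an affine copy, a * downsample(domain block) + b, of a twice-as-long block of the same signal. When every |a| < 1 the operator is a contraction, so iterating it from any starting signal converges to a unique fixed point, and that fixed point is the decoded signal. The block layout and parameter names are assumptions; as we read the description above, Neural Collages make operators of this kind differentiable and learned.

```python
import numpy as np

def collage_operator(x, domain_idx, scales, offsets, range_size):
    """One application of a hypothetical 1D collage map.

    Range block i (length range_size) becomes
    scales[i] * downsample(domain block domain_idx[i]) + offsets[i],
    where domain blocks have length 2 * range_size and must lie inside x.
    """
    out = np.empty_like(x)
    dom = 2 * range_size
    for i, (d, a, b) in enumerate(zip(domain_idx, scales, offsets)):
        block = x[d * dom:(d + 1) * dom]
        # Average adjacent pairs: halves the length and never increases sup-norm distances.
        shrunk = block.reshape(range_size, 2).mean(axis=1)
        out[i * range_size:(i + 1) * range_size] = a * shrunk + b
    return out

def decode(domain_idx, scales, offsets, range_size, n_iters=60, x0=None):
    """Fixed-point iteration; converges for any x0 when every |scale| < 1."""
    n = range_size * len(scales)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = collage_operator(x, domain_idx, scales, offsets, range_size)
    return x
```

The code stores only a few numbers per block (a domain index, a scale, and an offset), which is where the compression comes from. Decoding recovers the whole signal from those parameters alone.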

Language Model Cascades
David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, Charles Sutton

Modern language models can learn in context and show a remarkable ability to generalize zero-shot to unseen conditions. We represent compositions of prompted language models as probabilistic programs, termed Model Cascades. Within this general framework for sampling and inference, we formalize prompting, reasoning, and tool use as graphical models over random variables with complex string values.

Beyond Bayes: Paths Towards Universal Reasoning Systems @ ICML, 2022 [Spotlight].
web | paper | code
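As a toy illustration of treating prompted model calls as string-valued random variables, the sketch below builds a two-step scratchpad cascade (a thought, then an answer). It performs inference by rejection sampling: conditioning on an observed answer yields a posterior over thoughts. TOY_LM is a hypothetical lookup table standing in for a real language model, and none of these names come from the paper's code.

```python
import random
from collections import Counter

# Hypothetical stand-in for a prompted LM: each prompt maps to a categorical
# distribution over string completions. A real cascade would sample from an LM.
TOY_LM = {
    "Q: 2 + 3 * 4 = ? Reasoning:": {
        " 3 * 4 = 12; 2 + 12 = 14.": 0.7,
        " 2 + 3 = 5; 5 * 4 = 20.": 0.3,
    },
    "Q: 2 + 3 * 4 = ? Reasoning: 3 * 4 = 12; 2 + 12 = 14. Answer:": {" 14": 0.9, " 20": 0.1},
    "Q: 2 + 3 * 4 = ? Reasoning: 2 + 3 = 5; 5 * 4 = 20. Answer:": {" 20": 0.9, " 14": 0.1},
}

def sample_lm(prompt, rng):
    """Draw one completion string for the prompt."""
    dist = TOY_LM[prompt]
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def scratchpad_cascade(question, rng):
    """Two string-valued random variables: thought ~ LM(q), answer ~ LM(q, thought)."""
    thought = sample_lm(question + " Reasoning:", rng)
    answer = sample_lm(question + " Reasoning:" + thought + " Answer:", rng)
    return thought, answer

def posterior_over_thoughts(question, observed_answer, n=20000, seed=0):
    """Rejection sampling: keep only runs whose answer matches the observation."""
    rng = random.Random(seed)
    kept = Counter()
    for _ in range(n):
        thought, answer = scratchpad_cascade(question, rng)
        if answer == observed_answer:
            kept[thought] += 1
    total = sum(kept.values())
    return {t: c / total for t, c in kept.items()}
```

With these numbers, the exact posterior probability of the correct reasoning given the answer 14 is 0.7 * 0.9 / (0.7 * 0.9 + 0.3 * 0.1) ≈ 0.95, which the sampler approximates.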

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
Winnie Xu, Ricky T.Q. Chen, Xuechen Li, David Duvenaud

We explore infinite-dimensional stochastic variational inference for learning the dynamics of continuous-time Neural ODEs with instantaneous noise. Our framework trains Bayesian neural networks by parameterizing weight uncertainty with stochastic differential equations (SDEs), along with associated efficient gradient-based algorithms. We also derive gradient estimators for the infinite-dimensional case whose variance approaches zero as the approximate posterior converges to the true posterior.

International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
paper | poster | slides | code | talk
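To make "weight uncertainty parameterized by an SDE" concrete, the sketch below uses the Euler-Maruyama method to integrate one joint sample path. The weights w(t) follow an SDE, and the hidden state h(t) follows an ODE whose dynamics use the current weights W(t). The Ornstein-Uhlenbeck weight SDE and the tanh dynamics are hypothetical choices for illustration. This is a forward simulation only. It does not include the variational training of the posterior over weight paths described above.

```python
import numpy as np

def simulate_sde_bnn(h0, w0, n_steps=100, t1=1.0, theta=1.0, sigma=0.5, seed=0):
    """Euler-Maruyama simulation of one joint sample path of (h(t), w(t)).

    Hypothetical dynamics for illustration:
      weights:      dw = -theta * w dt + sigma dB   (Ornstein-Uhlenbeck process)
      hidden state: dh = tanh(W(t) h) dt            (W(t) is w(t) reshaped to d x d)
    """
    rng = np.random.default_rng(seed)
    h = np.array(h0, dtype=float)
    w = np.array(w0, dtype=float)  # must have d * d entries
    d = h.shape[0]
    dt = t1 / n_steps
    for _ in range(n_steps):
        W = w.reshape(d, d)
        dh = np.tanh(W @ h) * dt
        # The Brownian increment over a step of length dt has standard deviation sqrt(dt).
        dw = -theta * w * dt + sigma * np.sqrt(dt) * rng.standard_normal(w.shape)
        h, w = h + dh, w + dw
    return h, w
```

Running the simulation with several seeds produces an ensemble of outputs whose spread reflects the sampled weight paths.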

Noisy Feature Mixup
Soon Hoe Lim, N. Benjamin Erichson, Francisco Utrera, Winnie Xu, Michael Mahoney

We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective data augmentation method that combines the best of interpolation-based training and noise injection schemes. We use noise-perturbed convex combinations of pairs of data points, in both input and feature space, to learn smoother decision boundaries and thereby improve robustness. Our theoretical analysis further explains the beneficial implicit regularization effects of NFM compared with previous mixup training methods. We show that residual networks and vision transformers trained with NFM achieve favorable trade-offs between predictive accuracy and out-of-distribution generalization.

International Conference on Learning Representations (ICLR), 2022.
paper | slides | talk | blog (coming soon)
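Below is a minimal sketch of the input-space version of this augmentation. It draws lambda ~ Beta(alpha, alpha), mixes each example and its label with a randomly permuted partner, and then perturbs the mixed inputs with multiplicative and additive noise. The Gaussian noise, the default scales, and the function name are assumptions, not the paper's settings. Applying the same operation to the hidden features of a chosen layer gives the feature-space variant.

```python
import numpy as np

def noisy_mixup(x, y_onehot, alpha=1.0, add_sigma=0.1, mult_sigma=0.1, rng=None):
    """Hypothetical input-space NFM-style augmentation for one batch.

    x: (batch, features) inputs; y_onehot: (batch, classes) one-hot labels.
    Returns noisy mixed inputs, mixed (soft) labels, and the mixing weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing weight in [0, 1]
    perm = rng.permutation(x.shape[0])      # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]   # convex combination of input pairs
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    # Noise injection: multiplicative then additive (Gaussian is an assumption).
    mult = 1.0 + mult_sigma * rng.standard_normal(x.shape)
    add = add_sigma * rng.standard_normal(x.shape)
    return mult * x_mix + add, y_mix, lam
```

Setting both noise scales to zero recovers plain mixup, which makes it easy to compare the two schemes.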

Prioritized training on points that are learnable, worth learning, and not yet learned
Soren Mindermann*, Muhammed Razzak*, Winnie Xu*, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal

We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right". We propose an information-theoretic acquisition function -- the reducible held-out loss -- and compute it with a small proxy model -- GoldiProx -- to efficiently choose training points that maximize information about a validation set. We show that the selected sequence not only prioritizes learnable yet information-rich data relevant to the evaluation task but also transfers effectively across architectures and vision tasks.

International Conference on Machine Learning (ICML), 2022 [Spotlight].
paper | poster
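As we read the description above, the selection rule can be sketched as follows. Each candidate is scored by its reducible held-out loss: the current model's training loss minus an "irreducible" loss from a small proxy model trained on held-out data. The top k candidates are kept. Already-learned points score low because their training loss is low, and noisy or unlearnable points score low because their irreducible loss is also high. The function name and inputs are hypothetical.

```python
import numpy as np

def select_by_reducible_loss(train_losses, irreducible_losses, k):
    """Return indices of the k candidates with the highest reducible held-out loss.

    train_losses: per-example loss under the model currently being trained.
    irreducible_losses: per-example loss under a small proxy model trained on
    held-out data (precomputed once, so scoring a batch is cheap).
    """
    reducible = np.asarray(train_losses) - np.asarray(irreducible_losses)
    # argsort is ascending; reverse it so the highest scores come first.
    return np.argsort(reducible)[::-1][:k]
```

In a training loop, one would score a large candidate batch this way and take a gradient step only on the selected subset.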

Exploring the Differential Effects of Sequencing Resolution on Semi-Automated Genome Annotations
Winnie Xu, Francis Nguyen, Michael Hoffman

We validated the utility of two novel next-generation sequencing assays (ChIP-Exo and ChIP-Nexus) for epigenetic exploration of cancer transcriptomes. I implemented custom bioinformatics pipelines, unsupervised ML algorithms (Segway), and the stable marriage algorithm.

Princess Margaret Cancer Research Center, 2018.
poster | code | slides
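For reference, the stable marriage problem is classically solved with the Gale-Shapley deferred-acceptance algorithm, sketched below. The source does not say how the matching was used in the annotation pipeline, so this is a generic version with hypothetical names. It assumes both sides have the same size and that every preference list is complete.

```python
def stable_match(proposer_prefs, acceptor_prefs):
    """Gale-Shapley deferred acceptance.

    Both arguments map each name to a list of names on the other side, most
    preferred first. Returns a dict acceptor -> proposer.
    """
    # rank[a][p]: position of proposer p in acceptor a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next acceptor each proposer will try
    engaged = {}                                  # acceptor -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p              # acceptor was free: tentatively accept
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p              # acceptor trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)              # rejected: p will try its next choice
    return engaged
```

The result is stable: no proposer and acceptor both prefer each other to their assigned partners. It is also the best stable matching for the proposing side.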

This is how I've learned while having fun. I strive to maintain the same levels of inquisitiveness as I mature in my craft.

From data mining to open source web development, entrepreneurship to basic science, I am grateful to have spent my time gaining and practising new skills and making a positive impact on myself and others.

Innovape: The Health Aware Vape

With recent headlines reporting multiple fatalities from vaping, we sought to create a solution that can mitigate nicotine addiction in Juul users. We reverse-engineered the Juul to work with our Gaussian Process prediction model, which analyzes users' breathing patterns and usage frequency and then dynamically adjusts the nicotine output percentage accordingly. What differentiates our product is gradual weaning, where the percentage decrease is customized based on user feedback.

Currently in conversation with the Centre for Addiction and Mental Health to use this device with patients.

Winner Hack the North 2019

Initialized Equilibrium Propagation for Backprop-free Training
Winnie Xu, Matthieu Chan Chee, Jad Ghalayini, Jacob Kelly

As part of the 2019 ICLR Reproducibility Challenge, we implemented this paper (later accepted), which proposed an improvement to Equilibrium Propagation, a training method for energy-based models (EBMs).

code

DOC: Digital On-Call-Healthcare Consultant

DOC gives doctors a real-time second opinion on the diagnosis of medical conditions by analyzing patients' symptoms and the conversations doctors have with them. DOC then streamlines the recommendation of potential ailments by suggesting follow-up questions to ask, helping doctors reach an objective, accurate, and holistic diagnosis.

1st Place BCGxGoogle Global Engineering Week Hackathon 2019

SocialBit

The SocialBit measures an individual’s social interactions throughout the day. A glasses-mounted camera detects faces and matches them against an existing database of the user’s friends on social media platforms. The data is then visualized in a network plot and a chord diagram that show the length and location of each interaction, where an interaction is defined as the period during which a friend’s face is in view. The result is a beautiful and insightful display of our daily social interactions at the microscale.

HackMIT 2018