Let's reflect on the latest scientific research, design pluralistic implementations, and use technology to empower expansive, positive change.
My research lies at the intersection of alignment science, generative models, and
natural language understanding. My goal is to build scalable AI systems that are shaped by real-world actors and feedback.
Towards this, I explore ways to infuse classical algorithms with differentiable components,
leverage increasingly strong LLMs in synthetic data pipelines, and improve the factual reasoning capabilities of weaker models.
I studied Artificial Intelligence, Computer Science, Statistics, and Math
at the University of Toronto.
My research journey began by scaling methods in latent variable modeling and probabilistic inference with David Duvenaud (Vector Institute).
I hope to one day exert similar influence with my own ideas at the cutting edge.
My recent work focuses on breaking the monolithic paradigm of model alignment
through preference learning, advancing data-efficient generalization in LLMs, and
designing deep generative models that leverage compressible representations of periodic features.
As a biologist turned deep-learning scientist, I hope to democratize the fruits of machine learning for the natural sciences and engineering.
Whether in computational genetics or ML, much of my work is about inferring properties of the physical world, exploring the
anomalies and uncertainties of observed phenomena, and reasoning about the underpinnings of causality.
Alignment methods like RLHF and DPO expect feedback in the form of preferences (e.g., output A is better than output B for input X).
Collecting this feedback from human annotators quickly becomes expensive, and the resulting data can be conflicting.
Kahneman-Tversky Optimization (KTO) matches or exceeds state-of-the-art DPO performance without using preference data,
which makes it far easier to apply in the real world, where preferences are scarce and expensive to collect.
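As a rough illustration of the idea, and not the paper's exact objective, here is a minimal PyTorch-style sketch of a KTO-like loss on unpaired good/bad feedback; the batch-mean reference point z_ref and the weights lambda_d, lambda_u are simplifying assumptions of this sketch:

```python
import torch

def kto_style_loss(logp_theta, logp_ref, desirable,
                   beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Sketch of a KTO-like objective on unpaired (x, y, good/bad) data.

    logp_theta / logp_ref: summed log-probs of each completion under the
    policy and a frozen reference model; desirable: bool tensor.
    """
    # implied reward: log-ratio against the reference (as in DPO/KTO)
    reward = beta * (logp_theta - logp_ref)
    # reference point; the batch mean stands in for the KL term used
    # in the paper (a simplifying assumption of this sketch)
    z_ref = reward.mean().detach()
    # prospect-theoretic value: risk-averse in gains, risk-seeking in losses
    value = torch.where(
        desirable,
        lambda_d * torch.sigmoid(reward - z_ref),
        lambda_u * torch.sigmoid(z_ref - reward),
    )
    return (1.0 - value).mean()
```

Note the key practical difference from DPO: each example only needs a binary desirable/undesirable label, so no pairing of outputs is required.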
An interesting problem in representation learning is choosing an ordering over the data that best supports the learning task.
We explore the impact of sequentially compressing an unordered set of assets, as opposed to assigning priorities at random.
Unfortunately, general set-compression methods show minimal gains in the rate-distortion tradeoff when used with our
proposed associative compression architectures.
A longstanding goal in AI is to find a strategy for compiling diverse experience into a single, highly capable generalist agent.
In vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets.
Motivated by this progress, we train one generalist control agent on a diverse set of offline data to play 46 Atari games.
We demonstrate that performance scales with model size, and that the agent rapidly adapts to novel games via fine-tuning. Compared to existing methods in
behavioural cloning, online RL, and offline RL, the Multi-Game Decision Transformer (MGDT) offers the best scalability and performance. A toy architectural sketch follows the links below.
Neural Information Processing Systems (NeurIPS), 2022 [Oral].
web |
paper |
blog |
code
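For intuition, here is a minimal, hypothetical sketch of the decision-transformer-style architecture behind this line of work: trajectories are flattened into (return-to-go, state, action) tokens and a causal transformer is trained to predict actions. All names and sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ToyDecisionTransformer(nn.Module):
    """Illustrative return-conditioned sequence model (not the exact
    MGDT architecture)."""

    def __init__(self, state_dim, n_actions, d_model=128, n_layers=2):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)          # return-to-go
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Embedding(n_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T) long
        tokens = torch.stack(
            [self.embed_rtg(rtg),
             self.embed_state(states),
             self.embed_action(actions)], dim=2
        ).flatten(1, 2)                       # (B, 3T, d): R_t, s_t, a_t, ...
        L = tokens.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        return self.action_head(h[:, 1::3])   # predict a_t from the s_t token
```

Training minimizes cross-entropy between the predicted logits and the logged actions; at play time, conditioning on a high return-to-go elicits expert-like behaviour.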
We develop a new class of differentiable operators that leverage the underlying regularity of natural
and artificial objects at various scales. Neural Collages are implicit operators that
discover self-similar representations of data; they act as efficient neural compressors and powerful decoders
in deep generative models, and can be used for a variety of creative and artistic generation tasks.
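As a loose classical analogue (not the learned operators from the paper), the sketch below iterates a hand-specified contractive "collage" operator to its fixed point, in the spirit of fractal compression; the map format is invented for illustration:

```python
import numpy as np

def decode_collage(maps, size=64, iters=20):
    """Iterate a contractive 'collage' operator to its fixed point.

    maps: list of (src, dst, scale, offset) tuples, where src/dst are
    2-D slices; each map copies a 2x-downsampled source block onto a
    smaller destination block (this map format is invented for the toy).
    """
    img = np.zeros((size, size))
    for _ in range(iters):
        out = np.zeros_like(img)
        for src, dst, scale, offset in maps:
            block = img[src]
            # 2x average-pooling keeps the operator contractive
            pooled = 0.25 * (block[0::2, 0::2] + block[1::2, 0::2]
                             + block[0::2, 1::2] + block[1::2, 1::2])
            out[dst] = scale * pooled + offset
        img = out
    return img  # the decoded, self-similar image
```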
Modern language models perform in-context learning and show a remarkable ability to generalize zero-shot to unseen conditions.
We represent compositions of prompted language models as probabilistic programs, termed Model Cascades.
With this general framework for sampling and inference, we formalize prompting, reasoning, and tool use as graphical
models over random variables with complex string values. A minimal sketch follows the links below.
Beyond Bayes: Paths Towards Universal Reasoning Systems @ ICML, 2022 [Spotlight].
web |
paper |
code
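A minimal sketch of the cascade view, with a stand-in llm sampler (hypothetical; any sampled LLM API would do): chain-of-thought becomes a two-variable probabilistic program over strings, and self-consistency marginalizes over the latent rationale:

```python
import random

def llm(prompt: str) -> str:
    # stand-in for a sampled LLM continuation (hypothetical helper)
    return random.choice(["...rationale A...", "...rationale B..."])

def chain_of_thought(question: str) -> str:
    # a two-variable probabilistic program over strings:
    # sample a latent rationale, then sample an answer given it
    rationale = llm(f"Q: {question}\nLet's think step by step.")
    return llm(f"Q: {question}\nReasoning: {rationale}\nA:")

def self_consistency(question: str, k: int = 5) -> str:
    # marginalize over the latent rationale by sampling k chains
    # and majority-voting the answers
    answers = [chain_of_thought(question) for _ in range(k)]
    return max(set(answers), key=answers.count)
```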
We explore infinite-dimensional stochastic variational inference
for learning the dynamics of continuous-time Neural ODEs with instantaneous noise. Our framework
trains Bayesian neural networks by parameterizing weight uncertainty with stochastic
differential equations (SDEs), together with efficient associated gradient-based algorithms.
We also conjecture and derive zero-variance gradient estimators in the infinite-dimensional case,
attained as the approximate posterior converges to the true posterior. A toy forward simulation follows the links below.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
paper |
poster |
slides |
code |
talk
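For intuition only, here is a toy Euler-Maruyama forward simulation of SDE-distributed weights; the paper's variational objective and gradient estimators are not shown, and drift and diffusion are user-supplied callables in this sketch:

```python
import torch

def euler_maruyama(w0, drift, diffusion, t0=0.0, t1=1.0, steps=100):
    """Simulate dW_t = f(W_t, t) dt + g(W_t, t) dB_t for network weights.

    drift / diffusion: callables (w, t) -> tensor, supplied by the user;
    this is only the forward sampler, not the variational objective.
    """
    w = w0
    dt = (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * dt
        dB = torch.randn_like(w) * dt ** 0.5   # Brownian increment
        w = w + drift(w, t) * dt + diffusion(w, t) * dB
    return w  # one posterior sample of the weights at time t1
```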
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective data augmentation method
that combines the best of interpolation-based training and noise-injection schemes. We use
noise-perturbed convex combinations of datapoint pairs, in both input and feature space, to learn
smoother decision boundaries and thereby improve robustness. Our theoretical insights further explain
the advantageous implicit regularization effects of NFM over previous mixup training methods.
We show that residual networks and vision transformers trained with NFM achieve favorable trade-offs
between predictive accuracy and out-of-distribution generalization.
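A minimal sketch of one NFM step in input space (the paper also applies it at hidden layers; the hyperparameter names here are illustrative):

```python
import torch

def noisy_feature_mixup(x, y, alpha=1.0, add_std=0.1, mult_std=0.1):
    """One NFM step in input space; y should be soft/one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    # mixup: convex combination of datapoint pairs
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    # noise injection: multiplicative and additive perturbations
    x_mix = (1 + mult_std * torch.randn_like(x_mix)) * x_mix
    x_mix = x_mix + add_std * torch.randn_like(x_mix)
    return x_mix, y_mix
```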
We introduce Goldilocks Selection, a technique for faster model training that selects a
sequence of training points that are "just right". We propose an information-theoretic
acquisition function, the reducible held-out loss, and compute it with a small proxy
model, GoldiProx, to efficiently choose training points that maximize information about a
validation set. We show that the selected sequence not only prioritizes learnable,
information-rich data relevant to the evaluation task, but also transfers effectively
across architectures and vision tasks. A simplified sketch follows the links below.
International Conference on Machine Learning (ICML), 2022 [Spotlight].
paper |
poster
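A simplified sketch of the selection rule, assuming model is the learner and proxy is a small model standing in for the held-out-loss estimate (the paper's exact estimator differs):

```python
import torch
import torch.nn.functional as F

def reducible_holdout_loss(model, proxy, x, y):
    """Score candidates by training loss minus the proxy's held-out loss;
    high scores mark learnable, non-noisy, task-relevant points."""
    with torch.no_grad():
        train_loss = F.cross_entropy(model(x), y, reduction="none")
        holdout_loss = F.cross_entropy(proxy(x), y, reduction="none")
    return train_loss - holdout_loss

def select_batch(model, proxy, x, y, k):
    # keep the top-k "just right" points from a larger candidate batch
    idx = reducible_holdout_loss(model, proxy, x, y).topk(k).indices
    return x[idx], y[idx]
```

Intuitively, a high training loss with a low held-out loss means the point is still informative yet learnable, rather than noisy or irrelevant.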
Exploring the Differential Effects of Sequencing Resolution on Semi-Automated Genome Annotations
Winnie Xu, Francis Nguyen, Michael Hoffman
We validated the utility of two novel next-generation sequencing assays (ChIP-Exo and ChIP-Nexus)
for the epigenetic exploration of cancer transcriptomes. I implemented
custom bioinformatics pipelines, unsupervised ML algorithms (Segway),
and the stable marriage algorithm (sketched below the links).
Princess Margaret Cancer Research Center 2018.
poster |
code |
slides
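For reference, here is a generic Gale-Shapley implementation of the stable marriage algorithm; the project-specific pairing of sequencing features to annotation labels is not shown:

```python
def stable_matching(proposer_prefs, acceptor_prefs):
    """Gale-Shapley stable matching over two sides with full preference
    lists (dicts of name -> ordered list of names on the other side)."""
    free = list(proposer_prefs)
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                                   # acceptor -> proposer
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]      # best remaining choice
        next_choice[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:     # a prefers p: swap
            free.append(engaged[a])
            engaged[a] = p
        else:
            free.append(p)                         # rejected, try again
    return engaged
```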
This is how I've learned while having fun. I strive to maintain the same level of inquisitiveness as I mature in my craft.
From data mining to open-source web development, entrepreneurship to basic science, I am grateful to have
spent my time gaining and practising new skills and making a positive impact on myself and others.
With recent headlines reporting multiple fatalities from vaping, we sought to create
a solution that mitigates nicotine addiction in Juul users. We reverse-engineered
the Juul to work with our Gaussian Process prediction model, which analyzes
users' breathing patterns and usage frequency, then dynamically adjusts the
nicotine output percentage accordingly. What differentiates our product is
gradual weaning, where the percentage decrease is customized from user feedback.
We are currently in conversation with the Centre for Addiction and Mental Health
about using this device with patients.
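A toy sketch of the prediction component, using scikit-learn's GP regressor on hypothetical usage data; the weaning rule at the end is an invented placeholder, not our actual calibration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# hypothetical usage log: puffs per hour at sampled times of day
hours = np.array([[0.0], [6.0], [12.0], [18.0], [24.0]])
puffs = np.array([3.0, 9.0, 14.0, 11.0, 5.0])

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
gp.fit(hours, puffs)

# predict demand (with uncertainty) at a future time
mean, std = gp.predict(np.array([[30.0]]), return_std=True)
# invented weaning rule: taper output as predicted demand rises
nicotine_pct = max(0.0, 5.0 - 0.1 * (mean[0] + std[0]))
```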
As part of the 2019 ICLR Reproducibility Challenge, we implemented this
paper (later accepted), which investigated a novel improvement to Equilibrium
Propagation, a method for training energy-based models (EBMs).
DOC is a tool that gives doctors a real-time second opinion on the diagnosis
of medical conditions, drawn directly from patient interaction: it analyzes
the symptoms and conversations doctors have with their patients.
DOC then streamlines the recommendation of potential ailments by suggesting
further questions to ask, in order to generate an objective, accurate, and
holistic diagnosis.
1st Place BCGxGoogle Global Engineering Week Hackathon 2019
The SocialBit measures an individual's social interactions throughout
the day. A glasses-mounted camera detects faces
and cross-references them against an existing database of the user's friends on
social media platforms. The data is then
visualized in a network plot and chord diagram that showcase the length
and location of one's interactions, defined as
the points in time during which a friend's face was within view. The result
is a beautiful and insightful display of our
daily social interactions, at the microscale.