Let's reflect on the latest scientific research, challenge our assumptions when designing implementations, and use technology to enable far-reaching reform.
My research lies at the intersection of generative models, information retrieval, and
natural language understanding. My goal is to build interpretable and scalable generative systems through
the incorporation of useful inductive biases, compression, and powerful learning / inference schemes.
Towards this, I explore ways to inject classical algorithms with differentiable components,
build efficient methods exploiting intermediate representations, and improve reasoning capabilities
of large language models via modularity.
I studied Artificial Intelligence, Computer Science, Statistics, and Math
at the University of Toronto.
My AI research journey began by scaling methods in latent variable modeling and probabilistic inference with David Duvenaud (Vector Institute).
I hope to one day have a similar impact with my own ideas at the cutting edge.
My recent work focuses on improving the robustness of neural differential equation
paradigms, developing methods for data-efficient mini-batching when training LLMs, and
designing deep generative models that leverage compressible representations of periodic features.
As a biologist turned deep learning scientist, I hope to democratize the fruits of machine learning for the natural sciences and engineering.
Whether in computational genetics or ML, much of my work is about inferring properties of the physical world, exploring the
anomalies and uncertainties of observed phenomena, and reasoning about the underpinnings of
An interesting problem in representation learning is training on an optimal index that helps the learning task.
We explore the impact of sequentially compressing an unordered set of assets as opposed to random priority assignment.
Unfortunately, general set compression methods show minimal gains in the rate-distortion tradeoff when used with our
proposed associative compression architectures.
A longstanding goal in the field of AI is to find a strategy for compiling diverse experience into a highly capable, generalist agent.
In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets.
Motivated by this progress, we train one generalist control agent on a diverse set of offline data to play 46 Atari games.
We demonstrate that performance scales with model size and that the agent adapts rapidly to novel games with fine-tuning. Compared to existing methods in
behavioural cloning, online RL, and offline RL, our Multi-Game Decision Transformer (MGDT) offers the best scalability and performance.
Neural Information Processing Systems (NeurIPS), 2022 [Oral].
We develop a new class of differentiable operators that leverage the underlying regularity of natural
and artificial objects at various scales. Neural Collages are a class of implicit operators that
discover self-similar representations of data and are efficient neural compressors, powerful decoders
in deep generative models, and can be used for a variety of creative and artistic generation tasks.
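To make "self-similar representations" concrete, here is a minimal sketch of the classical (non-neural) collage operator from fractal compression on a 1D signal. It assumes, as we read the description above, that a Neural Collage would have a network predict the per-block parameters rather than hand-setting them; the function name `collage_decode` and its parameters (`domain_idx`, `scales`, `offsets`, `block`) are hypothetical. Each output block is an affine copy of a downsampled, larger "domain" region of the signal itself, and decoding iterates this map until it reaches its fixed point.

```python
import numpy as np

def collage_decode(domain_idx, scales, offsets, n, block=4, iters=30, x0=None):
    """Decode a 1D signal by iterating a collage operator to its fixed point (sketch).

    domain_idx[r]: start of the coarse-signal domain copied into range block r (hypothetical).
    scales[r], offsets[r]: affine map for block r; |scale| < 1 keeps the operator contractive.
    """
    assert n % block == 0 and n % 2 == 0, "blocks must tile the signal"
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        # Downsample by averaging neighbouring pairs, so a 2*block domain shrinks to block samples.
        coarse = x.reshape(-1, 2).mean(axis=1)
        out = np.empty(n)
        for r in range(n // block):
            d = domain_idx[r]
            # Range block r becomes a scaled and shifted copy of a coarser part of the signal itself.
            out[r * block:(r + 1) * block] = scales[r] * coarse[d:d + block] + offsets[r]
        x = out
    return x
```

Because averaging never increases the max-norm distance and every scale is below 1 in magnitude, the map is a contraction, so decoding from any starting signal converges to the same fixed point; only the few per-block parameters need to be stored, which is where the compression comes from.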
Modern language models perform in-context learning and show a remarkable ability to generalize zero-shot to unseen conditions.
We represent compositions of prompted language models as probabilistic programs termed Model Cascades.
With this general framework for sampling and inference, we formalize prompting, reasoning, and tool use as graphical
models over random variables with complex string values.
Beyond Bayes: Paths Towards Universal Reasoning Systems @ ICML, 2022 [Spotlight].
We explore infinite-dimensional stochastic variational inference
for learning the dynamics of continuous-time Neural ODEs with instantaneous noise. Our framework
trains Bayesian neural networks by parameterizing weight uncertainty with stochastic
differential equations (SDEs), together with associated efficient gradient-based algorithms.
We also conjecture and derive gradient estimators for the infinite-dimensional case whose variance
goes to zero as the approximate posterior converges to the true posterior.
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation
that combines the best of interpolation-based training and noise injection schemes. We use
noise-perturbed convex combinations of datapoint pairs in both input and feature space to learn
smoother decision boundaries, leading to improved robustness. Our insights further explain the
advantageous implicit effects of NFM compared to previous mixup training methods.
We show that residual networks and vision transformers trained with NFM have favorable trade-offs
between predictive accuracy and out-of-distribution generalization.
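The core augmentation step can be sketched in a few lines. This is a minimal illustration, not the published implementation: it assumes a Beta-distributed mixing coefficient shared across the batch and Gaussian multiplicative and additive noise, and the name `noisy_mixup` and the noise levels `add_std` and `mult_std` are hypothetical. The input `x` may be raw inputs or the activations of a hidden layer, which is how "input and feature space" mixing would be realized.

```python
import numpy as np

def noisy_mixup(x, y, alpha=1.0, add_std=0.4, mult_std=0.2, rng=None):
    """Noise-perturbed convex combination of datapoint pairs (sketch, assumed noise model).

    x: (batch, features) inputs or hidden features; y: (batch, classes) one-hot labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))            # pair each example with a random partner
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex combination of the pairs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # labels mixed with the same coefficient
    # Perturb the mixed points with multiplicative and additive Gaussian noise.
    mult = 1.0 + mult_std * rng.standard_normal(x_mix.shape)
    add = add_std * rng.standard_normal(x_mix.shape)
    return mult * x_mix + add, y_mix, lam
```

Setting both noise levels to zero recovers plain mixup, and setting `lam` to 1 recovers plain noise injection, which is one way to see how the method interpolates between the two schemes.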
We introduce Goldilocks Selection, a technique for faster model training which selects a
sequence of training points that are "just right". We propose an information-theoretic
acquisition function, the reducible held-out loss, and compute it with a small proxy
model (GoldiProx) to efficiently choose training points that maximize information about a
validation set. We show that the selected sequence not only prioritizes learnable yet
information-rich data relevant to the evaluation task, but also transfers effectively
across architectures and vision tasks.
International Conference on Machine Learning (ICML), 2022 [Spotlight].
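A minimal sketch of how a reducible held-out loss could be used to pick points from a candidate batch, assuming it is computed as the current model's loss on a point minus the loss of a small proxy model trained on held-out data (our reading; the source only says the proxy computes the acquisition function). The functions `cross_entropy` and `select_goldilocks` are hypothetical names.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Per-example negative log-likelihood of integer labels under predicted probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + eps)

def select_goldilocks(model_probs, proxy_probs, labels, k):
    """Return the indices of the k points with the highest reducible held-out loss (sketch).

    model_probs: predictions of the model being trained, shape (n, classes).
    proxy_probs: predictions of a small proxy trained on held-out data (assumed).
    """
    train_loss = cross_entropy(model_probs, labels)    # high if not yet learnt
    heldout_loss = cross_entropy(proxy_probs, labels)  # high if noisy or unlearnable
    reducible = train_loss - heldout_loss              # large only when learnable but not yet learnt
    return np.argsort(-reducible)[:k], reducible
```

For example, a point the model already fits scores near zero, and so does a mislabeled point that even the held-out proxy cannot fit; only points the proxy predicts well but the current model does not rise to the top, which is the "just right" behaviour described above.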
We validated the utility of two
next-generation sequencing assays (ChIP-Exo and ChIP-Nexus) for
exploring cancer transcriptomes. I implemented
custom bioinformatics pipelines, unsupervised ML algorithms (Segway),
This is how I've learned while having fun. I strive to maintain the same level of inquisitiveness as I mature in my craft.
From data mining to open source web development, entrepreneurship to
basic science, I am grateful to have
spent my time gaining and practising new skills and making a positive
impact on myself and others.
With recent headlines showing multiple fatalities from vaping, we sought
a solution that can mitigate nicotine addiction in Juul users. We
the Juul to function with our Gaussian Process prediction model that can
patterns and usage frequency of users, then dynamically adjust the
accordingly. What differentiates our product is the implementation of
where the percentage decrease is customized from user feedback.
Currently in conversation with the Centre for Addiction and Mental
Health to use this device
As part of the 2019 ICLR Reproducibility
Challenge, we implemented this paper
(later accepted), which
investigated a novel improvement to Equilibrium
Propagation, a learning method for energy-based models (EBMs).
DOC is a tool that gives doctors a real-time second opinion on the diagnosis
of medical conditions by analyzing the symptoms and
conversations they have with their patients.
DOC then streamlines recommendations for potential ailments
by suggesting further questions to ask in order
to generate an objective and accurate diagnosis that is holistic in
1st Place BCGxGoogle Global Engineering Week Hackathon 2019
The SocialBit measures an individual’s social interactions throughout
the day. A glasses-mounted camera detects faces
and matches them against an existing database of the user’s friends on
social media platforms. The data is then
visualized in a network plot and chord diagram that show the length
and location of each interaction, defined as
the periods during which the friend’s face was within view. The
visualization is a beautiful and insightful display of our
daily social interactions, at the microscale.