Let's reflect on the latest scientific research, challenge the issues facing
global communities, and use technology to empower expansive reformation.
My research lies at the intersection of generative models, probabilistic inference, and
natural language understanding. My goal is to build interpretable and reliable generalist systems through
the incorporation of useful inductive biases and development of powerful learning / inference schemes.
Towards this, I explore ways to inject classical algorithms with differentiable components,
build efficient methods exploiting intermediate representations of stochastic processes, and improve reasoning
in large language models via modularity.
My recent work focuses on improving the robustness of neural differential equation
paradigms, developing data-efficient mini-batching methods for training LLMs, and
designing deep generative models to leverage compressible representations of periodic features.
As a biologist turned deep learning scientist, I hope to bring the fruits of machine learning to
other fields in the natural sciences and engineering.
From exploring genetic editing and basic protein informatics, to now
developing new sequence learning algorithms and understanding complex ML pipelines,
much of my work is about inferring properties of the physical world, exploring the
anomalies and uncertainties of observed phenomena, and reasoning about the underpinnings of
A longstanding goal in the field of AI is to find a strategy for compiling diverse experience into a highly capable, generalist agent.
In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets.
Motivated by this progress, we train one generalist control agent on a diverse set of offline data to play 46 Atari games.
We show that performance scales with model size and that the agent adapts rapidly to novel games through fine-tuning. Compared with existing
behavioural cloning, online RL, and offline RL methods, the Multi-Game Decision Transformer (MGDT) offers the best scalability and performance.
Neural Information Processing Systems (NeurIPS), 2022 [Oral].
We develop a new class of differentiable operators that leverage the underlying regularity of natural
and artificial objects at various scales. Neural Collages are a class of implicit operators that
discover self-similar representations of data and are efficient neural compressors, powerful decoders
in deep generative models, and may be used towards various creative and artistic generations.
Modern language models perform in-context learning and show a striking ability to generalize zero-shot to unseen conditions.
We represent compositions of prompted language models as probabilistic programs termed Model Cascades.
With this general framework for sampling and inference, we formalize prompting, reasoning, and tool use as graphical
models over random variables with complex string values.
Beyond Bayes: Paths Towards Universal Reasoning Systems @ ICML, 2022 [Spotlight].
We explore the concept of infinite-dimensional stochastic variational inference
in learning the dynamics of continuous-time Neural ODEs with instantaneous noise. Our framework
trains Bayesian neural networks by parameterizing weight uncertainty with stochastic
differential equations (SDEs) and associated efficient gradient-based algorithms.
We also conjecture and derive zero-variance gradient estimators for the infinite-dimensional case
as the approximate posterior converges to the true posterior.
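To make the idea of weights that follow an SDE concrete, here is a minimal sketch of Euler-Maruyama simulation for a single scalar weight. It is an illustration only, not the training procedure described above: the function name `euler_maruyama`, the scalar setting, and the example drift and diffusion functions are all assumptions introduced here.

```python
import math
import random


def euler_maruyama(w0, drift, diffusion, t1=1.0, steps=100, rng=None):
    """Simulate dw = drift(w, t) dt + diffusion(w, t) dB on [0, t1].

    Hypothetical helper for illustration: a scalar "weight" evolving
    continuously in depth/time. Returns the whole sampled path.
    """
    rng = rng or random.Random()
    dt = t1 / steps
    w, t = w0, 0.0
    path = [w]
    for _ in range(steps):
        # Brownian increment over a step of length dt has variance dt.
        d_b = rng.gauss(0.0, math.sqrt(dt))
        w = w + drift(w, t) * dt + diffusion(w, t) * d_b
        t += dt
        path.append(w)
    return path


if __name__ == "__main__":
    # Assumed example: mean-reverting drift with constant noise scale.
    sample = euler_maruyama(
        1.0,
        drift=lambda w, t: -w,
        diffusion=lambda w, t: 0.3,
        rng=random.Random(0),
    )
    print(f"weight at t=1: {sample[-1]:.4f}")
```

Each call draws one weight trajectory; averaging a loss over many such trajectories is one simple way to think about an expectation under the approximate posterior.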
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation
that combines the best of interpolation-based training and noise injection schemes. We use
noise-perturbed convex combinations of datapoint pairs in both input and feature space to learn
smoother decision boundaries, leading to improved robustness. The advantageous implicit
effects of NFM compared to previous mixup training methods are further understood with our
insights. We show that residual networks and vision transformers trained with NFM have favorable
trade-offs between predictive accuracy and out-of-distribution generalization.
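The noise-perturbed convex combination at the heart of NFM can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `noisy_mixup`, the Beta-distributed mixing coefficient, and the particular additive and multiplicative Gaussian noise form are choices made here for the example. The same operation could be applied to raw inputs or to hidden features.

```python
import random


def noisy_mixup(x_a, x_b, y_a, y_b, alpha=1.0,
                add_noise=0.1, mult_noise=0.1, rng=None):
    """Mix two examples, then perturb the mixed features with noise.

    Assumptions for this sketch: lam ~ Beta(alpha, alpha), and noise is
    applied elementwise as (1 + mult_noise * e1) * x + add_noise * e2
    with e1, e2 standard normal. Labels are mixed but not perturbed.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    mixed_x = []
    for a, b in zip(x_a, x_b):
        m = lam * a + (1.0 - lam) * b  # convex combination
        m = (1.0 + mult_noise * rng.gauss(0.0, 1.0)) * m \
            + add_noise * rng.gauss(0.0, 1.0)  # noise injection
        mixed_x.append(m)
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    x, y, lam = noisy_mixup([1.0, 2.0], [3.0, 4.0],
                            [1.0, 0.0], [0.0, 1.0],
                            rng=random.Random(0))
    print(lam, x, y)
```

Setting both noise scales to zero recovers plain mixup, which makes the relationship between the two schemes easy to check.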
We introduce Goldilocks Selection, a technique for faster model training which selects a
sequence of training points that are "just right". We propose an information-theoretic
acquisition function -- the reducible held-out loss -- and compute it with a small proxy
model -- GoldiProx -- to efficiently choose training points that maximize information about a
validation set. We show that the selected sequence not only prioritizes learnable yet
information-rich data relevant to the evaluation task but also effectively transfers
across architectures and vision tasks.
International Conference on Machine Learning (ICML), 2022 [Spotlight].
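The selection rule in Goldilocks Selection can be illustrated with a small sketch. It assumes the reducible held-out loss of a candidate point is its loss under the current model minus its loss under the small proxy model; that reading, along with the function names and the toy loss values below, is an assumption made for this example.

```python
def reducible_holdout_loss(train_losses, proxy_losses):
    """Per-example score: current-model loss minus proxy-model loss.

    Assumed definition for this sketch. A high score means the point is
    not yet learned by the current model but is learnable according to
    the proxy, so it is neither redundant nor just noise.
    """
    return [t - p for t, p in zip(train_losses, proxy_losses)]


def select_top_k(train_losses, proxy_losses, k):
    """Return indices of the k candidates with the highest score."""
    scores = reducible_holdout_loss(train_losses, proxy_losses)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy candidate batch (hypothetical numbers):
    #   index 0: high loss now, low proxy loss  -> worth training on
    #   index 1: low loss everywhere            -> already learned
    #   index 2: high loss everywhere           -> likely noisy
    #   index 3: moderately high loss, low proxy -> worth training on
    current = [2.0, 0.1, 3.0, 1.5]
    proxy = [0.2, 0.05, 2.9, 0.3]
    print(select_top_k(current, proxy, k=2))  # -> [0, 3]
```

Note that the point with the largest raw loss (index 2) is not selected, which is the difference between this rule and simply training on the hardest examples.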
We validated the utility of two
next-generation-sequencing assays (ChIP-Exo and ChIP-Nexus) in
exploration of cancer transcriptomes. I implemented
custom bioinformatics pipelines, unsupervised ML algorithms (Segway),
This is how I've learned. I'm an inquisitive individual who always loves
From data mining to open source web development, entrepreneurship to
basic science, I am grateful to have
spent my time gaining and practising new skills to make a positive
impact on myself and others.
With recent headlines showing multiple fatalities from vaping, we sought
a solution that can mitigate nicotine addiction in Juul users. We
the Juul to function with our Gaussian Process prediction model that can
patterns and usage frequency of users, then dynamically adjust the
accordingly. What differentiates our product is the implementation of
where the percentage decrease is customized from user feedback.
Currently in conversation with the Centre for Addiction and Mental
Health to use this device
As part of the 2019 ICLR Reproducibility
Challenge, we implemented this
(later accepted) that
investigated a novel improvement to Equilibrium
Propagation, a method for training energy-based models (EBMs).
DOC is a tool that gives doctors a real-time second opinion on the diagnosis
of medical conditions by analyzing the symptoms and
conversations they have with their patients.
Subsequently, DOC streamlines the recommendation for potential ailments
by providing further questions to ask in order
to generate an objective and accurate diagnosis that is holistic in
1st Place BCGxGoogle Global Engineering Week Hackathon 2019
The SocialBit measures an individual’s social interactions throughout
the day. A glasses-mounted camera detects faces
and references them with an existing database of the user’s friends on
social media platforms. The data is then
visualized in a network plot and chord diagram that showcases the length
and location of one’s interaction, defined as
the points in time in which the friend’s face was within view. The
is a beautiful and insightful display of our
daily social interactions - at the microscale.