\[\renewcommand{\V}[1]{\mathbf{#1}}\]One of the must-see papers of 2022. Finally, and as expected, it's here: decision-making formulated not as reinforcement learning (RL), but as conditional generative modeling. Personally, I am doubly interested since it also runs experiments in robotic simulation. I wonder whether it can sidestep the complexities of traditional RL methods.
Introduction
Amazing paper from the Improbable AI Lab at MIT: (Ajay et al., 2022).
The paper views decision-making not as a reinforcement learning (RL) problem, but as a conditional generative modeling problem, and shows that:
- Conditional generative modeling is an effective tool in offline decision-making.
- Classifier-free guidance with low-temperature sampling can replace dynamic programming.
- The framework of conditional generative modeling makes it possible to flexibly combine constraints and compose skills at inference time.
Decision Diffuser
Results Overview
Constraint Satisfaction
Combining Stacking Constraints
Combining Rearrangement Constraints
‘Not’ constraints in Stacking and Rearrangement
Infeasible constraints lead to incoherent behavior
Background
Diffusion Probabilistic Models
Diffusion models (Sohl-Dickstein et al., 2015) , (Ho et al., 2020) are a specific type of generative model that learn the data distribution \(q(\V x)\) from a dataset \(\mathcal{D} := \{ \V x^i \}_{0 \leq i < M}\) . They have been used most notably for synthesizing high-quality images from text descriptions. Here the data-generating procedure is modelled with a predefined forward noising process \(q(\V x_{k+1} \mid \V x_k) := \mathcal{N}(\V x_{k+1}; \sqrt{\alpha_k} \V x_k, (1-\alpha_k) \V I)\) and a trainable reverse process \(p_\theta(\V x_{k-1} \mid \V x_k) := \mathcal{N} (\V x_{k-1} \mid \mu_\theta (\V x_k, k), \Sigma_k)\), where \(\mathcal{N}(\mu, \Sigma)\) denotes a Gaussian distribution with mean \(\mu\) and variance \(\Sigma\), \(\alpha_k \in \mathbb{R}\) determines the variance schedule, \(\V x_0 := \V x\) is a sample, \(\V x_1, \V x_2, \ldots, \V x_{K-1}\) are the latents, and \(\V x_K \sim \mathcal{N}(\V 0, \V I)\) for carefully chosen \(\alpha_k\) and long enough \(K\). Starting with Gaussian noise, samples are then iteratively generated through a series of “denoising” steps.
Although a tractable variational lower-bound on \(\log p_\theta\) can be optimized to train diffusion models, (Ho et al., 2020) propose a simplified surrogate loss:
\[\mathcal{L}_{\mathrm {denoise}}(\theta) := \mathbb{E}_{k \sim [1, K], \V x_0\sim q, \epsilon \sim \mathcal{N}(\V 0, \V I)}[\| \epsilon - \epsilon_\theta (\V x_k, k)\|^2]\]The predicted noise \(\epsilon_\theta(\V x_k, k)\), parameterized with a deep neural network, estimates the noise \(\epsilon \sim \mathcal{N}(\V 0, \V I)\) added to the dataset sample \(\V x_0\) to produce the noisy sample \(\V x_k\). This is equivalent to predicting the mean of \(p_\theta(\V x_{k-1} \mid \V x_k)\) since \(\mu_\theta(\V x_k, k)\) can be calculated as a function of \(\epsilon_\theta(\V x_k, k)\) (Ho et al., 2020) .
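The forward process and the surrogate loss above can be sketched in a few lines of numpy. This is a minimal toy illustration, not the paper's implementation: the linear variance schedule and the trivial (untrained) noise predictor `eps_theta` are assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100
alphas = 1.0 - np.linspace(1e-4, 0.02, K)  # illustrative variance schedule alpha_k
alpha_bars = np.cumprod(alphas)            # cumulative products, often written ᾱ_k

def q_sample(x0, k, eps):
    """Forward noising in closed form: x_k = sqrt(ᾱ_k) x_0 + sqrt(1-ᾱ_k) ε."""
    return np.sqrt(alpha_bars[k]) * x0 + np.sqrt(1.0 - alpha_bars[k]) * eps

def denoise_loss(eps_theta, x0):
    """One Monte Carlo term of L_denoise: ||ε - ε_θ(x_k, k)||² for random k, ε."""
    k = rng.integers(0, K)
    eps = rng.standard_normal(x0.shape)
    x_k = q_sample(x0, k, eps)
    return np.mean((eps - eps_theta(x_k, k)) ** 2)

# A stand-in noise predictor (would be a deep network in practice).
eps_theta = lambda x_k, k: np.zeros_like(x_k)
x0 = rng.standard_normal(4)  # a toy "dataset sample"
loss = denoise_loss(eps_theta, x0)
```

Training then amounts to minimizing this loss over \(\theta\) with stochastic gradient descent, sampling a fresh \(k\) and \(\epsilon\) at every step.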
Connection with stochastic gradient Langevin dynamics
Langevin dynamics is a concept from physics, developed for statistically modeling molecular systems. Combined with stochastic gradient descent, stochastic gradient Langevin dynamics (Welling & Teh, 2011) can produce samples from a probability density \(p(\V x)\) using only the gradients \(\nabla_{\V x} \log p(\V x)\) in a Markov chain of updates:
\[\V x_t = \V x_{t-1} + \frac{\delta}{2} \nabla_{\V x} \log p(\V x_{t-1}) + \sqrt{\delta} \V \epsilon_t, \quad\text{where } \V \epsilon_t \sim \mathcal N (\V 0, \V I)\]where \(\delta\) is the step size. When \(T \rightarrow \infty\) and \(\delta \to 0\), the distribution of \(\V x_T\) converges to the true probability density \(p(\V x)\).
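The update rule above is easy to verify on a toy target. The sketch below (an assumption-laden illustration, not from the paper) samples from a standard Gaussian, whose score is simply \(\nabla_{\V x} \log p(\V x) = -\V x\); the step size and chain length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, T = 0.1, 5000  # illustrative step size and chain length

def grad_log_p(x):
    """Score of the standard normal target p(x) ∝ exp(-x²/2)."""
    return -x

# Run 2000 independent chains, deliberately initialized far from the mode.
x = np.full(2000, 5.0)
for _ in range(T):
    x = x + 0.5 * delta * grad_log_p(x) + np.sqrt(delta) * rng.standard_normal(x.shape)

# After many steps, the marginal of x is approximately N(0, 1).
mean, std = x.mean(), x.std()
```

Note that the discretization leaves a small bias (the stationary variance is slightly above 1 for finite \(\delta\)), which is why the exact result only holds in the limit \(\delta \to 0\).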
Compared to standard SGD, stochastic gradient Langevin dynamics injects Gaussian noise into the parameter updates to avoid collapsing into local minima.1
References
- Ajay, A., Du, Y., Gupta, A., Tenenbaum, J., Jaakkola, T., & Agrawal, P. (2022). Is Conditional Generative Modeling all you need for Decision-Making? arXiv. doi: 10.48550/ARXIV.2211.15657 https://arxiv.org/abs/2211.15657
- Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. CoRR, abs/1503.03585. http://arxiv.org/abs/1503.03585
- Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 6840–6851). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf
- Welling, M., & Teh, Y. W. (2011). Bayesian Learning via Stochastic Gradient Langevin Dynamics. International Conference on Machine Learning.
Footnotes
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ ↩