Change of variables for normalizing flows
TL;DR: The change of variables formula lets us compute tractable densities in normalizing flow models.
My friend Aditya is teaching a new class at Stanford this quarter on deep generative models. While going over lectures, we realized it would be helpful to have a simple example for the change of variables formula, which is crucial for understanding normalizing flow models.
There’s been a lot of work on normalizing flows in the last few years, including NICE (Dinh, Krueger, & Bengio, 2014), RealNVP (Dinh, Sohl-Dickstein, & Bengio, 2016), Inverse Autoregressive Flows (IAF) (Kingma et al., 2016), and Masked Autoregressive Flows (MAF) (Papamakarios, Murray, & Pavlakou, 2017). Check out Eric Jang’s tutorial (Part I, Part II) for a great introduction. Here we’ll just dive into the change of variables example.
Probability mass is conserved
Let’s start with a random variable $X$ that is uniformly distributed over the unit cube, $x \in [0, 1]^3$. We can scale $X$ by a factor of 2 to get a new random variable $Y$,
$$y = 2x,$$
where $Y$ is uniform over a cube with side length 2: $y \in [0, 2]^3$.
How is the density $p(y)$ related to $p(x)$?
Since every distribution sums to 1 and the unit cube has volume $V_x = 1$,
$$\int p(x) \, dx = 1$$
and $p(x) = 1 / V_x = 1$ for all $x$ in the unit cube.
The volume of the larger cube is easy to compute: $V_y = 2^3 = 8$. The total probability mass must be conserved, so we can solve for the density of $Y$:
$$p(y) = p(x) \frac{V_x}{V_y} = \frac{1}{8}.$$
The new density is equal to the original density multiplied by the ratio of the volumes (original volume divided by new volume). Intuitively, this scaling factor tells us whether the volume is expanding (as in our example, where the ratio is 1/8 < 1) or shrinking (ratio > 1).
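As a quick sanity check on the cube example, here is a minimal Monte Carlo sketch. It assumes the 3-dimensional setup described above (uniform samples in the unit cube, scaled by 2). The function name, the test box, and the sample count are illustrative choices, not part of the original post.

```python
import random


def estimate_density_of_y(num_samples=200_000, seed=0):
    """Estimate the density of Y = 2X inside a small box, with X uniform on [0, 1]^3."""
    rng = random.Random(seed)
    lo, hi = 0.5, 1.5                # test box [0.5, 1.5]^3 lies inside [0, 2]^3
    box_volume = (hi - lo) ** 3      # equals 1.0
    hits = 0
    for _ in range(num_samples):
        y = [2.0 * rng.random() for _ in range(3)]   # sample x, then scale by 2
        if all(lo <= coord < hi for coord in y):
            hits += 1
    # density is approximately (probability mass in the box) / (volume of the box)
    return hits / num_samples / box_volume


# Density predicted by conservation of mass: p(x) * V_x / V_y = 1 * 1 / 8
exact_density = 1.0 * (1.0 / 8.0)
estimate = estimate_density_of_y()
print(f"Monte Carlo estimate: {estimate:.4f}, predicted: {exact_density:.4f}")
```

The estimate should land close to 1/8, with a small amount of sampling noise that shrinks as the number of samples grows.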
Change of variables formula
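The change of variables formula allows us to tractably compute normalized probability densities when we apply an invertible transformation $f$ with $y = f(x)$:^{1}
$$p(y) = p(x) \left| \det \frac{\partial f(x)}{\partial x} \right|^{-1} = p(x) \left| \det \frac{\partial f^{-1}(y)}{\partial y} \right|$$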
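In our example, the invertible function is just multiplication by a scaling matrix, $f(x) = 2x$, so the determinant of the Jacobian matrix is easy to compute:
$$\det \frac{\partial f(x)}{\partial x} = \det \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = 8$$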
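To make the computation concrete, the sketch below approximates the Jacobian of the scaling map with central finite differences and then applies the change of variables formula. The helper names (`scale_by_two`, `numerical_jacobian`, `det3`) and the evaluation point are hypothetical. A real flow implementation would typically get the Jacobian determinant analytically or from automatic differentiation rather than finite differences.

```python
def scale_by_two(x):
    """The invertible map from the example: y = 2x, applied elementwise."""
    return [2.0 * xi for xi in x]


def numerical_jacobian(func, x, h=1e-5):
    """Approximate the Jacobian J[i][j] = d func_i / d x_j by central differences."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = func(x_plus)
        f_minus = func(x_minus)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus)])
    # columns[j][i] holds d func_i / d x_j, so transpose to get rows indexed by i
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


x = [0.3, 0.5, 0.7]                      # any point inside the unit cube
jacobian = numerical_jacobian(scale_by_two, x)
det_j = det3(jacobian)

p_x = 1.0                                # uniform density on the unit cube
p_y = p_x / abs(det_j)                   # change of variables: p(y) = p(x) |det J|^{-1}
print(f"det J = {det_j:.6f}, p(y) = {p_y:.6f}")
```

Because the map is linear, the finite-difference Jacobian matches the scaling matrix up to floating-point error, and the resulting density agrees with the value obtained from the volume argument.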
For any invertible function, the Jacobian is the best local linear approximation of the function, and the absolute value of its determinant measures how much the function is locally expanding or shrinking volume. In our simple example it equals 8, the ratio of the new volume to the original volume, so the density is divided by 8.
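The cube example has the same scaling factor everywhere. To see the local behavior, here is a small one-dimensional sketch with a nonlinear map. It is a made-up illustration rather than anything from the original post: $X$ is uniform on $(0, 1)$ and $y = x^2$. The density from the change of variables formula is compared with a finite-difference derivative of the CDF, $P(Y \le y) = \sqrt{y}$.

```python
import math


def f_inverse(y):
    """Inverse of the illustrative map y = x**2 on (0, 1)."""
    return math.sqrt(y)


def df_dx(x):
    """Derivative of x**2; in 1D this is the whole Jacobian."""
    return 2.0 * x


def density_of_y(y):
    """Change of variables: p(y) = p(x) / |df/dx| evaluated at x = f^{-1}(y)."""
    x = f_inverse(y)
    p_x = 1.0                            # X is uniform on (0, 1)
    return p_x / abs(df_dx(x))


def density_from_cdf(y, h=1e-6):
    """Independent check: differentiate the CDF P(Y <= y) = sqrt(y) numerically."""
    return (math.sqrt(y + h) - math.sqrt(y - h)) / (2.0 * h)


for y in (0.04, 0.25, 0.81):
    x = f_inverse(y)
    print(f"y={y:.2f}  |df/dx|={abs(df_dx(x)):.2f}  "
          f"p(y)={density_of_y(y):.4f}  cdf check={density_from_cdf(y):.4f}")
```

Where $|df/dx| < 1$ the map compresses nearby points and the density goes up. Where $|df/dx| > 1$ it stretches them and the density goes down.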
Footnotes

1. The determinant of an invertible matrix is equal to the inverse of the determinant of the matrix inverse: $\det(A) = 1 / \det(A^{-1})$.
References
Dinh, L., Krueger, D., & Bengio, Y. (2014). NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516.
Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016). Density estimation using Real NVP. arXiv preprint arXiv:1605.08803.
 Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems (pp. 4743–4751).
 Papamakarios, G., Murray, I., & Pavlakou, T. (2017). Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems (pp. 2338–2347).