**November 25** *No CSC due to Thanksgiving holiday*

**November 18** *No CSC*

**November 11** *Patrick Grinaway - A Kernel Test of Goodness of Fit*

https://arxiv.org/abs/1602.02964

This paper addresses a common problem in our field: we have drawn (usually correlated) samples using some algorithm, and we want to check that they actually come from the intended target distribution. Often, however, we only know the target density up to a normalizing constant. The paper proposes a nonparametric statistical test of whether the samples came from the target density that does not require the normalizing constant.

Abstract:

We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel. We derive a statistical test, both for i.i.d. and non-i.i.d. samples, where we estimate the null distribution quantiles using a wild bootstrap procedure. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs model complexity in nonparametric density estimation.
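As a rough illustration of the statistic described in the abstract, here is a minimal NumPy sketch of a kernel Stein discrepancy V-statistic. It assumes an RBF kernel with a fixed bandwidth; the function name `ksd_v_statistic` and the `score` argument are made up for this example and are not from the paper's code. The target enters only through the gradient of its log density, so the normalizing constant is never needed. The wild bootstrap for the null quantiles is omitted, so this returns the statistic only, not a p-value.

```python
import numpy as np

def ksd_v_statistic(samples, score, bandwidth=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy.

    samples:   array of shape (n,) or (n, d)
    score:     function mapping (n, d) points to grad log p at those points
               (hypothetical interface, chosen for this sketch)
    bandwidth: RBF kernel bandwidth (fixed here; an assumption of this sketch)
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    s = score(x)                               # (n, d) log-density gradients
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))           # RBF kernel matrix

    # The four terms of the Stein kernel h_p(x_i, x_j):
    term_ss = (s @ s.T) * k                                   # s(x)^T s(y) k
    term_sx = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s(x)^T grad_y k
    term_sy = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s(y)^T grad_x k
    term_tr = (d / h2 - sqdist / h2 ** 2) * k                 # tr(grad_x grad_y k)
    return np.mean(term_ss + term_sx + term_sy + term_tr)

# Target: standard normal, whose score is -x (no normalizing constant needed).
rng = np.random.default_rng(0)
standard_normal_score = lambda x: -x
ksd_good = ksd_v_statistic(rng.standard_normal(500), standard_normal_score)
ksd_bad = ksd_v_statistic(rng.standard_normal(500) + 1.0, standard_normal_score)
```

Samples drawn from the target should give a value near zero, while the shifted samples should give a clearly larger one. Deciding how large is "too large" is what the paper's wild bootstrap is for.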

**November 3** *Josh Fass - Operator Variational Inference*

https://arxiv.org/abs/1610.09033

Variational inference turns inference into optimization: you try to find the closest tractable distribution to your target distribution. Variational inference is limited by how you define "closest" and "tractable."
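To make "inference as optimization" concrete, here is a minimal sketch of classic KL-based variational inference (not the operator version from the paper). It assumes a one-dimensional target known only through the gradient of its unnormalized log density, a Gaussian approximating family, and plain stochastic gradient ascent on the ELBO with reparameterized samples. The target, step size, and batch size are arbitrary choices for illustration.

```python
import numpy as np

def grad_log_target(x):
    # Gradient of an unnormalized log density; here the target is N(2, 0.5^2).
    return -(x - 2.0) / 0.5 ** 2

rng = np.random.default_rng(0)
mu, log_sigma = 0.0, 0.0      # parameters of the approximation q = N(mu, sigma^2)
learning_rate = 0.01          # arbitrary choice for this sketch

for step in range(2000):
    eps = rng.standard_normal(64)
    sigma = np.exp(log_sigma)
    x = mu + sigma * eps      # reparameterized draws from q
    g = grad_log_target(x)
    # ELBO = E_q[log p(x)] + entropy(q); the entropy of a Gaussian is
    # log_sigma + const, which contributes the "+ 1.0" below.
    grad_mu = np.mean(g)
    grad_log_sigma = np.mean(g * sigma * eps) + 1.0
    mu += learning_rate * grad_mu
    log_sigma += learning_rate * grad_log_sigma

fitted_sigma = np.exp(log_sigma)
```

Here "closest" means KL divergence and "tractable" means Gaussian, so `mu` and `fitted_sigma` should end up close to the target's mean of 2 and standard deviation of 0.5.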

The paper describes a generalization of variational inference that gives you greater flexibility in both of those choices.

For example, the authors show how to optimize "variational programs," where your approximating distribution is a procedure for sampling, but doesn't have a tractable density. An example of a variational program is: (1) draw a sample from a Gaussian, (2) push it forward through a neural network. By setting the parameters of the neural network appropriately, you can approximate basically any distribution. In this case, "operator variational inference" would give you a recipe for fiddling with the neural network parameters so that the resulting distribution better approximates the target distribution.
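The two-step variational program above can be sketched in a few lines of NumPy. This is only an illustration of sampling from such a program: the network architecture, layer sizes, and function names (`init_params`, `sample_variational_program`) are assumptions made for the example, and the sketch does not implement the paper's procedure for tuning the parameters.

```python
import numpy as np

def init_params(rng, noise_dim=2, hidden_dim=16, out_dim=1):
    # Randomly initialized one-hidden-layer network (sizes are arbitrary).
    return {
        "W1": rng.normal(size=(noise_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def sample_variational_program(params, n_samples, rng):
    noise_dim = params["W1"].shape[0]
    # Step 1: draw samples from a standard Gaussian.
    eps = rng.standard_normal((n_samples, noise_dim))
    # Step 2: push them forward through the neural network.
    hidden = np.tanh(eps @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

params = init_params(np.random.default_rng(0))
draws = sample_variational_program(params, 1000, np.random.default_rng(1))

# Changing the parameters changes the sampled distribution, e.g. shifting
# the output bias shifts every draw made from the same noise.
shifted_params = dict(params, b2=params["b2"] + 3.0)
shifted_draws = sample_variational_program(shifted_params, 1000, np.random.default_rng(1))
```

Note that we can draw from this distribution as often as we like, but nothing here lets us evaluate its density at a given point, which is what rules out the usual KL-based objective.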

Before the meeting, you may also be interested in checking out Shakir's blog post on reparameterization tricks: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/