23 June 2017 - Chaya Stern - Prior distribution for variance parameters in hierarchical models

This week Chaya Stern will be presenting the paper "Prior distribution for variance parameters in hierarchical models". The paper discusses some of the problems that arise when choosing prior distributions and makes several recommendations for priors on hierarchical variance parameters.

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

 

Abstract:

Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new folded-noncentral-t family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors in this family. We use an example to illustrate serious problems with the inverse-gamma family of “noninformative” prior distributions. We suggest instead to use a uniform prior on the hierarchical standard deviation, using the half-t family when the number of groups is small and in other settings where a weakly informative prior is desired. We also illustrate the use of the half-t family for hierarchical modeling of multiple variance parameters such as arise in the analysis of variance.
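As a rough illustration of the half-t family mentioned in the abstract, the sketch below draws a hierarchical standard deviation tau from a half-t prior and then draws group effects given tau. The function names, the scale of 25, and the choice of 1 degree of freedom (a half-Cauchy) are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def sample_half_t(scale, dof, rng):
    """Draw one value from a half-t distribution (absolute value of a scaled Student t)."""
    z = rng.gauss(0.0, 1.0)
    # Chi-square with `dof` degrees of freedom is Gamma(shape=dof/2, scale=2).
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return scale * abs(z) / math.sqrt(chi2 / dof)


def sample_group_effects(n_groups, scale, dof, seed=0):
    """Draw tau from the half-t prior, then group effects alpha_j ~ N(0, tau^2)."""
    rng = random.Random(seed)
    tau = sample_half_t(scale, dof, rng)
    effects = [rng.gauss(0.0, tau) for _ in range(n_groups)]
    return tau, effects


if __name__ == "__main__":
    # Hypothetical setting: 8 groups, half-Cauchy (dof=1) prior with scale 25.
    tau, effects = sample_group_effects(n_groups=8, scale=25.0, dof=1)
    print("tau =", tau)
    print("group effects =", effects)
```

With dof=1 the median of the prior equals the scale, which is one way to sanity-check the sampler.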

16 June 2017 - Andrea Rizzi - A Common Derivation for Markov Chain Monte Carlo Algorithms with Tractable and Intractable Targets

This week, Andrea Rizzi will present a paper titled "A Common Derivation for Markov Chain Monte Carlo Algorithms with Tractable and Intractable Targets". The paper was suggested to us by Lee Zamparo from Christina Leslie's lab a few months ago. It lays down a general framework for MCMC algorithms and subsumes the following as special cases:

1) Metropolis-Hastings
2) Gibbs sampling
3) Metropolis-Hastings within Gibbs sampling
4) Slice sampling
5) Directional sampling
6) Directional slice sampling
7) Langevin and Hamiltonian Monte Carlo sampling
8) Elliptical Hamiltonian Slice sampling
9) Pseudo Marginal Metropolis–Hastings
10) Pseudo Marginal Hamiltonian Slice sampling

We very likely won't have time to go through all of them, but let us know if there are some in which you are particularly interested!

Link: https://arxiv.org/abs/1607.01985 

Abstract:

Markov chain Monte Carlo is a class of algorithms for drawing Markovian samples from high dimensional target densities to approximate the numerical integration associated with computing statistical expectation, especially in Bayesian statistics. However, many Markov chain Monte Carlo algorithms do not seem to share the same theoretical support and each algorithm is proven in a different way. This incurs a large amount of terminologies and ancillary concepts, which makes Markov chain Monte Carlo literature seems to be scattered and intimidating to researchers from many other fields, including new researchers of Bayesian statistics.

A generalised version of the Metropolis–Hastings algorithm is constructed with a random number generator and a self–reverse mapping. This formulation admits many other Markov chain Monte Carlo algorithms as special cases. A common derivation for many Markov chain Monte Carlo algorithms is useful in drawing connections and comparisons between these algorithms. As a result, we now can construct many novel combinations of multiple Markov chain Monte Carlo algorithms that amplify the efficiency of each individual algorithm. Specifically, we reinterpret slice sampling as a special case of Metropolis–Hastings and then propose two novel sampling schemes that combine slice sampling with directional or Hamiltonian sampling. Our Hamiltonian slice sampling scheme is also applicable in the pseudo marginal context where the target density is intractable but can be unbiasedly estimated, e.g. using particle filtering.
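To make the "random number generator plus self-reverse mapping" idea concrete, here is a minimal sketch of one way to read it: draw an auxiliary variable, apply a mapping that is its own inverse, and accept or reject with a Metropolis-Hastings ratio on the joint density. The names and the specific choices (standard normal target, Gaussian auxiliary variable, the mapping (x, v) -> (x + v, -v)) are assumptions made for illustration and do not follow the paper's notation. With these choices the scheme reduces to random-walk Metropolis.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of the target (standard normal, for illustration)."""
    return -0.5 * x * x


def log_aux(v, step):
    """Unnormalized log density of the auxiliary variable v ~ N(0, step^2)."""
    return -0.5 * (v / step) ** 2


def involution(x, v):
    """Self-reverse mapping: applying it twice returns the starting point."""
    return x + v, -v


def generalized_mh(n_steps, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        v = rng.gauss(0.0, step)           # random number generator
        x_new, v_new = involution(x, v)    # deterministic self-reverse mapping
        # Joint density ratio; the Jacobian of this mapping has absolute value 1.
        log_ratio = (log_target(x_new) + log_aux(v_new, step)
                     - log_target(x) - log_aux(v, step))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    samples = generalized_mh(20000)
    mean = sum(samples) / len(samples)
    print("sample mean:", mean)
```

Swapping in a different auxiliary distribution or mapping (for example, a leapfrog step followed by a momentum flip) changes the sampler while leaving the accept/reject logic untouched, which is the point of a common derivation.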

9 June 2017 - Patrick Grinaway - Likelihood-free inference via classification

On Fri 9 Jun at 11am we'll be discussing "Likelihood-free inference via classification" by Gutmann, et al. (https://arxiv.org/pdf/1407.4981.pdf , abstract below) in the Z6 fishbowl. It should be an interesting discussion for anyone interested in inference of intractable generative models and approximate Bayesian inference in general, so if you can make it, be sure to come!

Abstract:
Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
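The core idea, using classification accuracy as the discrepancy between simulated and observed data, can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' setup: the simulator is a unit-variance Gaussian with unknown mean theta, and the classifier is a simple midpoint threshold between the two training means. Held-out accuracy near 0.5 means the classifier cannot tell the two data sets apart.

```python
import random


def simulate(theta, n, rng):
    """Toy simulator: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def classification_accuracy(observed, simulated):
    """Held-out accuracy of a 1-D midpoint-threshold classifier (observed vs simulated)."""
    half_o, half_s = len(observed) // 2, len(simulated) // 2
    train_o, test_o = observed[:half_o], observed[half_o:]
    train_s, test_s = simulated[:half_s], simulated[half_s:]
    mean_o = sum(train_o) / len(train_o)
    mean_s = sum(train_s) / len(train_s)
    threshold = 0.5 * (mean_o + mean_s)
    observed_is_high = mean_o >= mean_s
    # A test point is classified as "observed" if it falls on the observed side.
    correct = sum((x >= threshold) == observed_is_high for x in test_o)
    correct += sum((x >= threshold) != observed_is_high for x in test_s)
    return correct / (len(test_o) + len(test_s))


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate(2.0, 2000, rng)  # pretend the true parameter is unknown
    for theta in [0.0, 1.0, 2.0, 3.0]:
        acc = classification_accuracy(observed, simulate(theta, 2000, rng))
        print(f"theta={theta:.1f}  accuracy={acc:.3f}")
```

Scanning theta and picking the value whose accuracy is closest to chance gives a crude point estimate; any stronger classifier could be substituted for the threshold rule.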

2 June 2017 - Bas Rustenburg - discussing "Sequential Monte Carlo Sampling for DSGE models"

Bas will discuss the following paper:

http://onlinelibrary.wiley.com/doi/10.1002/jae.2397/abstract

 

Summary

We develop a sequential Monte Carlo (SMC) algorithm for estimating Bayesian dynamic stochastic general equilibrium (DSGE) models; wherein a particle approximation to the posterior is built iteratively through tempering the likelihood. Using two empirical illustrations consisting of the Smets and Wouters model and a larger news shock model we show that the SMC algorithm is better suited for multimodal and irregular posterior distributions than the widely used random walk Metropolis–Hastings algorithm. We find that a more diffuse prior for the Smets and Wouters model improves its marginal data density and that a slight modification of the prior for the news shock model leads to drastic changes in the posterior inference about the importance of news shocks for fluctuations in hours worked. Unlike standard Markov chain Monte Carlo (MCMC) techniques; the SMC algorithm is well suited for parallel computing. Copyright © 2014 John Wiley & Sons, Ltd
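The likelihood-tempering idea can be sketched on a toy problem. The sketch below is not a DSGE model and not the authors' algorithm: it assumes a Gaussian prior and likelihood (so the exact posterior is known), a linear tempering schedule, multinomial resampling at every stage, and a single random-walk Metropolis move per particle. All names and tuning values are hypothetical.

```python
import math
import random

DATA = [1.8, 2.1, 2.4, 1.9, 2.3]  # made-up observations, unit noise variance
PRIOR_SD = 5.0


def log_prior(theta):
    return -0.5 * (theta / PRIOR_SD) ** 2


def log_lik(theta):
    return -0.5 * sum((y - theta) ** 2 for y in DATA)


def log_tempered(theta, phi):
    """Log density of the bridge distribution: prior times likelihood^phi."""
    return log_prior(theta) + phi * log_lik(theta)


def tempered_smc(n_particles=1000, n_stages=20, mh_step=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, PRIOR_SD) for _ in range(n_particles)]
    for stage in range(1, n_stages + 1):
        phi_prev, phi = (stage - 1) / n_stages, stage / n_stages
        # Correction: reweight by the incremental likelihood power.
        log_w = [(phi - phi_prev) * log_lik(p) for p in particles]
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        # Selection: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Mutation: one random-walk Metropolis step targeting the current bridge.
        moved = []
        for p in particles:
            proposal = p + rng.gauss(0.0, mh_step)
            log_ratio = log_tempered(proposal, phi) - log_tempered(p, phi)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                p = proposal
            moved.append(p)
        particles = moved
    return particles


if __name__ == "__main__":
    particles = tempered_smc()
    estimate = sum(particles) / len(particles)
    exact = sum(DATA) / (len(DATA) + 1.0 / PRIOR_SD ** 2)
    print("SMC posterior mean:", estimate, " exact:", exact)
```

The reweighting and mutation loops act on each particle independently, which is why this family of algorithms parallelizes more readily than a single long Markov chain.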

25 May 2017 - Jason Wagoner - Some tricks to reduce correlation times in GHMC simulations

Guest speaker Jason Wagoner, a Junior Laufer Fellow from the Laufer Center at Stony Brook, will be speaking. Although his expertise comes largely from molecular simulation, he will discuss powerful algorithms that can be used in more general statistical settings, so if you can, be sure to attend!

Some tricks to reduce correlation times in GHMC simulations

Abstract:

The distribution sampled by a molecular dynamics algorithm is subject to some amount of error that depends on the size of the integration timestep. This error can be corrected by updating the system with a Metropolis Monte Carlo criterion, where the integration step is treated as a selection probability for the candidate state. These methods, closely related to generalized hybrid Monte Carlo (GHMC), satisfy detailed balance by imposing momenta reversal upon candidate rejection. Unfortunately, these momentum reversals can severely increase the time needed for decorrelation, sometimes giving an order-of-magnitude increase in correlation times for system variables. Here, I present the reduced-flipping GHMC algorithm. The algorithm rigorously samples the target distribution but breaks detailed balance to reduce the number of momentum flips in the GHMC simulation. I will also present similar methods in the field that have the same goal to reduce correlation times--extra-chance GHMC and look-ahead GHMC. 
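For orientation, here is a minimal sketch of the baseline GHMC scheme the abstract starts from, with the momentum reversal on rejection made explicit. It is not the reduced-flipping algorithm from the talk. The assumptions are all illustrative: a 1-D harmonic potential, unit mass and temperature, a single leapfrog step per iteration, and made-up values for the timestep and the momentum mixing angle.

```python
import math
import random


def potential(x):
    return 0.5 * x * x


def grad_potential(x):
    return x


def ghmc(n_steps, dt=0.5, mix_angle=0.5, seed=0):
    """Return positions and the number of momentum flips (one per rejection)."""
    rng = random.Random(seed)
    x, p = 0.0, rng.gauss(0.0, 1.0)
    c, s = math.cos(mix_angle), math.sin(mix_angle)
    positions, n_flips = [], 0
    for _ in range(n_steps):
        # Partial momentum refresh; leaves the N(0, 1) momentum distribution invariant.
        p = c * p + s * rng.gauss(0.0, 1.0)
        # One leapfrog step as the candidate move.
        p_half = p - 0.5 * dt * grad_potential(x)
        x_new = x + dt * p_half
        p_new = p_half - 0.5 * dt * grad_potential(x_new)
        delta_h = (potential(x_new) + 0.5 * p_new ** 2) - (potential(x) + 0.5 * p ** 2)
        if rng.random() < math.exp(min(0.0, -delta_h)):
            x, p = x_new, p_new
        else:
            p = -p  # momentum reversal on rejection
            n_flips += 1
        positions.append(x)
    return positions, n_flips


if __name__ == "__main__":
    positions, n_flips = ghmc(50000)
    mean = sum(positions) / len(positions)
    var = sum((x - mean) ** 2 for x in positions) / len(positions)
    print("variance of x:", var, " momentum flips:", n_flips)
```

Each flip sends the trajectory back the way it came, which is the source of the longer correlation times the abstract describes.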

19 May 2017 - Daniela Huppenkothen - Timing Black Holes: Time Series Analysis in High-Energy Astronomy

Daniela Huppenkothen from NYU will be presenting this Friday, 5/19, in Z-679 at the computational statistics club. The title of her talk is "Timing Black Holes: Time Series Analysis in High-Energy Astronomy". This will be an interesting discussion on time series analysis in a data-limited regime!

 "Timing Black Holes: Time Series Analysis in High-Energy Astronomy"

The sky in X-rays is incredibly dynamic. Black holes vary on time scales ranging from milli-seconds to decades, their brightness occasionally changing by several orders of magnitude within seconds or minutes. Studying this variability is one of the best ways to understand key physical processes that are unobservable on Earth: general relativity in strong gravity, extremely dense matter and the strongest magnetic fields known to us are just a few examples.

However, astronomical time series can be difficult to analyze in practice: many of the time series are inherently non-stationary, observing constraints lead to a very uneven sampling, and the underlying process is often partly stochastic in nature. Furthermore, classification problems are complicated by the fact that we are very limited by our relatively small, imbalanced data sets.

In this talk, I will give an overview of the state-of-the-art of time series analysis in high-energy astronomy. I will present key statistical methods and machine learning models we have been developing recently as well as point out opportunities and the many challenges of the spectral-timing revolution we are moving toward with data from current and future space missions.

12 May 2017 - Alpha Lee - Exploring chemical space by undressing finite sampling noise

Alpha Lee from Harvard will be presenting a talk entitled "Exploring Chemical Space by Undressing Finite Sampling Noise." Whether you are interested in chemistry or just machine learning, it will certainly be an interesting discussion!

Exploring chemical space by undressing finite sampling noise

Developing computational methods to explore chemical space is a major challenge for drug discovery and material discovery. The challenge is often the limited number of experimental measurements relative to the vast chemical space. I will discuss a mathematical framework, inspired by random matrix theory, which allows us to remove noise due to finite sampling and identify important chemical features. I will illustrate this framework with two examples: predicting protein-ligand affinity [1], and optimal design of experiments by combining coarse and fine measurements [2]. 

[1] A. A. Lee, M. P. Brenner and L. J. Colwell, Proc. Natl. Acad. Sci. U.S.A., 113, 13564 (2016)
[2] A. A. Lee, M. P. Brenner and L. J. Colwell, arXiv:1702.06001

 

28 Apr 2017 - Rajesh Ranganath - Implicit Models and Posterior Approximations.

Just a reminder that this Friday (4/28) at 11:00am in the Z6 fishbowl, we'll have a guest, Rajesh Ranganath, presenting on "Implicit Models and Posterior Approximations." (abstract below). For those who are unfamiliar, Rajesh has done a lot of interesting work on probabilistic generative modeling and variational inference, so if you're interested in machine learning, definitely plan to attend--it should be a great talk.

Implicit Models and Posterior Approximations.

Abstract:
Probabilistic generative models tell stories about how data were generated. These stories uncover hidden patterns (latent states) and form the basis for predictions. Traditionally, probabilistic generative models provide a score for generated samples via a tractable likelihood function. The requirement of the score limits the flexibility of these models. For example, in many physical models we can generate samples, but not compute their likelihood --- such models defined only by their sampling process are called implicit models. In the first part of the talk I will present a family of implicit models that combine hierarchical Bayesian models with deep models. The main computational task in working with probabilistic generative models is computing the distribution of the latent states given data: posterior inference. Posterior inference cast as optimization over an approximating family is variational inference. The accuracy of variational inference hinges on the expressivity of the approximating family. In the second part of this talk, I will present multiple types of implicit variational approximations for both traditional and implicit models. Along the way, we'll explore models for text and regression.

31 Mar 2017 - Patrick Grinaway - Parallel resampling in the particle filter

Sorry for the late email, but we'll be having a journal club tomorrow, 3/31, in the Z6 fishbowl, at 11am. We'll be discussing the paper "Parallel resampling in the particle filter" (abstract below). We'll explore what particle filtering is good for, as well as a sometimes-overlooked aspect of computational statistics: how best to design an algorithm not just for low error, but also for efficient computation on specific hardware.

Parallel resampling in the particle filter

Abstract:

Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation.
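The contrast between the two kinds of resampler can be sketched in plain serial Python. This is an illustrative sketch, not the paper's GPU code: the systematic resampler needs the sum of all weights (a collective operation), while the Metropolis resampler only ever compares pairs of weights, so each particle could in principle be handled independently. The function names and the number of Metropolis iterations are assumptions.

```python
import random


def systematic_resample(weights, rng):
    """Standard scheme: requires the total weight, i.e. a sum over all particles."""
    n = len(weights)
    total = sum(weights)  # collective operation
    u0 = rng.random()
    ancestors = []
    j, cumulative = 0, weights[0]
    for i in range(n):
        position = (u0 + i) / n * total
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        ancestors.append(j)
    return ancestors


def metropolis_resample(weights, rng, n_iters=30):
    """Alternative scheme: each particle runs a short Metropolis chain over indices."""
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i
        for _ in range(n_iters):
            j = rng.randrange(n)
            # Accept j with probability min(1, w_j / w_k); only a weight ratio is needed.
            if rng.random() * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0] * 1000 + [3.0] * 1000  # unnormalized weights
    for name, fn in [("systematic", systematic_resample), ("metropolis", metropolis_resample)]:
        ancestors = fn(weights, rng)
        frac = sum(a >= 1000 for a in ancestors) / len(ancestors)
        print(name, "fraction drawn from the heavy half:", frac)
```

The Metropolis resampler is only approximately unbiased for a finite number of iterations, so the iteration count trades accuracy against speed.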

10 Feb 2017 - Bayesian update method for adaptive weighted sampling

The paper can be found here.

Abstract

Exploring conformational spaces is still a challenging task for simulations of complex systems. One way to enhance such a task is weighted sampling, e.g., by assigning high weights to regions that are rarely sampled. It is, however, difficult to estimate adequate weights beforehand, and therefore adaptive methods are desired. Here we present a method for adaptive weighted sampling based on Bayesian inference. Within the framework of Bayesian inference, we develop an update scheme in which the information from previous data is stored in a prior distribution which is then updated to a posterior distribution according to new data. The method proposed here is particularly well suited for distributed computing, in which one must deal with rapid influxes of large amounts of data.

3 Feb 2017 - Deep Unsupervised Learning using Nonequilibrium Thermodynamics


Computational Statistics Club is back after a brief hiatus, at its usual time (11:00am-12:00pm Friday, February 3, 2017) and place (Z6 fishbowl). Patrick Grinaway will be presenting the paper below this week. Briefly, this paper adds to the growing set of methods that allow us to learn so-called deep generative models, with an interesting twist that also maintains tractability. If this idea sounds cool (or you just want to learn about it), join us at the CSC this week to discuss!

https://arxiv.org/abs/1503.03585

"Deep Unsupervised Learning using Nonequilibrium Thermodynamics"

Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
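The forward half of this idea, slowly destroying structure with a diffusion process, is easy to sketch. The example below assumes a 1-D bimodal "data" distribution and a Gaussian diffusion step with a constant rate beta; the names, the constant schedule, and the step count are illustrative assumptions. The learned reverse process, which is the hard part, is not shown.

```python
import math
import random


def forward_diffusion(x0, n_steps, beta, rng):
    """Apply n_steps of x <- sqrt(1 - beta) * x + sqrt(beta) * noise."""
    x = x0
    for _ in range(n_steps):
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


def mean_and_variance(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(0)
    # Structured toy data: two sharp modes at -2 and +2.
    data = [rng.choice([-2.0, 2.0]) for _ in range(2000)]
    diffused = [forward_diffusion(x, n_steps=100, beta=0.1, rng=rng) for x in data]
    print("data (mean, var):    ", mean_and_variance(data))
    print("diffused (mean, var):", mean_and_variance(diffused))
```

After enough steps the two modes are gone and the samples look like a unit Gaussian, which is the tractable distribution the reverse process starts from.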

Schedule for December 2016

December 2 : Theo Karaletsos, Geometric Intelligence

'Adversarial Message Passing For Graphical Models'.

A currently popular technique for learning generative models is generative adversarial networks (GANs). They form a basis to learning generative models by learning to discriminate true samples versus fake ones to guide a model towards good solutions that can fool a strong discriminator into assigning high probability of being true to model samples. It has been shown that GANs minimize a well-defined f-divergence, the Jensen-Shannon Divergence, between the model distribution and the data distribution.

However, current best practices have a number of shortcomings.

Typically, GANs are considered to be models and are not understood in the context of inference. In addition, current techniques rely on global discrimination of joint distributions to perform learning, which is ineffective.

We propose to alleviate this limitation by showing how to relate adversarial learning to distributed approximate Bayesian inference on factor graphs. We propose local learning rules based on message passing which minimize a global variational criterion based on adversaries used to score ratios of distributions instead of explicit likelihood evaluations. 

This yields an inference and learning framework that facilitates treating model specification and inference separately by combining ideas from message passing with adversarial inference and can be used on arbitrary computational structures within the family of Directed Acyclic Graphs and models, including intractable likelihoods, non-differentiable models and generally cumbersome models.

We thus present adversarial learning under the viewpoint of approximate inference and modeling. We combine adversarial learning with nonparametric variational families to yield a learning framework which performs implicit Bayesian Inference on graph structures by sampling particles, without the need to evaluate densities.

These approaches hold promise to be useful in the toolbox of probabilistic modelers and have the potential to enrich the gamut of flexible probabilistic programming applications beyond current practice.

To be presented at NIPS Advances In Approximate Inference 2016

Schedule for November 2016

November 25 No CSC due to Thanksgiving holiday

November 18 No CSC

November 11 Patrick Grinaway - A Kernel Test of Goodness of Fit

https://arxiv.org/abs/1602.02964

This paper attempts to provide a solution for a common problem in our field: we have drawn (usually correlated) samples using some algorithm, and we want to make sure that the samples have actually come from the appropriate target. However, we are often in the regime where we don't have access to the normalized target probability density. This method is a nonparametric statistical test of whether the samples came from the appropriate density, without requiring the normalized target density.

Abstract:

We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel. We derive a statistical test, both for i.i.d. and non-i.i.d. samples, where we estimate the null distribution quantiles using a wild bootstrap procedure. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs model complexity in nonparametric density estimation.
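To show what the test statistic looks like, here is a small 1-D sketch of the V-statistic built from the log gradient of the target and a kernel. It assumes a Gaussian (RBF) kernel with unit length scale and a standard normal target, whose log gradient is -x and needs no normalizing constant. The wild bootstrap used to get a p-value is not included, and all function names are hypothetical.

```python
import math
import random


def stein_kernel(x, y, score, length_scale=1.0):
    """Stein kernel built from an RBF kernel and the target's log gradient `score`."""
    d = x - y
    l2 = length_scale ** 2
    k = math.exp(-0.5 * d * d / l2)
    dk_dx = -d / l2 * k
    dk_dy = d / l2 * k
    d2k_dxdy = (1.0 / l2 - d * d / (l2 * l2)) * k
    return score(x) * score(y) * k + score(x) * dk_dy + score(y) * dk_dx + d2k_dxdy


def stein_v_statistic(samples, score, length_scale=1.0):
    """Average of the Stein kernel over all pairs, including i == j."""
    n = len(samples)
    total = 0.0
    for x in samples:
        for y in samples:
            total += stein_kernel(x, y, score, length_scale)
    return total / (n * n)


def standard_normal_score(x):
    """Log gradient of the N(0, 1) density."""
    return -x


if __name__ == "__main__":
    rng = random.Random(0)
    good = [rng.gauss(0.0, 1.0) for _ in range(300)]  # drawn from the target
    bad = [rng.gauss(1.0, 1.0) for _ in range(300)]   # shifted away from the target
    print("on-target samples: ", stein_v_statistic(good, standard_normal_score))
    print("off-target samples:", stein_v_statistic(bad, standard_normal_score))
```

The statistic stays close to zero for samples from the target and grows when the samples come from somewhere else, which is what the test exploits.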

November 3 Josh Fass - Operator Variational Inference

https://arxiv.org/abs/1610.09033

Variational inference turns inference into optimization: you try to find the closest tractable distribution to your target distribution. Variational inference is limited by how you define "closest" and "tractable."

The paper describes a generalization of variational inference that allows you greater flexibility.

For example, the authors show how to optimize "variational programs," where your approximating distribution is a procedure for sampling, but doesn't have a tractable density. An example of a variational program is: (1) draw a sample from a Gaussian, (2) push it forward through a neural network. By setting the parameters of the neural network appropriately, you can approximate basically any distribution. In this case, "operator variational inference" would give you a recipe for fiddling with the neural network parameters so that the resulting distribution better approximates the target distribution.
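The two-step variational program described above can be sketched directly. The tiny one-hidden-layer network and its hand-picked parameters are assumptions for illustration only; the optimization that operator variational inference would perform on those parameters is not shown.

```python
import math
import random


def variational_program(params, rng):
    """(1) Draw a Gaussian sample, (2) push it through a small neural network."""
    hidden_weights, hidden_biases, output_weights, output_bias = params
    z = rng.gauss(0.0, 1.0)
    hidden = [math.tanh(w * z + b) for w, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters; in practice these are what gets optimized.
    params = ([1.0, -2.0, 0.5], [0.0, 0.5, -1.0], [0.7, 0.3, -1.2], 0.1)
    samples = [variational_program(params, rng) for _ in range(1000)]
    print("first few samples:", samples[:5])
```

Sampling is cheap, but nothing in the program ever evaluates a density for its output, which is why an objective that needs only samples is required.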

Before the meeting, you may also be interested in checking out Shakir's blog post on reparameterization tricks: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/

 

 

Schedule for September 2016

September 30 Chaya Stern - A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

September 23 Anand Sarwate (Rutgers) - High Dimensional Inference with Random Maximum A-Posteriori Perturbations

September 9 Bas - Why Bayesian Psychologists Should Change the Way they Use the Bayes Factor

September 2 Patrick - Reversible Jump MCMC

Schedule for August 2016

 

August 26 Josh - affine invariant ensemble MCMC

August 19 Josh & Bas - NCMC discussion

August 12 Chaya - Fitting the correlation function

August 5 Lee - A Common Derivation for Metropolis-Hastings and other Markov Chain Monte Carlo Algorithms

  • http://arxiv.org/abs/1607.01985

 

Updated schedule

We organize a weekly chalk-talk / discussion club (meetings are typically at 11AM on Friday in the Z6 Fishbowl) focused on computational statistics methods. 

Schedule

July 29 Bas - Bayes statistical decisions with random fuzzy data—an application in reliability

July 22 Cancelled.

July 15 Bas - Bayesian estimation of non-Gaussianity in pulsar timing analysis

July 12 Chaya - Annealed importance sampling (AIS)  Note: Zuckerman, 19th floor conference room, 11am-12 pm

July 1 Greg - Continuous Contour Monte Carlo

June 24 Andrea - Multistate Bennett Acceptance Ratio

June 17 Josh - Merging MCMC Subposteriors through Gaussian-Process Approximations

June 10 Chaya - Detecting parameter symmetries in probabilistic models

  • http://arxiv.org/pdf/1312.5386.pdf

June 3 Open discussion - Force field parameterization methods and strategies

May 20 Bas - Delayed Rejection methods in MCMC

  • http://arxiv.org/pdf/0904.2207v2.pdf
  • http://www.jstor.org/stable/pdf/2673700.pdf

May 13 Patrick - SAMS optimality proof

  • SAMS paper - http://www.stat.rutgers.edu/home/ztan/Publication/SAMS_redo4.pdf
  • SAMS supplement (with proofs) - http://www.stat.rutgers.edu/home/ztan/Publication/SAMS_redo4_supp_Part2-print.pdf

May 6 Bas - Multiple-Try Metropolis

  • http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_PandolfiBF10.pdf
  • http://arxiv.org/pdf/1201.0646.pdf

April 29 Patrick - Particle MCMC

April 22 Greg - Wang-Landau

April 15 Josh - Hamiltonian Monte Carlo

  • Neal: http://arxiv.org/pdf/1206.1901.pdf

April 8 Andrea - Self-adjusted mixture sampling

  •  http://www.stat.rutgers.edu/home/ztan/Publication/SAMS_redo4.pdf

April 1 Lee - Importance weighted autoencoders

  • http://arxiv.org/abs/1312.6114 and http://arxiv.org/abs/1509.00519

March 25 Josh - Markov chains

  • Jun Liu's book: Chapter 12

March 11 Patrick -  Transport map accelerated MCMC

  • http://arxiv.org/abs/1412.5492

March 4 Bas - Firefly MC

  • http://arxiv.org/pdf/1403.5693v1.pdf

Feb 26 ---[canceled]---

Feb 18 Patrick - Likelihood approach to integration

  • http://stat.rutgers.edu/home/ztan/Publication/Kong-et-al-2003.pdf
  • http://stat.rutgers.edu/home/ztan/Publication/likmcDec04.pdf
  • http://stat.rutgers.edu/home/ztan/Publication/armcSept06.pdf
  • http://stat.rutgers.edu/home/ztan/Publication/mcmcJuly08.pdf

Feb 12 Josh - Intro to MCMC

  • http://www.inference.phy.cam.ac.uk/mackay/itila/

Feb 5  All -  Kickoff

Specific themes of interest

Some specific themes we're interested in include:

  • How can we best use samples once they've been collected?
    • What estimators are more appropriate than the crude MC estimator for estimating expectations?
    • How can such estimators be constructed, analyzed, and applied?
  • How do we know when we've sampled enough?
    • What MCMC convergence diagnostics are available, and when can we trust them?
  • Which MCMC algorithm when?
    • An extremely wide variety of sampling algorithms have been developed, often targeting a specific type of pathology (e.g. highly skewed distributions, local correlation structures, etc.). How can we systematically diagnose which sampling algorithm is most appropriate for a given sampling problem?
  • Testing MCMC implementations
    • Since MCMC is often the only feasible way to compute a given quantity and its output is stochastic, how can we test that our implementations are correct?
  • Hybrid Monte Carlo and molecular dynamics
    • Which methods are most efficient for sampling conformational distributions of large solvated systems?
  • Nonequilibrium methods
    • How can we use nonequilibrium fluctuation theorems to analyze and correct time-discretized SDEs?
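As one concrete angle on the "How do we know when we've sampled enough?" theme, the sketch below estimates an integrated autocorrelation time and the corresponding effective sample size for a correlated chain. The truncation rule (stop at the first non-positive autocorrelation), the lag cap, and the AR(1) test chain are simplifying assumptions; more careful windowing schemes exist, and the series is assumed to have non-zero variance.

```python
import random


def integrated_autocorrelation_time(series, max_lag=200):
    """tau = 1 + 2 * sum of autocorrelations, truncated at the first non-positive lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    var = sum(c * c for c in centered) / n
    tau = 1.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if acf <= 0.0:
            break
        tau += 2.0 * acf
    return tau


def effective_sample_size(series):
    return len(series) / integrated_autocorrelation_time(series)


def ar1_chain(n, rho, seed=0):
    """Correlated test chain x_t = rho * x_{t-1} + noise; its exact tau is (1 + rho) / (1 - rho)."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


if __name__ == "__main__":
    chain = ar1_chain(20000, rho=0.5)
    print("estimated tau:", integrated_autocorrelation_time(chain))
    print("effective sample size:", effective_sample_size(chain))
```

An effective sample size far below the chain length is a signal that the crude Monte Carlo error estimate, which assumes independent samples, is too optimistic.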