Parallel tempering on optimized paths
Authors: Saifuddin Syed, Vittorio Romaniello, Trevor Campbell, Alexandre Bouchard-Côté
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we study the empirical performance of non-reversible PT based on the spline annealing path family (K ∈ {2, 3, 4, 5, 10}) from Section 4, with knots and schedule optimized using the tuning method from Section 3. We compare this method to two PT methods based on standard linear paths: non-reversible PT with adaptive schedule (NRPT+Linear) (Syed et al., 2019), and reversible PT (Reversible+Linear) (Atchadé et al., 2011). Code for the experiments is available at https://github.com/vittrom/PT-pathoptim. We run the following benchmark problems; see the supplement for details. Gaussian: a synthetic setup... Beta-binomial model... Galaxy data... High dimensional Gaussian... The results of these experiments are shown in Figures 3 and 4. (A hedged sketch of the NRPT baseline on a linear path appears after this table.) |
| Researcher Affiliation | Academia | Department of Statistics, University of British Columbia, Vancouver, Canada. Correspondence to: Saifuddin Syed <saif.syed@stat.ubc.ca>, Vittorio Romaniello <vittorio.romaniello@stat.ubc.ca>. |
| Pseudocode | Yes | Algorithm 1: NRPT; Algorithm 2: Path Opt NRPT |
| Open Source Code | Yes | Code for the experiments is available at https://github.com/vittrom/PT-pathoptim. |
| Open Datasets | Yes | Galaxy data: A Bayesian Gaussian mixture model applied to the galaxy dataset of (Roeder, 1990). |
| Dataset Splits | No | The paper describes the datasets used for experiments but does not specify train/validation/test splits; the problems are sampling-based rather than conventional supervised learning tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions software like Adagrad but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For this example we used N = 50 parallel chains and fixed the computational budget to 45000 samples. For Algorithm 2, the computational budget was divided equally over 150 scans, meaning 300 samples were used for every gradient update. The gradient updates were performed using Adagrad (Duchi et al., 2011) with learning rate equal to 0.2. In this experiment we used N = 35 chains and fixed the computational budget to 50000 samples, divided into 500 scans using 100 samples each. We optimized the path using Adagrad with a learning rate of 0.3. The number of chains N is set to increase with dimension at the rate N = 15 d. We fixed the number of spline knots K to 4 and set the computational budget to 50000 samples divided into 500 scans with 100 samples per gradient update. The gradient updates were performed using Adagrad with learning rate equal to 0.2. For all the experiments we performed one local exploration step before each communication step. (A hedged sketch of the reported budget and Adagrad update loop appears after this table.) |
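
The excerpt above compares spline-path PT against non-reversible PT on a standard linear path. The following is a minimal sketch of that NRPT+Linear baseline (deterministic even-odd swaps in the style of Syed et al., 2019) on a linear annealing path; the reference/target densities, proposal scale, and chain count are illustrative assumptions, not the paper's benchmarks, and the optimized spline path is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi0(x):             # reference: standard normal (illustrative)
    return -0.5 * x ** 2

def log_pi1(x):             # target: offset, narrower normal (illustrative)
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_anneal(x, b):       # linear path: (1 - b) * log pi0 + b * log pi1
    return (1.0 - b) * log_pi0(x) + b * log_pi1(x)

N = 8                                   # number of chains (illustrative)
beta = np.linspace(0.0, 1.0, N)         # uniform schedule on the linear path
x = rng.normal(size=N)                  # one state per chain

for scan in range(2000):
    # one local exploration step per chain (random-walk Metropolis),
    # matching "one local exploration step before each communication step"
    prop = x + 0.5 * rng.normal(size=N)
    accept = np.log(rng.uniform(size=N)) < log_anneal(prop, beta) - log_anneal(x, beta)
    x = np.where(accept, prop, x)

    # deterministic even-odd (DEO) communication step: non-reversible swaps
    start = scan % 2                    # even pairs on even scans, odd pairs otherwise
    for i in range(start, N - 1, 2):
        log_swap = (log_anneal(x[i + 1], beta[i]) + log_anneal(x[i], beta[i + 1])
                    - log_anneal(x[i], beta[i]) - log_anneal(x[i + 1], beta[i + 1]))
        if np.log(rng.uniform()) < log_swap:
            x[i], x[i + 1] = x[i + 1], x[i]

# x[-1] now holds (approximate) draws from the target chain
```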
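The experiment-setup excerpt fixes a total sample budget, splits it into scans, and runs one Adagrad update per scan. The sketch below only illustrates that outer budget/update structure for the Gaussian example (N = 50 chains, 45,000 samples over 150 scans of 300 samples, learning rate 0.2). The objective `schedule_surrogate` is a placeholder of our own: the paper optimizes a round-trip/communication objective over spline knots and schedule, which is not reimplemented here.

```python
import numpy as np

N_CHAINS = 50                          # parallel chains (Gaussian example)
BUDGET = 45_000                        # total samples
N_SCANS = 150                          # one gradient update per scan
SAMPLES_PER_SCAN = BUDGET // N_SCANS   # 300 samples per update
LEARNING_RATE = 0.2                    # Adagrad learning rate reported above

def schedule_surrogate(knots, samples):
    """Placeholder objective: penalize unevenly spaced knots.

    In the paper this would be an estimate of the communication cost along
    the annealing path, computed from the samples of the current scan.
    """
    return np.var(np.diff(np.sort(knots)))

def finite_diff_grad(f, x, samples, eps=1e-4):
    """Finite-difference gradient of f at x (for illustration only)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp, samples) - f(xm, samples)) / (2 * eps)
    return g

# Interior knots of the schedule, initialized uniformly in (0, 1).
knots = np.linspace(0.0, 1.0, N_CHAINS)[1:-1].copy()
accum = np.zeros_like(knots)           # Adagrad accumulator

rng = np.random.default_rng(0)
for scan in range(N_SCANS):
    # Stand-in for running PT for SAMPLES_PER_SCAN samples on the current path.
    samples = rng.normal(size=(SAMPLES_PER_SCAN, 1))

    grad = finite_diff_grad(schedule_surrogate, knots, samples)
    accum += grad ** 2
    knots -= LEARNING_RATE * grad / (np.sqrt(accum) + 1e-8)   # Adagrad step
    knots = np.clip(np.sort(knots), 1e-6, 1 - 1e-6)           # keep a valid schedule
```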