Learning Unnormalized Statistical Models via Compositional Optimization
Authors: Wei Jiang, Jiayu Qin, Lingyu Wu, Changyou Chen, Tianbao Yang, Lijun Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the better performance of our method on different tasks, namely, density estimation, out-of-distribution detection, and real image generation. |
| Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 2 Department of Computer Science and Engineering, University at Buffalo, New York, USA 3Department of Computer Science and Engineering, Texas A&M University, College Station, USA. Correspondence to: Changyou Chen <changyou@buffalo.edu>, Tianbao Yang <tianbaoyang@tamu.edu>, Lijun Zhang <zhanglj@lamda.nju.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 MECO. Input: time step T, initial points (θ1, u1, v1), sequence {ηt, γt, βt}. For time step t = 1 to T: sample zt from {x1, ..., xn} and z̃t from q(x); update estimator ut according to equation (3); update estimator vt according to equation (4); update the weight θt+1 = θt - ηt vt. Choose τ uniformly at random from {1, ..., T} and return θτ. (A structural Python sketch of this loop follows the table.) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology, nor does it explicitly state that the code is released or available. |
| Open Datasets | Yes | We choose CIFAR-10 (Krizhevsky, 2009) as the in-distribution data. |
| Dataset Splits | No | The paper mentions using training and testing data but does not explicitly provide details about validation splits or percentages for any of the datasets used. |
| Hardware Specification | Yes | Experiments on MNIST in Section 6.3 are trained on four NVIDIA Tesla V100 GPUs, and the training time is around 2.8 hours. |
| Software Dependencies | No | The paper mentions using `numpy` and `Adam` optimizer, but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For our method, we set the parameter γ = 0.1 and β = 0.9. For MCMC training, the number of sampling steps is searched from the set {20, 50, 100} and we use Langevin dynamics (Welling & Teh, 2011) as the sampling approach. For all tasks, we tune the learning rates from {1e-1, 1e-2, 1e-3, 1e-4} and pick the best one. (A generic Langevin dynamics sketch is also given below the table.) |
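The Algorithm 1 pseudocode quoted in the table is flattened by PDF extraction, so the following is a minimal structural sketch of the MECO loop in Python. The estimator updates of equations (3) and (4) are not reproduced in this report; `update_u` and `update_v` are therefore hypothetical callables supplied by the caller, and only the control flow mirrors the pseudocode.

```python
import numpy as np

def meco_loop(theta, u, v, data, sample_noise, update_u, update_v,
              T, eta, gamma, beta, rng=None):
    """Structural sketch of Algorithm 1 (MECO), not the authors' implementation.

    update_u / update_v stand in for the paper's equations (3) and (4), which
    are not reproduced here; eta, gamma, beta correspond to the sequence
    {eta_t, gamma_t, beta_t} in the algorithm's input.
    """
    rng = np.random.default_rng() if rng is None else rng
    iterates = []
    n = len(data)
    for _ in range(T):
        z = data[rng.integers(n)]        # sample z_t from {x_1, ..., x_n}
        z_tilde = sample_noise(rng)      # sample z~_t from the noise distribution q(x)
        u = update_u(u, theta, z, z_tilde, gamma)    # estimator update, eq. (3)
        v = update_v(v, u, theta, z, z_tilde, beta)  # estimator update, eq. (4)
        theta = theta - eta * v          # weight update: theta_{t+1} = theta_t - eta_t * v_t
        iterates.append(np.copy(theta))
    tau = rng.integers(T)                # choose tau uniformly at random from {1, ..., T}
    return iterates[tau]
```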
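The experiment setup compares against MCMC training that uses Langevin dynamics (Welling & Teh, 2011) as the sampler. For reference, below is a generic unadjusted Langevin dynamics sampler for an unnormalized model p(x) ∝ exp(-E(x)); the `energy_grad` function and the default step size are illustrative assumptions, while the default number of steps matches one value from the paper's {20, 50, 100} search grid.

```python
import numpy as np

def langevin_sample(x0, energy_grad, n_steps=50, step_size=1e-2, rng=None):
    """Unadjusted Langevin dynamics for p(x) proportional to exp(-E(x)).

    energy_grad(x) should return grad_x E(x); defaults are illustrative only
    and are not taken from the paper's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # x_{k+1} = x_k - (step/2) * grad E(x_k) + sqrt(step) * N(0, I)
        x = x - 0.5 * step_size * energy_grad(x) + np.sqrt(step_size) * noise
    return x
```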