Learning Unnormalized Statistical Models via Compositional Optimization

Authors: Wei Jiang, Jiayu Qin, Lingyu Wu, Changyou Chen, Tianbao Yang, Lijun Zhang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the better performance of our method on different tasks, namely, density estimation, out-of-distribution detection, and real image generation.
Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) Department of Computer Science and Engineering, University at Buffalo, New York, USA; (3) Department of Computer Science and Engineering, Texas A&M University, College Station, USA. Correspondence to: Changyou Chen <changyou@buffalo.edu>, Tianbao Yang <tianbaoyang@tamu.edu>, Lijun Zhang <zhanglj@lamda.nju.edu.cn>.
Pseudocode | Yes | Algorithm 1 (MECO). Input: time step T, initial points (θ1, u1, v1), sequences {ηt, γt, βt}. For t = 1 to T: sample zt from {x1, ..., xn} and z̃t from q(x); update estimator ut according to equation (3); update estimator vt according to equation (4); update the weight θt+1 = θt - ηt vt. End for. Choose τ uniformly at random from {1, ..., T} and return θτ. (A hedged code sketch of this loop appears after the table.)
Open Source Code | No | The paper does not provide a direct link to the source code for the methodology, nor does it explicitly state that the code is released or available.
Open Datasets | Yes | We choose CIFAR-10 (Krizhevsky, 2009) as the in-distribution data.
Dataset Splits | No | The paper mentions using training and testing data but does not explicitly provide details about validation splits or percentages for any of the datasets used.
Hardware Specification | Yes | Experiments on MNIST in Section 6.3 are trained on four NVIDIA Tesla V100 GPUs, and the training time is around 2.8 hours.
Software Dependencies | No | The paper mentions using `numpy` and the `Adam` optimizer, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For our method, we set the parameters γ = 0.1 and β = 0.9. For MCMC training, the number of sampling steps is searched from the set {20, 50, 100} and we use Langevin dynamics (Welling & Teh, 2011) as the sampling approach. For all tasks, we tune the learning rates from {1e-1, 1e-2, 1e-3, 1e-4} and pick the best one. (A minimal Langevin-dynamics sketch appears after the table.)
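
The Algorithm 1 loop quoted in the Pseudocode row can be made concrete on a toy problem. The sketch below is a hedged illustration only: the paper's estimator updates (its equations (3) and (4)) are not reproduced in this report, so the moving-average forms, the 1-D energy model p_theta(x) ∝ exp(-theta·x²), the standard-normal noise distribution q, and the hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_q(x):
    # Log density of the noise distribution q = N(0, 1).
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def meco_toy(data, T=5000, eta=1e-2, gamma=0.1, beta=0.9):
    # Toy unnormalized model p_theta(x) proportional to exp(-theta * x^2).
    # We minimize the negative log-likelihood; its log-partition term gives the
    # compositional structure handled by the moving-average estimators u and v.
    theta, u, v = 1.0, 1.0, 0.0
    iterates = []
    for _ in range(T):
        z = data[rng.integers(len(data))]   # z_t sampled from {x_1, ..., x_n}
        z_tilde = rng.standard_normal()     # z~_t sampled from q(x)

        # One-sample importance-weight estimate of the partition function
        # Z(theta) = E_q[exp(-theta x^2) / q(x)]; u tracks it with a moving
        # average (a stand-in for the paper's equation (3)).
        w = np.exp(-theta * z_tilde**2 - log_q(z_tilde))
        u = (1.0 - gamma) * u + gamma * w

        # Stochastic gradient of  theta * x^2 + log Z(theta), with u plugged in
        # for the inner expectation (a stand-in for the paper's equation (4)).
        grad = z**2 - (z_tilde**2 * w) / u
        v = (1.0 - beta) * v + beta * grad

        theta = theta - eta * v             # theta_{t+1} = theta_t - eta_t * v_t
        iterates.append(theta)

    # Return theta_tau with tau chosen uniformly at random from {1, ..., T}.
    return iterates[rng.integers(T)]

# Usage: on data drawn from N(0, 1) the estimate should approach theta = 0.5,
# since exp(-0.5 * x^2) matches the standard-normal kernel.
data = rng.standard_normal(10_000)
print(meco_toy(data))
```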
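
For the MCMC training mentioned in the Experiment Setup row, the paper uses Langevin dynamics (Welling & Teh, 2011) as its sampling approach. Below is a minimal, generic sketch of that sampler; the target gradient `grad_log_p`, the step size, and the default number of steps are illustrative assumptions (the paper searches the step count over {20, 50, 100}).

```python
import numpy as np

def langevin_sample(x0, grad_log_p, n_steps=50, step_size=1e-2, rng=None):
    # Unadjusted Langevin dynamics:
    # x_{k+1} = x_k + (eps / 2) * grad log p(x_k) + sqrt(eps) * N(0, I).
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
    return x

# Usage: drawing an approximate sample from a standard normal,
# for which grad log p(x) = -x.
print(langevin_sample(np.zeros(3), lambda x: -x, n_steps=100))
```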