Latent Diffusion Energy-Based Model for Interpretable Text Modelling

Authors: Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts. Through a series of experiments, we empirically examine the capability of our model for generative modeling and interpretability on text modeling tasks.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, UCLA, USA; 2 Beijing Institute for General Artificial Intelligence, China; 3 Salesforce Research, USA; 4 Google Brain, USA; 5 Institute for Artificial Intelligence, Peking University, China; 6 School of Artificial Intelligence, Peking University, China; 7 Department of Statistics, UCLA, USA; 8 Department of Automation, Tsinghua University, China.
Pseudocode | Yes | Algorithm 1 (Learning algorithm). Input: initial parameters (α, β, ϕ), learning rate η, observed unlabeled examples {x^(i)}_{i=1}^{M}, and, optionally for controllable generation or semi-supervised learning, observed labeled examples {(x^(i), y^(i))}_{i=M+1}^{M+N}. repeat ... Algorithm 2 (Synthesizing algorithm). Input: z_T ~ N(0, I); output: z_0; for t = T-1 to t = 0 do ... (A sketch of the synthesizing loop follows the table.)
Open Source Code | Yes | Code repo and data: https://github.com/yuPeiyu98/LDEBM.
Open Datasets | Yes | Penn Treebank (PTB) (Marcus et al., 1993), DailyDialog (DD) (Li et al., 2017b), Stanford Multi-Domain Dialog (SMD) (Eric et al., 2017), Yelp reviews as pre-processed by Li et al. (2018), and AGNews (Zhang et al., 2015).
Dataset Splits | No | The paper uses well-known datasets (PTB, DD, SMD, Yelp, AGNews) but does not state the train/validation/test split percentages or absolute counts, nor does it cite predefined splits precisely enough to reproduce the data partitioning directly.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory amounts used for running the experiments. It only mentions general training parameters and dataset sizes.
Software Dependencies | No | The paper mentions using the Adam optimizer and GRU, but does not provide specific version numbers for programming languages, libraries, or other software dependencies. It cites external implementations but does not list their versions within the experimental setup.
Experiment Setup | Yes | For Langevin dynamics, we use K = 50 and b^2 = 0.002 throughout the experiments. For optimization, we use the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.999 for all experiments. On all datasets except the 2D synthetic datasets and AGNews, we use a batch size of 128 and a constant learning rate of 1e-3 for the encoder and decoder without weight decay; for LDEBM, we use a constant learning rate of 1e-4. We use a larger batch size of 1000 on the 2D synthetic datasets. On AGNews, we use the same set of hyperparameters as in Pang & Wu (2021) for optimization: the batch size is set to 200; the initial learning rate is 1e-4 for the encoder and decoder and 1e-5 for LDEBM; learning rates are exponentially decayed with a decay rate of 0.998 for each model; the encoder and LDEBM have weight decay rates of 2e-3 and 1e-3, respectively. (An optimizer-configuration sketch follows the table.)
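
The synthesizing pseudocode quoted above survives only in fragments, so the following is a minimal PyTorch-style sketch of what Algorithm 2 broadly describes: start from z_T ~ N(0, I) and run short-run Langevin dynamics at each reverse step down to z_0. The energy network `energy_fn`, its conditioning signature, the `decoder`, and the values of `T`, `latent_dim`, and `batch_size` are placeholders rather than the paper's actual interfaces; only K = 50 and the 0.002 step-size constant are taken from the reported setup.

```python
import torch

def synthesize(energy_fn, decoder, T=6, K=50, step_size_sq=0.002,
               latent_dim=32, batch_size=64):
    """Sketch of a latent-space reverse chain sampled with Langevin dynamics.

    energy_fn(z, z_next, t) -> per-sample energy (placeholder signature);
    decoder(z0) -> generated output (placeholder). T and latent_dim are
    illustrative values, not the paper's settings.
    """
    z = torch.randn(batch_size, latent_dim)          # z_T ~ N(0, I)
    for t in reversed(range(T)):                     # t = T-1, ..., 0
        z_next = z.detach()                          # condition on previous step
        z = z_next.clone().requires_grad_(True)
        for _ in range(K):                           # K Langevin updates per step
            energy = energy_fn(z, z_next, t).sum()
            grad, = torch.autograd.grad(energy, z)
            with torch.no_grad():
                z = (z - 0.5 * step_size_sq * grad
                     + step_size_sq ** 0.5 * torch.randn_like(z))
            z.requires_grad_(True)
        z = z.detach()
    return decoder(z)                                # decode z_0
```

As a quick sanity check of the update rule, a toy quadratic energy such as `energy_fn = lambda z, z_next, t: 0.5 * (z ** 2).sum(dim=1)` keeps the samples approximately standard normal across the chain.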
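
The optimizer settings in the last row translate directly into standard PyTorch optimizer and scheduler objects. The sketch below only illustrates that configuration: the three `torch.nn.Linear` modules are stand-ins for the paper's GRU-based encoder/decoder and the latent-space EBM, whose architectures are not described in this report.

```python
import torch

# Stand-in modules; the real encoder/decoder are GRU-based and the prior is
# the latent-space EBM, none of which are specified here.
encoder = torch.nn.Linear(300, 32)
decoder = torch.nn.Linear(32, 300)
ldebm = torch.nn.Linear(32, 1)

adam_betas = (0.9, 0.999)

# Default setting (most datasets): constant lr 1e-3 for encoder/decoder with
# no weight decay, constant lr 1e-4 for the LDEBM prior, batch size 128.
enc_dec_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()),
    lr=1e-3, betas=adam_betas, weight_decay=0.0)
ebm_opt = torch.optim.Adam(ldebm.parameters(), lr=1e-4, betas=adam_betas)

# AGNews setting: initial lr 1e-4 (encoder/decoder) and 1e-5 (LDEBM), weight
# decay 2e-3 for the encoder and 1e-3 for the LDEBM, exponential lr decay with
# rate 0.998 per model; batch size 200.
enc_opt_ag = torch.optim.Adam(encoder.parameters(), lr=1e-4,
                              betas=adam_betas, weight_decay=2e-3)
dec_opt_ag = torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=adam_betas)
ebm_opt_ag = torch.optim.Adam(ldebm.parameters(), lr=1e-5,
                              betas=adam_betas, weight_decay=1e-3)
schedulers = [torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.998)
              for opt in (enc_opt_ag, dec_opt_ag, ebm_opt_ag)]
```

Whether the decoder also receives weight decay on AGNews is not stated in the quoted setup, so none is applied to it here.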