Gradient-Guided Importance Sampling for Learning Binary Energy-Based Models
Authors: Meng Liu, Haoran Liu, Shuiwang Ji
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on density modeling over synthetic discrete data, graph generation, and training Ising models to evaluate our proposed method. The experimental results demonstrate that our method can significantly alleviate the limitations of ratio matching, perform more effectively in practice, and scale to high-dimensional problems. |
| Researcher Affiliation | Academia | Meng Liu, Haoran Liu, Shuiwang Ji; Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA; {mengliu,liuhr99,sji}@tamu.edu |
| Pseudocode | Yes | Algorithm 1: Ratio Matching with Gradient-Guided Importance Sampling (RMwGGIS). 1: Input: observed dataset D = {x^(m)}, m = 1, ..., |D|; parameterized energy function E_θ(·); number of samples s for Monte Carlo estimation with importance sampling. 2: for x ∈ D do (batch training is applied in practice). 3: Compute E_θ(x). 4: Compute ∇_x E_θ(x). 5: Compute the proposal distribution over the bit-flip neighbors (Eq. (10)). 6: Sample s terms, denoted x_i^(1), ..., x_i^(s), according to this proposal. 7: Compute the importance-sampling estimate of J_RM(θ, x) under the proposal (Eq. (6) or Eq. (11)). 8: Update θ based on the gradient of this estimate with respect to θ. 9: end for. (A runnable sketch of this procedure is given below the table.) |
| Open Source Code | Yes | Our implementation is available at https://github.com/divelab/RMwGGIS. |
| Open Datasets | Yes | We further evaluate our RMwGGIS on graph generation using the Ego-small dataset (You et al., 2018). ... We firstly draw 2D data points from 2D continuous space according to some unknown distribution p̂, which can be naturally visualized. Then, we convert each 2D data point x̂ ∈ R^2 to a discrete data point x ∈ {0, 1}^d... We follow the experimental setting of Dai et al. (2020) for density modeling on synthetic discrete data. (A binarization sketch follows the table.) |
| Dataset Splits | No | For graph generation, the paper states that "80% of the graphs are used for training and the rest for testing" but does not explicitly mention a validation split. |
| Hardware Specification | No | The paper mentions general hardware like "modern GPUs with limited memory" but does not specify any exact GPU models, CPU models, or other detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility, only mentioning libraries or optimizers by name without version details. |
| Experiment Setup | Yes | The energy function is parameterized by an MLP with the Swish (Ramachandran et al., 2017) activation and 256 hidden dimensions. The number of samples s, involved in the objective functions of our RMwGGIS method, is set to be 10. ... We use Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-4 and a batch size of 100. ℓ1 penalty with strength 0.01 is used to encourage sparsity. (A configuration sketch follows the table.) |
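The Pseudocode row quotes Algorithm 1 at a high level. Below is a minimal PyTorch sketch of one RMwGGIS update, written as an illustration rather than a transcription of the authors' released code: it uses the standard ratio-matching terms over bit-flip neighbors, approximates the gradient-guided proposal of Eq. (10) with a softmax over first-order energy differences, and the names `EnergyMLP` and `rmwggis_step` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyMLP(nn.Module):
    """Hypothetical MLP energy E_theta: {0,1}^d -> R (Swish activation, 256 hidden units)."""
    def __init__(self, d, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # (batch,)

def rmwggis_step(energy, x, s=10):
    """One ratio-matching update with a gradient-guided proposal (sketch of Algorithm 1).

    energy: module mapping (N, d) binary inputs to (N,) energies.
    x:      (B, d) batch of observed binary vectors in {0, 1}.
    Returns a scalar surrogate loss; backpropagating it updates theta.
    """
    B, d = x.shape
    x = x.float()

    # Steps 3-4: E_theta(x) and its gradient with respect to the input bits.
    with torch.enable_grad():
        x_req = x.clone().requires_grad_(True)
        grad_x = torch.autograd.grad(energy(x_req).sum(), x_req)[0]
    e_x = energy(x)  # recomputed so the graph over theta is kept for the update

    # Step 5 (assumed form of Eq. (10)): first-order estimate of
    # E_theta(x) - E_theta(x_{-i}) for every bit flip i, normalized by a softmax.
    delta_hat = (2.0 * x - 1.0) * grad_x        # (B, d)
    proposal = F.softmax(delta_hat, dim=-1)     # q(i | x); no gradient flows through grad_x

    # Step 6: sample s flip positions per example from the proposal.
    idx = torch.multinomial(proposal, s, replacement=True)   # (B, s)
    flips = F.one_hot(idx, num_classes=d).float()            # (B, s, d)
    x_neg = (x.unsqueeze(1) + flips).remainder(2.0)          # neighbors with one bit flipped

    # Step 7: importance-weighted estimate of the ratio-matching objective
    # sum_i sigmoid(E(x) - E(x_{-i}))^2  ~=  mean_j term_{i_j} / q(i_j | x).
    e_neg = energy(x_neg.reshape(B * s, d)).reshape(B, s)
    terms = torch.sigmoid(e_x.unsqueeze(1) - e_neg) ** 2     # (B, s)
    q_sel = proposal.gather(1, idx)                          # (B, s)
    j_hat = (terms / q_sel).mean(dim=1)                      # per-example estimate

    # Step 8 is performed by the caller: j_hat.mean().backward(); optimizer.step().
    return j_hat.mean()
```

Only the s sampled neighbors are evaluated per example, which is the point of the importance-sampling estimator: the cost no longer grows with the data dimension d.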
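The Experiment Setup row lists the quoted hyperparameters (Swish MLP with 256 hidden units, s = 10, Adam with learning rate 1e-4, batch size 100, ℓ1 penalty of strength 0.01). The fragment below wires them together by reusing the hypothetical `EnergyMLP` and `rmwggis_step` from the previous sketch; the data dimension d and the target of the ℓ1 penalty (here all model weights) are assumptions, since the quoted text does not specify them.

```python
import torch

d = 32                                   # assumed data dimensionality (not quoted above)
energy = EnergyMLP(d, hidden=256)        # Swish MLP with 256 hidden units
opt = torch.optim.Adam(energy.parameters(), lr=1e-4)
l1_strength = 0.01

def train_epoch(loader):
    """loader yields (100, d) binary batches, matching the quoted batch size of 100."""
    for x in loader:
        loss = rmwggis_step(energy, x, s=10)
        # Assumed placement of the l1 sparsity penalty: all learnable weights.
        loss = loss + l1_strength * sum(p.abs().sum() for p in energy.parameters())
        opt.zero_grad()
        loss.backward()
        opt.step()
```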
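The Open Datasets row describes drawing 2D points from an unknown continuous distribution p̂ and converting each point x̂ ∈ R^2 into a binary vector x ∈ {0, 1}^d, following Dai et al. (2020). The exact encoding is not quoted, so the sketch below uses a plain fixed-point quantization of each coordinate (Dai et al. reportedly use a Gray-code variant); `binarize_2d` and the value range are illustrative.

```python
import numpy as np

def binarize_2d(points, bits_per_dim=16, low=-4.0, high=4.0):
    """Map 2D continuous points in [low, high]^2 to binary vectors in {0,1}^(2 * bits_per_dim).

    Assumed fixed-point encoding, for illustration only.
    """
    points = np.asarray(points, dtype=np.float64)
    levels = 2 ** bits_per_dim
    # Quantize each coordinate to an integer code in [0, levels - 1].
    codes = np.clip(((points - low) / (high - low) * levels).astype(np.int64), 0, levels - 1)
    # Unpack each code into its binary digits (most significant bit first).
    shifts = np.arange(bits_per_dim - 1, -1, -1)
    bits = (codes[..., None] >> shifts) & 1                     # (N, 2, bits_per_dim)
    return bits.reshape(len(points), -1).astype(np.float32)     # (N, 2 * bits_per_dim)

# Example: a toy 2D "unknown" distribution, binarized to d = 32 bits.
rng = np.random.default_rng(0)
points = rng.normal(loc=[2.0, -1.0], scale=0.2, size=(500, 2))
x_binary = binarize_2d(points)   # training data for the binary EBM
```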