Variational Intrinsic Control Revisited

Authors: Taehwan Kwon

ICLR 2021

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response drawn from the paper.
Research Type: Experimental
"We substantiate our claims through rigorous mathematical derivations and experimental analyses." From Section 5 (Experiments): "In this section, we evaluate implicit VIC (Gregor et al., 2016), Algorithm 1 and Algorithm 2. We use LSTM (Hochreiter & Schmidhuber, 1997) to encode τ_t = (s_0, a_0, ..., s_t) into a vector. We conduct experiments on both deterministic and stochastic environments and evaluate results by measuring the mutual information I from samples." (A sketch of sample-based mutual-information estimation appears after the table.)
Researcher Affiliation: Industry
"Taehwan Kwon, NC, kth315@ncsoft.com"
Pseudocode: Yes
"Algorithm 1: Implicit VIC with transitional probability model" and "Algorithm 2: Implicit VIC with Gaussian mixture model". (A sketch of a Gaussian mixture density head appears after the table.)
Open Source Code: No
The paper makes no statement about releasing source code and gives no link to a code repository.
Open Datasets: No
The paper describes custom "1D world", "2D world", "tree world", and "grid world" environments, as well as HalfCheetah-v3 from the MuJoCo suite. These are simulated or custom-built environments, not publicly available datasets with access information, citations, or repositories.
Dataset Splits: No
The paper does not specify training, validation, and test splits with percentages, counts, or references to predefined splits; it discusses training phases and how data is generated from the environments.
Hardware Specification: No
The paper does not report hardware details such as GPU or CPU models or memory specifications used to run the experiments.
Software Dependencies: No
The paper mentions LSTM as an architecture and MuJoCo as an environment, but does not specify software dependencies with version numbers (e.g., the deep learning framework or the MuJoCo version).
Experiment Setup: Yes
"Please see Appendix F.1 for details on the hyper-parameter settings." Table 1 (hyper-parameters used for experiments):
  Optimizer: Adam
  Learning rate: 1e-3, 1e-4
  Betas: (0.9, 0.999)
  Weight initialization: Gaussian with mean 0 and std 0.1
  Batch size: 128
  T_smooth: 128
  σ (GMM): 0.25
  n_gmm (GMM): 10
(A sketch wiring these settings into code appears after the table.)
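
The Research Type row notes that results are evaluated "by measuring the mutual information I from samples." Below is a minimal sketch of one way to do this for discrete options and final states, assuming a simple plug-in estimator over (option, final state) pairs; the function name and counting scheme are illustrative, not the paper's exact procedure.

```python
import numpy as np
from collections import Counter

def empirical_mi(samples):
    """Plug-in estimate of I(Omega; s_f) from (option, final_state)
    pairs: sum over p(o, s) * log(p(o, s) / (p(o) * p(s)))."""
    n = len(samples)
    joint = Counter(samples)                 # counts of (omega, s_f)
    opt = Counter(o for o, _ in samples)     # counts of omega
    state = Counter(s for _, s in samples)   # counts of s_f
    mi = 0.0
    for (o, s), c in joint.items():
        # p(o, s) / (p(o) p(s)) = (c/n) / ((opt[o]/n) * (state[s]/n))
        mi += (c / n) * np.log(c * n / (opt[o] * state[s]))
    return mi

# Two options, each reliably reaching its own final state: I = log 2.
rollouts = [(0, "left"), (0, "left"), (1, "right"), (1, "right")]
print(empirical_mi(rollouts))  # ~0.693
```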
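
The Pseudocode row names Algorithm 2, implicit VIC with a Gaussian mixture model, and Table 1 lists σ (GMM) = 0.25 and n_gmm (GMM) = 10. A plausible reading is a state-density head mixing n_gmm isotropic Gaussians with a fixed standard deviation. The PyTorch module below is a hypothetical sketch under that assumption; the layer names, shapes, and softmax mixture weights are our choices, not details confirmed by the excerpt.

```python
import torch
import torch.nn as nn

class GaussianMixtureHead(nn.Module):
    """Hypothetical density head: a mixture of n_gmm isotropic
    Gaussians over states, with fixed std sigma as in Table 1."""

    def __init__(self, hidden_dim, state_dim, n_gmm=10, sigma=0.25):
        super().__init__()
        self.means = nn.Linear(hidden_dim, n_gmm * state_dim)
        self.logits = nn.Linear(hidden_dim, n_gmm)
        self.n_gmm, self.state_dim, self.sigma = n_gmm, state_dim, sigma

    def log_prob(self, h, s):
        # h: (B, hidden_dim) trajectory encoding (e.g. an LSTM state);
        # s: (B, state_dim) states to score.
        mu = self.means(h).view(-1, self.n_gmm, self.state_dim)
        log_w = torch.log_softmax(self.logits(h), dim=-1)   # (B, n_gmm)
        comp = torch.distributions.Normal(mu, self.sigma)
        log_p = comp.log_prob(s.unsqueeze(1)).sum(-1)       # (B, n_gmm)
        return torch.logsumexp(log_w + log_p, dim=-1)       # (B,)
```

Keeping σ fixed rather than predicted keeps the head simple and matches the single σ value listed in Table 1.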
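
Finally, the Experiment Setup row translates almost directly into framework code. The snippet below wires the Table 1 optimizer settings and Gaussian weight initialization into PyTorch; the LSTM dimensions and the zero bias initialization are illustrative assumptions, not values given in the paper.

```python
import torch
import torch.nn as nn

def init_weights(module):
    # Table 1: Gaussian init with mean 0 and std 0.1 for weights;
    # zeroing biases is our assumption.
    for name, p in module.named_parameters():
        if "weight" in name:
            nn.init.normal_(p, mean=0.0, std=0.1)
        elif "bias" in name:
            nn.init.zeros_(p)

# Hypothetical trajectory-encoder dimensions, for illustration only.
model = nn.LSTM(input_size=16, hidden_size=128, batch_first=True)
init_weights(model)

# Table 1: Adam with betas (0.9, 0.999); the paper lists learning
# rates 1e-3 and 1e-4, presumably varying per experiment.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999))
```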