Latent State Marginalization as a Low-cost Approach for Improving Exploration
Authors: Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training. ... We evaluate SMAC on a series of diverse continuous control tasks from the DeepMind Control Suite (DMC; Tassa et al. (2018)). |
| Researcher Affiliation | Collaboration | Dinghuai Zhang, Aaron Courville, Yoshua Bengio (Mila, Université de Montréal); Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen (Meta AI, FAIR) |
| Pseudocode | Yes | Algorithm 1 SMAC (without a world model) and Algorithm 2 SMAC (with a world model); a hedged sketch of the particle-based marginalization follows the table |
| Open Source Code | Yes | Our implementation is open sourced at https://github.com/zdhNarsil/Stochastic-Marginal-Actor-Critic. |
| Open Datasets | Yes | We evaluate SMAC on a series of diverse continuous control tasks from the DeepMind Control Suite (DMC; Tassa et al. (2018)). |
| Dataset Splits | No | The paper mentions a 'replay buffer D' for training and sampling states, but does not provide specific train/validation/test dataset splits, percentages, or counts for its experiments. |
| Hardware Specification | Yes | Tested with an NVIDIA Quadro GV100 on the pixel-based environments, our SMAC implementation does 60 frames per second (FPS) on average |
| Software Dependencies | No | The paper mentions using 'PyTorch' implementations for SAC and world models, but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We set the neural network width of the baselines and SMAC to 400 and 256 respectively to keep a comparable number of parameters. For the entropy coefficients, we use the same autotuning approach from SAC (Haarnoja et al., 2018b). ... instead we set its learning rate to 3 × 10⁻⁴, which is empirically much better and also consistent with two other algorithms. ... We choose the best hyperparameters (number of particles in {8, 16, 32}, dimension of the latent in {8, 16, 32}) for each environment. An illustrative sweep sketch follows the table. |
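
The Pseudocode row refers to Algorithms 1 and 2 (SMAC without and with a world model), whose core step is estimating the marginal policy density by averaging over latent particles. Below is a minimal PyTorch-style sketch of that particle-based marginalization; the `sample_latent` and `log_prob` interfaces are hypothetical stand-ins, not the API of the released repository.

```python
import math
import torch

def marginal_log_prob(policy, state, action, num_particles=16):
    """Monte Carlo estimate of log pi(a|s) for a latent-variable policy:
    log pi(a|s) ≈ logsumexp_k log pi(a|s, z_k) − log K, with the z_k drawn
    from the policy's latent distribution (K = num_particles).
    """
    # Hypothetical interface: draw K latent particles conditioned on the state.
    z = policy.sample_latent(state, num_particles)   # shape (K, latent_dim)
    # Hypothetical interface: per-particle conditional log-density of the action.
    log_probs = policy.log_prob(state, action, z)    # shape (K,)
    # Log-mean-exp over particles gives the marginal log-density estimate.
    return torch.logsumexp(log_probs, dim=0) - math.log(num_particles)
```

In a SAC-style objective, the negative of such an estimate would play the role of the marginal entropy term; the released code should be consulted for the exact estimator used in the paper.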
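The Experiment Setup row quotes the reported hyperparameters (network widths 400/256, SAC-style entropy autotuning, learning rate 3 × 10⁻⁴) and the per-environment grid over the number of particles and the latent dimension. The snippet below only restates those values as a config sweep; the dict layout and helper function are illustrative, not the authors' configuration files.

```python
from itertools import product

# Values quoted in the Experiment Setup row; names are illustrative.
BASE_CONFIG = {
    "baseline_hidden_width": 400,   # baselines use width 400
    "smac_hidden_width": 256,       # SMAC uses width 256 for comparable parameter counts
    "entropy_coef": "auto",         # SAC-style automatic entropy tuning
    "learning_rate": 3e-4,
}

# Per-environment grid search over particles and latent dimension.
PARTICLE_GRID = [8, 16, 32]
LATENT_DIM_GRID = [8, 16, 32]

def configs():
    """Yield one config per (num_particles, latent_dim) combination."""
    for n_particles, latent_dim in product(PARTICLE_GRID, LATENT_DIM_GRID):
        yield {**BASE_CONFIG, "num_particles": n_particles, "latent_dim": latent_dim}
```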