Probabilistic Planning with Sequential Monte Carlo methods

Authors: Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS"; "Our results suggest that SMCP does not learn as fast as CEM and RS initially as it heavily relies on estimating a good value function. However, SMCP quickly achieves higher performances than CEM and RS. SMCP also learns faster than SAC because it was able to leverage information from the model early in training."
Researcher Affiliation | Collaboration | "1 Mila, Université de Montréal; 2 Element AI; 3 CIFAR Senior Fellow; 4 Mila, Polytechnique Montréal; 5 Canada CIFAR AI Chair"
Pseudocode | Yes | "Algorithm 1 SMC Planning using SIR"; "Algorithm 2 SMC Planning using SIS" (a Python sketch of the SIR planner follows this table)
Open Source Code | No | "We used a custom implementation with a Gaussian policy for both the SAC baseline and the proposal distribution used for both versions of SMCP. We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001. [We used] the reward scaling suggested by Haarnoja et al. (2018) for all experiments and used an implementation inspired by Pong (2018). We used two hidden layers with 256 hidden units for the three networks: the value function, the policy and the soft Q functions." Reference: Pong, Vitchyr. rlkit. https://github.com/vitchyr/rlkit/, 2018. The paper does not state that its code for SMCP is available.
Open Datasets | Yes | "The experiments were conducted on the OpenAI Gym MuJoCo benchmark suite (Brockman et al., 2016; Todorov et al., 2012)." (an environment-setup sketch follows this table)
Dataset Splits | No | No explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) are provided; the paper describes a continuous training process within an RL environment rather than static dataset splits.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided.
Software Dependencies | No | The paper mentions Adam (Kingma & Ba, 2014) and rlkit (Pong, 2018) but does not provide version numbers for software dependencies such as Python, deep learning frameworks, or other libraries crucial for replication.
Experiment Setup | Yes | "We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001... We used two hidden layers with 256 hidden units for the three networks: the value function, the policy and the soft Q functions." Table A.3 (hyperparameters for the experiments): Hopper-v2: temperature 1, horizon length 10, 3 dense layers, layer dimension 256; Walker2d-v2: temperature 10, horizon length 20, 3 dense layers, layer dimension 256; Half Cheetah-v2: temperature 10, horizon length 20, 3 dense layers, layer dimension 256. (a training-setup sketch follows this table)
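The paper gives Algorithm 1 (SMC Planning using SIR) only as pseudocode. Below is a minimal, hedged sketch of sampling-importance-resampling planning under assumed interfaces: `model(s, a)` returns a predicted next state and reward, `policy(s)` samples a proposal action, and `value_fn(s)` returns a value estimate. These names and the exact weight update are illustrative assumptions, not the authors' code.

```python
import numpy as np

def smc_planning_sir(state, model, policy, value_fn, n_particles=50,
                     horizon=10, temperature=1.0, rng=None):
    """Hedged sketch of SMC planning with sampling-importance-resampling (SIR).

    Assumed interfaces (not the authors'): model(s, a) -> (next_state, reward),
    policy(s) -> action sample, value_fn(s) -> scalar value estimate.
    """
    rng = rng or np.random.default_rng()
    states = np.repeat(state[None, :], n_particles, axis=0)   # one row per particle
    first_actions = None

    for _ in range(horizon):
        actions = np.stack([policy(s) for s in states])       # proposal = learned policy
        steps = [model(s, a) for s, a in zip(states, actions)]
        next_states = np.stack([ns for ns, _ in steps])
        rewards = np.array([r for _, r in steps])

        # Reweight particles with a soft-advantage-style term, then resample (SIR).
        log_w = (rewards
                 + np.array([value_fn(s) for s in next_states])
                 - np.array([value_fn(s) for s in states])) / temperature
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)

        if first_actions is None:
            first_actions = actions                            # remember each particle's root action
        states = next_states[idx]
        first_actions = first_actions[idx]

    # Act with the root action of a uniformly chosen surviving particle.
    return first_actions[rng.integers(n_particles)]
```

Algorithm 2 (SMC Planning using SIS) differs, in general, by accumulating the importance weights across planning steps instead of resampling at every step.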
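The three benchmark tasks named in Table A.3 come from the OpenAI Gym MuJoCo suite. A minimal sketch of instantiating them, assuming `gym` with MuJoCo bindings is installed and using the older pre-0.26 `reset`/`step` return convention that was current when the paper was published:

```python
import gym

# MuJoCo tasks reported in the paper (Table A.3).
ENV_IDS = ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    # One random step, using the old (obs, reward, done, info) return signature.
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```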
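The reported training setup (Adam at learning rate 0.001; value, policy, and soft Q networks with 256-unit hidden layers; the Table A.3 temperatures and horizons) can be collected as in the sketch below. The `mlp` helper, the Hopper-v2 dimensions, the two-hidden-layer depth, and the use of PyTorch are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    """Small MLP with 256-unit hidden layers, matching the reported layer width."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 11, 3                       # Hopper-v2 sizes, used here for illustration

value_fn = mlp(obs_dim, 1)                     # state value V(s)
soft_q_fn = mlp(obs_dim + act_dim, 1)          # soft Q(s, a)
policy = mlp(obs_dim, 2 * act_dim)             # Gaussian policy head: mean and log-std

# Adam with learning rate 0.001 for all three networks, as stated in the paper.
optimizers = {
    name: torch.optim.Adam(net.parameters(), lr=1e-3)
    for name, net in (("value", value_fn), ("soft_q", soft_q_fn), ("policy", policy))
}

# Planning hyperparameters from Table A.3.
HYPERPARAMS = {
    "Hopper-v2":      {"temperature": 1,  "horizon": 10},
    "Walker2d-v2":    {"temperature": 10, "horizon": 20},
    "HalfCheetah-v2": {"temperature": 10, "horizon": 20},
}
```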