Probabilistic Planning with Sequential Monte Carlo methods

Authors: Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS"; "Our results suggest that SMCP does not learn as fast as CEM and RS initially as it heavily relies on estimating a good value function. However, SMCP quickly achieves higher performances than CEM and RS. SMCP also learns faster than SAC because it was able to leverage information from the model early in training."
Researcher Affiliation | Collaboration | "1 Mila, Université de Montréal; 2 Element AI; 3 CIFAR Senior Fellow; 4 Mila, Polytechnique Montréal; 5 Canada CIFAR AI Chair"
Pseudocode | Yes | "Algorithm 1 SMC Planning using SIR"; "Algorithm 2 SMC Planning using SIS" (a Python sketch of the SIR planner follows this table)
Open Source Code | No | "We used a custom implementation with a Gaussian policy for both the SAC baseline and the proposal distribution used for both versions of SMCP. We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001. [We used] the reward scaling suggested by Haarnoja et al. (2018) for all experiments and used an implementation inspired by Pong (2018). We used two hidden layers with 256 hidden units for the three networks: the value function, the policy and the soft Q functions." Reference: Pong, Vitchyr. rlkit. https://github.com/vitchyr/rlkit/, 2018. The paper does not state that its code for SMCP is available.
Open Datasets | Yes | "The experiments were conducted on the OpenAI Gym MuJoCo benchmark suite (Brockman et al., 2016; Todorov et al., 2012)." (an environment-setup sketch follows this table)
Dataset Splits | No | No explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) are provided; the paper describes a continuous training process within an RL environment rather than static dataset splits.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided.
Software Dependencies | No | The paper mentions Adam (Kingma & Ba, 2014) and rlkit (Pong, 2018) but does not provide version numbers for software dependencies such as Python, deep learning frameworks, or other libraries crucial for replication.
Experiment Setup | Yes | "We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001... We used two hidden layers with 256 hidden units for the three networks: the value function, the policy and the soft Q functions." Table A.3 (hyperparameters for the experiments): Hopper-v2: temperature 1, horizon length 10, 3 dense layers, layer dimension 256; Walker2d-v2: temperature 10, horizon length 20, 3 dense layers, layer dimension 256; Half Cheetah-v2: temperature 10, horizon length 20, 3 dense layers, layer dimension 256. (a training-setup sketch follows this table)
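The paper gives Algorithm 1 (SMC Planning using SIR) only as pseudocode. Below is a minimal, hedged sketch of sampling-importance-resampling planning under assumed interfaces: `model(s, a)` returns a predicted next state and reward, `policy(s)` samples a proposal action, and `value_fn(s)` returns a value estimate. These names and the exact weight update are illustrative assumptions, not the authors' code.

```python
import numpy as np

def smc_planning_sir(state, model, policy, value_fn, n_particles=50,
                     horizon=10, temperature=1.0, rng=None):
    """Hedged sketch of SMC planning with sampling-importance-resampling (SIR).

    Assumed interfaces (not the authors'): model(s, a) -> (next_state, reward),
    policy(s) -> action sample, value_fn(s) -> scalar value estimate.
    """
    rng = rng or np.random.default_rng()
    states = np.repeat(state[None, :], n_particles, axis=0)   # one row per particle
    first_actions = None

    for _ in range(horizon):
        actions = np.stack([policy(s) for s in states])       # proposal = learned policy
        steps = [model(s, a) for s, a in zip(states, actions)]
        next_states = np.stack([ns for ns, _ in steps])
        rewards = np.array([r for _, r in steps])

        # Reweight particles with a soft-advantage-style term, then resample (SIR).
        log_w = (rewards
                 + np.array([value_fn(s) for s in next_states])
                 - np.array([value_fn(s) for s in states])) / temperature
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)

        if first_actions is None:
            first_actions = actions                            # remember each particle's root action
        states = next_states[idx]
        first_actions = first_actions[idx]

    # Act with the root action of a uniformly chosen surviving particle.
    return first_actions[rng.integers(n_particles)]
```

Algorithm 2 (SMC Planning using SIS) differs, in general, by accumulating the importance weights across planning steps instead of resampling at every step.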
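The three benchmark tasks named in Table A.3 come from the OpenAI Gym MuJoCo suite. A minimal sketch of instantiating them, assuming `gym` with MuJoCo bindings is installed and using the older pre-0.26 `reset`/`step` return convention that was current when the paper was published:

```python
import gym

# MuJoCo tasks reported in the paper (Table A.3).
ENV_IDS = ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    # One random step, using the old (obs, reward, done, info) return signature.
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```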
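The reported training setup (Adam at learning rate 0.001; value, policy, and soft Q networks with 256-unit hidden layers; the Table A.3 temperatures and horizons) can be collected as in the sketch below. The `mlp` helper, the Hopper-v2 dimensions, the two-hidden-layer depth, and the use of PyTorch are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    """Small MLP with 256-unit hidden layers, matching the reported layer width."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 11, 3                       # Hopper-v2 sizes, used here for illustration

value_fn = mlp(obs_dim, 1)                     # state value V(s)
soft_q_fn = mlp(obs_dim + act_dim, 1)          # soft Q(s, a)
policy = mlp(obs_dim, 2 * act_dim)             # Gaussian policy head: mean and log-std

# Adam with learning rate 0.001 for all three networks, as stated in the paper.
optimizers = {
    name: torch.optim.Adam(net.parameters(), lr=1e-3)
    for name, net in (("value", value_fn), ("soft_q", soft_q_fn), ("policy", policy))
}

# Planning hyperparameters from Table A.3.
HYPERPARAMS = {
    "Hopper-v2":      {"temperature": 1,  "horizon": 10},
    "Walker2d-v2":    {"temperature": 10, "horizon": 20},
    "HalfCheetah-v2": {"temperature": 10, "horizon": 20},
}
```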