Probabilistic Planning with Sequential Monte Carlo methods
Authors: Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "Our results suggest that SMCP does not learn as fast as CEM and RS initially, as it heavily relies on estimating a good value function. However, SMCP quickly achieves higher performance than CEM and RS. SMCP also learns faster than SAC because it was able to leverage information from the model early in training." |
| Researcher Affiliation | Collaboration | 1 Mila, Université de Montréal; 2 Element AI; 3 CIFAR Senior Fellow; 4 Mila, Polytechnique Montréal; 5 Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1: SMC Planning using SIR; Algorithm 2: SMC Planning using SIS (a hedged sketch of the SIR planning loop follows the table) |
| Open Source Code | No | "We used a custom implementation with a Gaussian policy for both the SAC baseline and the proposal distribution used for both versions of SMCP. We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001. We used the reward scaling suggested by Haarnoja et al. (2018) for all experiments and used an implementation inspired by Pong (2018). We used two hidden layers with 256 hidden units for each of the three networks: the value function, the policy, and the soft Q function." Reference: Vitchyr Pong, rlkit, https://github.com/vitchyr/rlkit/, 2018. The paper does not state that the code for SMCP is available. |
| Open Datasets | Yes | The experiments were conducted on the Open AI Gym Mujoco benchmark suite (Brockman et al., 2016; Todorov et al., 2012). |
| Dataset Splits | No | No explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) were provided. The paper describes a continuous training process within an RL environment rather than static dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' and 'rlkit (Pong, 2018)' but does not provide specific version numbers for software dependencies such as Python, deep learning frameworks, or other libraries crucial for replication. |
| Experiment Setup | Yes | "We used Adam (Kingma & Ba, 2014) with a learning rate of 0.001... We used two hidden layers with 256 hidden units for each of the three networks: the value function, the policy, and the soft Q function." Table A.3 (hyperparameters): Hopper-v2 (Temperature 1, Horizon length 10, Number of Dense Layers 3, Layer Dimension 256); Walker2d-v2 (Temperature 10, Horizon length 20, Number of Dense Layers 3, Layer Dimension 256); HalfCheetah-v2 (Temperature 10, Horizon length 20, Number of Dense Layers 3, Layer Dimension 256). A network/optimizer sketch follows the table. |
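
The "Pseudocode" row cites Algorithm 1, SMC Planning using SIR. Below is a minimal, hedged sketch of such a sampling-importance-resampling planning loop. The `model`, `policy`, and `value_fn` interfaces and the soft-advantage weighting are illustrative assumptions, not the authors' implementation.

```python
# Sketch of SMC planning with sampling-importance-resampling (SIR).
# `model`, `policy`, and `value_fn` are placeholder objects (assumptions).
import numpy as np

def smc_plan_sir(state, model, policy, value_fn,
                 n_particles=100, horizon=10, temperature=1.0):
    """Return an action for `state` after planning with N resampled particles."""
    # All particles start from the current state.
    states = np.repeat(state[None, :], n_particles, axis=0)
    first_actions = None

    for t in range(horizon):
        # Propose an action for every particle from the stochastic policy.
        actions = policy.sample(states)                     # (N, action_dim)
        next_states, rewards = model.step(states, actions)  # learned dynamics model

        # Importance weights: a soft advantage r(s, a) + V(s') - V(s), scaled by
        # the temperature (assumption: stands in for the paper's soft-value weights).
        log_w = (rewards + value_fn(next_states) - value_fn(states)) / temperature

        # Resample particle indices in proportion to their weights (the SIR step).
        probs = np.exp(log_w - log_w.max())
        probs /= probs.sum()
        idx = np.random.choice(n_particles, size=n_particles, p=probs)

        states = next_states[idx]
        first_actions = actions[idx] if t == 0 else first_actions[idx]

    # Execute the first action of one surviving particle; resampling has already
    # concentrated the particle set on high-value trajectories.
    return first_actions[np.random.randint(n_particles)]
```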
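
The "Experiment Setup" row reports Adam with a learning rate of 0.001 and two hidden layers of 256 units for the value, policy, and soft Q networks. A minimal PyTorch sketch of that configuration is below; the observation/action dimensions and the way the three networks are instantiated are assumptions for illustration, not taken from the paper's code.

```python
# Sketch of the reported training setup: three MLPs with two hidden layers of
# 256 units, optimized with Adam at lr = 0.001. Dimensions are assumptions.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Two hidden layers of 256 units, as described in the paper's setup.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 17, 6                 # e.g. HalfCheetah-v2 sizes (assumption)

value_fn = mlp(obs_dim, 1)               # state value V(s)
soft_q_fn = mlp(obs_dim + act_dim, 1)    # soft Q(s, a)
policy = mlp(obs_dim, 2 * act_dim)       # Gaussian policy head: mean and log-std

optimizer = torch.optim.Adam(
    list(value_fn.parameters())
    + list(soft_q_fn.parameters())
    + list(policy.parameters()),
    lr=1e-3,
)
```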