Scale-free adaptive planning for deterministic dynamics & discounted rewards
Authors: Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically illustrate the benefits of PlaTγPOOS. We chose a simple MDP, shown in Figure 5. In this MDP, a state x = (bin, d) is a pair of a binary variable bin and a non-negative integer d. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley, USA; 2Noah's Ark Lab, Huawei Technologies, London, UK; 3Adobe Research, San Jose, USA; 4SequeL team, INRIA Lille - Nord Europe, France. |
| Pseudocode | Yes | Figure 1. Algorithm for free planning with no reset condition |
| Open Source Code | No | The paper does not provide any statement about open-source code availability or links to a code repository. |
| Open Datasets | No | The paper defines a simple MDP (Markov Decision Process) as the environment for experiments but does not provide access information for a public dataset or a generated dataset. |
| Dataset Splits | No | The paper describes a planning problem within a defined MDP environment and evaluates performance based on 'n interactions' with a generative model, rather than using traditional dataset splits like training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers required to replicate the experiment. |
| Experiment Setup | Yes | We set γ = 0.95. Therefore, Rmax ≈ 130. ... The reward is then shifted by adding 100 to it so that the noises with different ranges can be added on top without making the reward negative. (A hedged code sketch of this setup follows the table.) |
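
The quoted setup details above (γ = 0.95, states of the form x = (bin, d), and a reward shifted by +100 so that bounded noise cannot make it negative) suggest the following minimal sketch of a generative-model interface for the toy MDP. This is an illustration under stated assumptions, not the paper's implementation: the `State`, `step`, and `discounted_return` names are ours, and the transition and base-reward rules inside `step` are hypothetical placeholders standing in for the actual dynamics defined in the paper's Figure 5.

```python
from __future__ import annotations

from dataclasses import dataclass
import random

GAMMA = 0.95          # discount factor, as quoted from the experiment setup
REWARD_SHIFT = 100.0  # quoted shift so that added noise keeps the reward non-negative


@dataclass(frozen=True)
class State:
    """State x = (bin, d): a binary flag and a non-negative integer."""
    bin: int  # 0 or 1
    d: int    # non-negative integer


def step(state: State, action: int, noise_range: float = 0.0) -> tuple[State, float]:
    """Hypothetical generative model call: (state, action) -> (next state, reward).

    The true dynamics are those of Figure 5 in the paper; only the interface,
    the +100 shift, and the optional bounded noise follow the quoted setup.
    """
    # Placeholder deterministic transition (assumption, not the paper's MDP).
    next_state = State(bin=action, d=state.d + 1)
    # Placeholder base reward in [0, 1] (assumption), then the quoted shift
    # plus uniform noise of configurable range added on top.
    base_reward = 1.0 if next_state.bin == 1 else 0.0
    noise = random.uniform(-noise_range, noise_range)
    return next_state, base_reward + REWARD_SHIFT + noise


def discounted_return(rewards: list[float], gamma: float = GAMMA) -> float:
    """Discounted sum of per-step rewards, the planning objective."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```

Under these assumptions, a planner with access to `step` as a generative model would spend its budget of n interactions sampling transitions and choosing actions to maximize `discounted_return` over the resulting trajectories.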