Scale-free adaptive planning for deterministic dynamics & discounted rewards

Authors: Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we empirically illustrate the benefits of PlaTγPOOS. We chose a simple MDP, shown in Figure 5. In this MDP, a state x = (bin, d) is a pair of a binary variable bin and a non-negative integer d."
Researcher Affiliation | Collaboration | "1 University of California, Berkeley, USA; 2 Noah's Ark Lab, Huawei Technologies, London, UK; 3 Adobe Research, San Jose, USA; 4 SequeL team, INRIA Lille - Nord Europe, France."
Pseudocode | Yes | "Figure 1. Algorithm for scale-free planning with no reset condition"
Open Source Code | No | The paper does not provide any statement about open-source code availability or a link to a code repository.
Open Datasets | No | The paper defines a simple MDP (Markov decision process) as the environment for its experiments but does not provide access information for a public or generated dataset.
Dataset Splits | No | The paper describes a planning problem within a defined MDP environment and evaluates performance over n interactions with a generative model, rather than using traditional training/validation/test splits.
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers required to replicate the experiments.
Experiment Setup | Yes | "We set γ = 0.95. Therefore, Rmax ≈ 130. ... The reward is then shifted by adding 100 to it so that the noises with different ranges can be added on top without making the reward negative." (A sketch of this toy setup follows the table.)
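The Experiment Setup row quotes only a few constants (γ = 0.95, Rmax ≈ 130, a +100 reward shift), and the Research Type row describes a state as a pair x = (bin, d). The following is a minimal Python sketch of that toy setup, assuming uniform reward noise and placeholder reward values; everything beyond the quoted constants and the (bin, d) state shape is an assumption, since the paper releases no code.

```python
import random

# Constants quoted in the Experiment Setup row; everything else below is assumed.
GAMMA = 0.95          # discount factor used in the paper's experiment
REWARD_SHIFT = 100.0  # shift added to rewards so bounded noise cannot make them negative


def noisy_shifted_reward(base_reward: float, noise_range: float) -> float:
    """Shift the base reward by +100, then add uniform noise of the given range.

    For any noise_range <= REWARD_SHIFT (and base_reward >= 0), the result stays
    non-negative, which is the property the quoted setup relies on.
    """
    noise = random.uniform(-noise_range, noise_range)
    return base_reward + REWARD_SHIFT + noise


def discounted_return(rewards) -> float:
    """Discounted sum of a finite reward sequence under GAMMA."""
    return sum((GAMMA ** t) * r for t, r in enumerate(rewards))


# A state, as described in the quoted MDP, is a pair x = (bin, d): a binary flag
# and a non-negative integer. Transitions are not described in the excerpt, so
# none are implemented here.
state = (1, 3)  # bin = 1, d = 3 (illustrative values only)

# Example: 50 noisy steps with a hypothetical base reward of 1.0 and noise range 0.5.
print(discounted_return([noisy_shifted_reward(1.0, 0.5) for _ in range(50)]))
```

Note that the quoted Rmax ≈ 130 is not re-derived here: the excerpt does not state the unshifted per-step reward scale from which it follows, so the base_reward and noise_range values above are purely illustrative.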