Scale-free adaptive planning for deterministic dynamics & discounted rewards

Authors: Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we empirically illustrate the benefits of PlaTγPOOS. We chose a simple MDP, shown in Figure 5. In this MDP, a state x = (bin, d) is a pair of a binary variable bin and a non-negative integer d."
Researcher Affiliation | Collaboration | "1 University of California, Berkeley, USA; 2 Noah's Ark Lab, Huawei Technologies, London, UK; 3 Adobe Research, San Jose, USA; 4 SequeL team, INRIA Lille - Nord Europe, France."
Pseudocode | Yes | "Figure 1. Algorithm for scale-free planning with no reset condition"
Open Source Code | No | The paper does not provide any statement about open-source code availability or a link to a code repository.
Open Datasets | No | The paper defines a simple MDP (Markov decision process) as the environment for its experiments but does not provide access information for a public or generated dataset.
Dataset Splits | No | The paper describes a planning problem within a defined MDP environment and evaluates performance over n interactions with a generative model, rather than using traditional training/validation/test splits.
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers required to replicate the experiments.
Experiment Setup | Yes | "We set γ = 0.95. Therefore, Rmax ≈ 130. ... The reward is then shifted by adding 100 to it so that the noises with different ranges can be added on top without making the reward negative." (A sketch of this toy setup follows the table.)
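The Experiment Setup row quotes only a few constants (γ = 0.95, Rmax ≈ 130, a +100 reward shift), and the Research Type row describes a state as a pair x = (bin, d). The following is a minimal Python sketch of that toy setup, assuming uniform reward noise and placeholder reward values; everything beyond the quoted constants and the (bin, d) state shape is an assumption, since the paper releases no code.

```python
import random

# Constants quoted in the Experiment Setup row; everything else below is assumed.
GAMMA = 0.95          # discount factor used in the paper's experiment
REWARD_SHIFT = 100.0  # shift added to rewards so bounded noise cannot make them negative


def noisy_shifted_reward(base_reward: float, noise_range: float) -> float:
    """Shift the base reward by +100, then add uniform noise of the given range.

    For any noise_range <= REWARD_SHIFT (and base_reward >= 0), the result stays
    non-negative, which is the property the quoted setup relies on.
    """
    noise = random.uniform(-noise_range, noise_range)
    return base_reward + REWARD_SHIFT + noise


def discounted_return(rewards) -> float:
    """Discounted sum of a finite reward sequence under GAMMA."""
    return sum((GAMMA ** t) * r for t, r in enumerate(rewards))


# A state, as described in the quoted MDP, is a pair x = (bin, d): a binary flag
# and a non-negative integer. Transitions are not described in the excerpt, so
# none are implemented here.
state = (1, 3)  # bin = 1, d = 3 (illustrative values only)

# Example: 50 noisy steps with a hypothetical base reward of 1.0 and noise range 0.5.
print(discounted_return([noisy_shifted_reward(1.0, 0.5) for _ in range(50)]))
```

Note that the quoted Rmax ≈ 130 is not re-derived here: the excerpt does not state the unshifted per-step reward scale from which it follows, so the base_reward and noise_range values above are purely illustrative.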