Learning from a Learner
Authors: Alexis Jacq, Matthieu Geist, Ana Paiva, Olivier Pietquin
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the genericity of our method by observing agents implementing various reinforcement learning algorithms. Finally, we show that, on both discrete and continuous state/action tasks, the observer's performance (that optimizes the recovered reward) can surpass those of the observed learner. |
| Researcher Affiliation | Collaboration | Google Brain, Paris, France; INESC-ID, IST, University of Lisbon. |
| Pseudocode | Yes | Algorithm 1 Recovering trajectory-consistent reward |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code. |
| Open Datasets | Yes | To evaluate how our approach holds when dealing with large dimensions, we use the same experimental setting on continuous control tasks taken from the Open AI gym benchmark suite (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes training procedures and parameters, but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms like 'Open AI gym', 'PPO', and 'Adam gradient descent' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | We use a discount factor γ = 0.96 and a trade-off factor α = 0.3. ... we use Adam gradient descent (Kingma & Ba, 2014) with learning rate 1e-3. ... The algorithm is run by modelling SPI with α_model = 0.7. We use 1000 steps for the policy regressions, 100 steps for the KL divergence regressions, 3000 steps for the reward initialization and 1000 steps for the reward consistency regression. |
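
For context on the Open Datasets row: the continuous control tasks are taken from the OpenAI Gym benchmark suite (Brockman et al., 2016). The minimal sketch below shows how such environments might be instantiated; the specific environment names are illustrative assumptions (the quoted excerpt does not list them), and the classic pre-0.26 `gym` API is assumed.

```python
# Minimal sketch: loading OpenAI Gym tasks like those referenced in the
# "Open Datasets" row. Environment names are illustrative assumptions and the
# classic (pre-0.26) gym reset/step API is assumed.
import gym

for env_id in ["CartPole-v1", "Pendulum-v0"]:  # hypothetical task choices
    env = gym.make(env_id)
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()       # placeholder random policy
        obs, reward, done, info = env.step(action)
    env.close()
```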
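
Similarly, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below is not the authors' code: only the numeric values come from the table, while the use of PyTorch and the reward-model architecture are assumptions made for illustration.

```python
# Sketch gathering the quoted hyperparameters in one place; only the numbers
# come from the Experiment Setup row. The reward network is a hypothetical stand-in.
import torch
import torch.nn as nn

GAMMA = 0.96                     # discount factor γ
ALPHA = 0.3                      # trade-off factor α
ALPHA_MODEL = 0.7                # coefficient used when modelling SPI
POLICY_REG_STEPS = 1000          # steps for the policy regressions
KL_REG_STEPS = 100               # steps for the KL divergence regressions
REWARD_INIT_STEPS = 3000         # steps for the reward initialization
REWARD_CONSISTENCY_STEPS = 1000  # steps for the reward consistency regression

# Hypothetical reward model (input size 4 chosen arbitrarily for illustration).
reward_net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

# Adam gradient descent (Kingma & Ba, 2014) with the quoted learning rate 1e-3.
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
```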