Internally Rewarded Reinforcement Learning

Authors: Mengdi Li, Xufeng Zhao, Jae Hee Lee, Cornelius Weber, Stefan Wermter

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed reward function can consistently stabilize the training process by reducing the impact of reward noise, which leads to faster convergence and higher performance compared with baselines in diverse tasks.
Researcher Affiliation | Academia | Knowledge Technology Group, Department of Informatics, University of Hamburg, Hamburg, Germany.
Pseudocode | No | The paper describes algorithms and methods but does not include formal pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code | Yes | Project page: https://ir-rl.github.io/
Open Datasets | Yes | We adopt the dataset configuration of Mnih et al. (2014), and use two basic models for this task: the recurrent attention model (RAM) (Mnih et al., 2014) and the dynamic-time recurrent attention model (DT-RAM) (Li et al., 2017). ... We use the same experimental setup and basic model on the four-room environment as in the work of the discriminator disagreement intrinsic reward (DISDAIN) (Strouse et al., 2022). ... The setup is based on the task of object existence prediction (Li et al., 2021).
Dataset Splits | Yes | We generate 60k Cluttered MNIST images, of which 90% are used for training and the rest for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions software such as Adam, PPO, and REINFORCE, and refers to code repositories for implementations, but does not provide version numbers for these packages or for other dependencies (e.g., Python or PyTorch versions).
Experiment Setup | Yes | RAM models are trained using REINFORCE (Williams, 1992) and optimized by Adam (Kingma & Ba, 2015) for 1500 epochs with a batch size of 128 and a learning rate of 3e-4.
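To make the Dataset Splits row concrete, here is a minimal PyTorch sketch of a 90%/10% split over 60k examples. The tensor shapes, the 10-class labels, and the use of random_split are illustrative assumptions, not details taken from the paper; only the 60k count, the split ratio, and the batch size of 128 come from the table above.

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in tensors for the 60k generated Cluttered MNIST images; the 60x60
# resolution and the 10-class labels are illustrative assumptions.
images = torch.randn(60_000, 1, 60, 60)
labels = torch.randint(0, 10, (60_000,))
dataset = TensorDataset(images, labels)

# 90% training / 10% validation, as stated in the Dataset Splits row.
n_train = int(0.9 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])

# Batch size 128 matches the Experiment Setup row.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)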
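The Experiment Setup row can likewise be read as the following sketch of a REINFORCE update optimized with Adam at a learning rate of 3e-4, a batch size of 128, and 1500 epochs. The toy observations, the linear policy head, and the correctness-based reward are placeholders for the recurrent attention model, which the paper trains with REINFORCE but which is not reproduced here.

import torch

# Sketch of a REINFORCE (Williams, 1992) update with the settings quoted in
# the Experiment Setup row: Adam, learning rate 3e-4, batch size 128,
# 1500 epochs. The 16-dimensional toy observations, 10-way policy head, and
# reward definition are illustrative assumptions.
policy = torch.nn.Linear(16, 10)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

for epoch in range(1500):
    obs = torch.randn(128, 16)                 # one toy batch per epoch
    target = torch.randint(0, 10, (128,))      # ground-truth labels
    dist = torch.distributions.Categorical(logits=policy(obs))
    pred = dist.sample()                       # stochastic prediction
    reward = (pred == target).float()          # 1 if correct, else 0
    # REINFORCE: ascend reward-weighted log-probabilities, implemented as
    # minimizing the negative score-function objective.
    loss = -(dist.log_prob(pred) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()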