reproducibilityindex.ai

Soft Q-Learning with Mutual-Information Regularization

Authors: Jordi Grau-Moya, Felix Leibfried, Peter Vrancx

ICLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our MIRL agent both in the tabular setting using a grid world domain, and in the parametric function approximator setting using the Atari domain.
Researcher Affiliation	Industry	Jordi Grau-Moya, Felix Leibfried and Peter Vrancx PROWLER.io Cambridge, United Kingdom {jordi}@prowler.io
Pseudocode	Yes	The pseudocode of our proposed algorithm is outlined in Algorithm 1
Open Source Code	No	The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets	Yes	We conduct experiments on 19 Atari games (Brockman et al., 2016)
Dataset Splits	No	The paper describes training and evaluation/testing procedures but does not explicitly describe a separate validation split.
Hardware Specification	No	The paper mentions using a neural network and running experiments on Atari, but it does not specify any particular hardware details such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers, such as programming language versions or library versions (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup	Yes	Parameter β Updates: The parameter β can be seen as a Lagrange multiplier that quantifies the magnitude of penalization for deviating from the prior. As such, a small fixed value of β would restrict the class of available policies and evidently constrain the asymptotic performance of MIRL. In order to remedy this problem and obtain better asymptotic performance, we use the same adaptive β-scheduling over rounds i from (Fox et al., 2016) in which β_i is updated linearly according to β_{i+1} = c · i with some positive constant c.
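
The linear β-schedule quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' code. The function names `beta_schedule` and `regularized_policy` are hypothetical, the choice to start rounds at i = 0 (so that the first β is 0) is an assumption, and the policy form π(a) ∝ prior(a) · exp(β · Q(a)) is the standard prior-regularized softmax, used here only to show how β trades off the prior against the Q-values.

```python
import math


def beta_schedule(c, num_rounds):
    """Return the first `num_rounds` values of the rule beta_{i+1} = c * i.

    Assumption: rounds are indexed i = 0, 1, 2, ..., so the first value is 0.
    """
    if c <= 0:
        raise ValueError("c must be a positive constant")
    return [c * i for i in range(num_rounds)]


def regularized_policy(q_values, prior, beta):
    """Illustrative policy proportional to prior(a) * exp(beta * Q(a)).

    Assumption: every prior probability is strictly positive.
    """
    logits = [math.log(p) + beta * q for p, q in zip(prior, q_values)]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q_values = [1.0, 2.0]  # hypothetical action values
    prior = [0.5, 0.5]     # hypothetical action prior
    for beta in beta_schedule(c=0.5, num_rounds=4):
        print(beta, regularized_policy(q_values, prior, beta))
```

With β = 0 the sketch returns the prior unchanged, and as β grows the policy concentrates on the action with the highest Q-value. This matches the row's description of a small fixed β restricting the class of available policies.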