Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Authors: Michael Chang, Sid Kaushik, Sergey Levine, Tom Griffiths

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Berkeley, USA; Department of Computer Science, Princeton University, USA.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | No | The paper describes generating custom transfer problems (“An Enumeration of Transfer Problems”, “Figure 4: How transfer tasks are generated”) but does not provide concrete access information (link, DOI, or a specific citation with authors and year) for a publicly available dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper acknowledges “computing resources from Amazon Web Services and Microsoft Azure” but does not provide specific hardware details, such as GPU/CPU models or other machine specifications, used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | We set the convergence time as the first time after which the return deviates by no more than ε = 0.01 from the optimal return, 0.8, for 30 epochs of training. Shown are runs across ten seeds. States are represented as binary vectors. The reward is given at the end of the episode and is 1 if the task is solved and 0 otherwise.
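The convergence criterion quoted in the Experiment Setup row is concrete enough to restate in code. The sketch below (not from the paper) shows one way to compute a per-run convergence epoch under that criterion; the function name convergence_epoch and the synthetic example run are our own assumptions, with only ε = 0.01, the optimal return of 0.8, and the 30-epoch window taken from the quoted setup.

```python
import numpy as np

def convergence_epoch(returns, optimal_return=0.8, eps=0.01, window=30):
    """Return the first epoch after which the per-epoch return stays within
    eps of the optimal return for `window` consecutive epochs, or None if
    the run never satisfies the criterion."""
    returns = np.asarray(returns, dtype=float)
    within = np.abs(returns - optimal_return) <= eps  # per-epoch near-optimality flag
    for t in range(len(returns) - window + 1):
        if within[t:t + window].all():
            return t  # first epoch of the sustained near-optimal stretch
    return None

# Hypothetical example: a noisy run that settles near the optimal return of 0.8.
rng = np.random.default_rng(0)
run = np.concatenate([rng.uniform(0.0, 0.6, size=50),
                      0.8 + rng.uniform(-0.005, 0.005, size=60)])
print(convergence_epoch(run))  # -> 50
```

Per the quoted setup, such a per-seed convergence time would presumably then be aggregated over the ten reported seeds.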