Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Authors: Michael Chang, Sid Kaushik, Sergey Levine, Tom Griffiths

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Berkeley, USA; Department of Computer Science, Princeton University, USA.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | No | The paper describes generating custom transfer problems (“An Enumeration of Transfer Problems”, “Figure 4: How transfer tasks are generated”) but does not provide concrete access information (link, DOI, or a specific citation with authors and year) for a publicly available dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper acknowledges “computing resources from Amazon Web Services and Microsoft Azure” but does not provide specific hardware details, such as GPU/CPU models or other machine specifications, used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | We set the convergence time as the first time after which the return deviates by no more than ε = 0.01 from the optimal return, 0.8, for 30 epochs of training. Shown are runs across ten seeds. States are represented as binary vectors. The reward is given at the end of the episode and is 1 if the task is solved and 0 otherwise.
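The convergence criterion quoted in the Experiment Setup row is concrete enough to restate in code. The sketch below (not from the paper) shows one way to compute a per-run convergence epoch under that criterion; the function name convergence_epoch and the synthetic example run are our own assumptions, with only ε = 0.01, the optimal return of 0.8, and the 30-epoch window taken from the quoted setup.

```python
import numpy as np

def convergence_epoch(returns, optimal_return=0.8, eps=0.01, window=30):
    """Return the first epoch after which the per-epoch return stays within
    eps of the optimal return for `window` consecutive epochs, or None if
    the run never satisfies the criterion."""
    returns = np.asarray(returns, dtype=float)
    within = np.abs(returns - optimal_return) <= eps  # per-epoch near-optimality flag
    for t in range(len(returns) - window + 1):
        if within[t:t + window].all():
            return t  # first epoch of the sustained near-optimal stretch
    return None

# Hypothetical example: a noisy run that settles near the optimal return of 0.8.
rng = np.random.default_rng(0)
run = np.concatenate([rng.uniform(0.0, 0.6, size=50),
                      0.8 + rng.uniform(-0.005, 0.005, size=60)])
print(convergence_epoch(run))  # -> 50
```

Per the quoted setup, such a per-seed convergence time would presumably then be aggregated over the ten reported seeds.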