Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment
Authors: Michael Chang, Sid Kaushik, Sergey Levine, Tom Griffiths
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University of California, Berkeley, USA; (2) Department of Computer Science, Princeton University, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link for open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating custom transfer problems (“An Enumeration of Transfer Problems”, “Figure 4: How transfer tasks are generated”) but does not provide concrete access information (link, DOI, specific citation with authors/year) for a publicly available dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions “computing resources from Amazon Web Services and Microsoft Azure” but does not provide specific hardware details such as GPU/CPU models or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We set the convergence time as the first time after which the return deviates by no more than ε = 0.01 from the optimal return, 0.8, for 30 epochs of training. Shown are runs across ten seeds. States are represented as binary vectors. The reward is given at the end of the episode and is 1 if the task is solved and 0 otherwise. |
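
The quoted convergence criterion is concrete enough to implement directly. Below is a minimal sketch of one reading of it: convergence time is the first epoch from which the return stays within ε = 0.01 of the optimal return (0.8) for 30 consecutive epochs. The function name and the list-of-returns interface are assumptions for illustration, not taken from the paper.

```python
def convergence_time(returns, optimal_return=0.8, eps=0.01, window=30):
    """Return the first epoch index t such that every return in
    returns[t:t + window] deviates from optimal_return by at most eps.
    Returns None if the run never satisfies the criterion.

    This is a hypothetical reconstruction of the criterion quoted above,
    not code from the paper.
    """
    for t in range(len(returns) - window + 1):
        if all(abs(r - optimal_return) <= eps for r in returns[t:t + window]):
            return t
    return None


# Example usage with a synthetic run: returns climb toward the optimum
# and then stay within eps of 0.8 for the required window.
if __name__ == "__main__":
    run = [0.1, 0.3, 0.5, 0.7] + [0.8] * 40
    print(convergence_time(run))  # -> 4
```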