Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning

Authors: Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander T Toshev, Sergey Levine, Brian Ichter

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.
Researcher Affiliation | Collaboration | γ: Google Research, Robotics @ Google; β: Berkeley AI Research, UC Berkeley
Pseudocode | No | The paper describes algorithms but does not provide formal pseudocode blocks or figures explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | No | We plan to release more information about the proprietary environments and tasks used in a public release of this article at a later date.
Open Datasets | Yes | We use the versatile MiniGrid environment (Chevalier-Boisvert et al., 2018) in a fully observable setting, where the agent receives a top-down view of the environment.
Dataset Splits | No | The paper does not explicitly provide information on validation dataset splits, only training and test/generalization setups.
Hardware Specification | No | The paper does not specify any particular hardware used for running experiments (e.g., GPU/CPU models, cloud instances).
Software Dependencies | No | The paper refers to various algorithms and frameworks (e.g., DQN, DDQN, MT-Opt) but does not provide specific software version numbers for any libraries or dependencies.
Experiment Setup | No | The paper refers to Appendix A.1 and A.2 for 'further implementation details' and 'further details about these skills', implying that specific hyperparameter values and detailed training configurations are not fully present in the main text.