Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity
Authors: Lin Guan, Sarath Sreedharan, Subbarao Kambhampati
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our system by testing on three different benchmark domains and show how even with incomplete symbolic model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal. |
| Researcher Affiliation | Academia | School of Computing & AI, Arizona State University, Tempe, AZ. |
| Pseudocode | Yes | Algorithm 1 in the Appendix provides the pseudocode for our learning method. |
| Open Source Code | Yes | Our source code is available at https://github.com/GuanSuns/ASGRL. |
| Open Datasets | No | The paper uses custom environments (Household, Mine Craft, Mario) and refers to the codebase for the Household environment. It does not provide access information or citations for specific datasets used for training within these environments. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts) for the environments used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions general methods such as RL and Q-Learning and refers to a codebase for the Household environment, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | To balance exploration and exploitation, we use ϵ-greedy in our approach and other baselines. Each skill policy maintains its own ϵ₁ value. ϵ₁ is annealed from 1.0 to 0.05 by a factor of 0.95 whenever the skill successfully reaches a skill terminal state. The meta-controller starts with ϵ₂ = 1.0 and decreases it by a factor of 0.9 whenever the low-level skills reach the final goal state(s), until ϵ₂ = 0.05. ... the learning rate of each skill policy is also annealed from 1.0 to 0.1 by a factor of 0.95 every time the skill reaches a landmark state. (A short code sketch of this schedule follows the table.) |
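The annealing schedule quoted in the Experiment Setup row can be summarized in a few lines of Python. This is a minimal sketch, not code from the ASGRL repository: the class and method names (`SkillPolicy`, `MetaController`, `on_skill_terminal_state`, etc.) are hypothetical, and only the numeric schedule (initial values, decay factors, floors) is taken from the paper's description.

```python
# Hedged sketch of the epsilon-greedy and learning-rate annealing schedule
# described in the paper's experiment setup. Names are illustrative only.

class SkillPolicy:
    """A low-level skill policy with its own exploration rate and learning rate."""

    def __init__(self):
        self.epsilon = 1.0  # ϵ₁: per-skill exploration rate
        self.lr = 1.0       # learning rate, annealed on landmark visits

    def on_skill_terminal_state(self):
        # Anneal ϵ₁ from 1.0 toward 0.05 by a factor of 0.95 each time
        # the skill successfully reaches one of its terminal states.
        self.epsilon = max(0.05, self.epsilon * 0.95)

    def on_landmark_state(self):
        # Anneal the learning rate from 1.0 toward 0.1 by a factor of 0.95
        # each time the skill reaches a landmark state.
        self.lr = max(0.1, self.lr * 0.95)


class MetaController:
    """The meta-controller that selects among low-level skills."""

    def __init__(self):
        self.epsilon = 1.0  # ϵ₂: meta-level exploration rate

    def on_goal_reached(self):
        # Decrease ϵ₂ by a factor of 0.9 whenever the low-level skills
        # reach the final goal state(s), down to a floor of 0.05.
        self.epsilon = max(0.05, self.epsilon * 0.9)
```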