Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity
Authors: Lin Guan, Sarath Sreedharan, Subbarao Kambhampati
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our system by testing on three different benchmark domains and show how even with incomplete symbolic model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal. |
| Researcher Affiliation | Academia | School of Computing & AI, Arizona State University, Tempe, AZ. |
| Pseudocode | Yes | Algorithm 1 in the Appendix provides the pseudocode for our learning method. |
| Open Source Code | Yes | Our source code is available at https://github.com/GuanSuns/ASGRL. |
| Open Datasets | No | The paper uses custom environments (Household, Mine Craft, Mario) and refers to the codebase for the Household environment. It does not provide access information or citations for specific datasets used for training within these environments. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts) for the environments used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions general methods such as RL and Q-Learning and refers to a codebase for the Household environment, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | To balance exploration and exploitation, we use ϵ-greedy in our approach and other baselines. Each skill policy maintains its own ϵ₁ value. ϵ₁ is annealed from 1.0 to 0.05 by a factor of 0.95 whenever the skill successfully reaches a skill terminal state. The meta-controller starts with ϵ₂ = 1.0 and decreases it by a factor of 0.9 whenever the low-level skills reach the final goal state(s), until ϵ₂ = 0.05. ... the learning rate of each skill policy is also annealed from 1.0 to 0.1 by a factor of 0.95 every time the skill reaches a landmark state. (A short code sketch of this schedule follows the table.) |
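The annealing schedule quoted in the Experiment Setup row can be summarized in a few lines of Python. This is a minimal sketch, not code from the ASGRL repository: the class and method names (`SkillPolicy`, `MetaController`, `on_skill_terminal_state`, etc.) are hypothetical, and only the numeric schedule (initial values, decay factors, floors) is taken from the paper's description.

```python
# Hedged sketch of the epsilon-greedy and learning-rate annealing schedule
# described in the paper's experiment setup. Names are illustrative only.

class SkillPolicy:
    """A low-level skill policy with its own exploration rate and learning rate."""

    def __init__(self):
        self.epsilon = 1.0  # ϵ₁: per-skill exploration rate
        self.lr = 1.0       # learning rate, annealed on landmark visits

    def on_skill_terminal_state(self):
        # Anneal ϵ₁ from 1.0 toward 0.05 by a factor of 0.95 each time
        # the skill successfully reaches one of its terminal states.
        self.epsilon = max(0.05, self.epsilon * 0.95)

    def on_landmark_state(self):
        # Anneal the learning rate from 1.0 toward 0.1 by a factor of 0.95
        # each time the skill reaches a landmark state.
        self.lr = max(0.1, self.lr * 0.95)


class MetaController:
    """The meta-controller that selects among low-level skills."""

    def __init__(self):
        self.epsilon = 1.0  # ϵ₂: meta-level exploration rate

    def on_goal_reached(self):
        # Decrease ϵ₂ by a factor of 0.9 whenever the low-level skills
        # reach the final goal state(s), down to a floor of 0.05.
        self.epsilon = max(0.05, self.epsilon * 0.9)
```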