Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems
Authors: Rushang Karia, Siddharth Srivastava
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects. Our empirical results show that our approach can outperform existing approaches for zero-shot transfer. We performed an empirical evaluation on four different tasks and our results show that GRL outperforms the baseline in zero-shot transfer performance. |
| Researcher Affiliation | Academia | Rushang Karia, Siddharth Srivastava School of Computing and Augmented Intelligence, Arizona State University, U.S.A. {Rushang.Karia, siddharths}@asu.edu |
| Pseudocode | Yes | Algorithm 1 Generalized Reinforcement Learning (GRL) |
| Open Source Code | Yes | Our code is available at: https://github.com/AAIR-lab/GHN |
| Open Datasets | Yes | We consider tasks used in the International Probabilistic Planning Competition (IPPC) [Sanner, 2011; Sanner, 2014], some of which have been used by SymNet and MBRRL as benchmarks for evaluating transfer performance. We used SysAdmin(n) with n ∈ {3, 4, 6}, Academic Advising(n, n, n) with n ∈ {2, 3, 4}, Game of Life(n, n) with n ∈ {2, 3} and Wildfire(n, n) with n ∈ {2, 3, 4} for generating problems used for training each domain respectively. |
| Dataset Splits | No | The paper mentions training and test problems and their state space sizes, but it does not provide specific percentages or counts for training, validation, and test splits, nor does it explicitly mention a validation set. |
| Hardware Specification | Yes | We ran our experiments utilizing a single core and 16 GiB of memory on an Intel Xeon E5-2680 v4 CPU containing 28 cores and 128 GiB of RAM. |
| Software Dependencies | No | Our system is implemented in Python and we used PyTorch [Paszke et al., 2019] with default implementations of mean squared error (MSE) as the loss function and Adam [Kingma and Ba, 2015] as the optimization algorithm for training each domain-specific QGRL network. Our system uses RDDLsim as the simulator. The specific version numbers for PyTorch or RDDLsim are not provided. |
| Experiment Setup | Yes | Hyperparameters: We used the IPPC horizon H of 40 time steps for each episode, after which the simulator was reset to the initial state. To train a QGRL network, we used a replay buffer of size 20000, a mini-batch size of 32, and a training interval of 32 time steps with 25 steps of optimization per interval. For our test setup, we used Q-learning with ϵ = 0.1 for GRL and MBRRL. We used γ = 0.9 and α = 0.05 for the SysAdmin and Game of Life domains and used γ = 1.0 and α = 0.3 for Academic Advising and Wildfire. (A hedged sketch of this training setup follows the table.) |
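
The Software Dependencies and Experiment Setup rows describe a fairly standard Q-learning pipeline: a PyTorch Q-network trained with MSE loss and Adam, ε-greedy exploration, a 20000-transition replay buffer, mini-batches of 32, and 25 optimization steps every 32 simulator steps. The sketch below assembles those reported hyperparameters into a minimal, runnable form. The feed-forward network, state encoding, and the random-transition stand-in for RDDLsim are placeholders introduced here for illustration only; the authors' actual QGRL network is relational and domain-specific, and their code is in the linked repository.

```python
# Minimal sketch of the reported training setup: MSE loss, Adam, epsilon-greedy
# Q-learning, replay buffer of 20000, mini-batch of 32, and 25 optimization
# steps every 32 environment steps. Network, state encoding, and environment
# are placeholders, NOT the authors' QGRL implementation.
import random
from collections import deque

import torch
import torch.nn as nn

# Hyperparameters reported in the paper (gamma/alpha shown for SysAdmin and
# Game of Life; Academic Advising and Wildfire use gamma=1.0, alpha=0.3).
HORIZON = 40
BUFFER_SIZE = 20_000
BATCH_SIZE = 32
TRAIN_INTERVAL = 32
OPT_STEPS_PER_INTERVAL = 25
EPSILON = 0.1
GAMMA = 0.9
ALPHA = 0.05  # used as the Adam learning rate here (assumption)

STATE_DIM, NUM_ACTIONS = 16, 4  # placeholder sizes; the real network is relational

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=ALPHA)
loss_fn = nn.MSELoss()
replay_buffer = deque(maxlen=BUFFER_SIZE)


def select_action(state: torch.Tensor) -> int:
    """Epsilon-greedy action selection over the Q-network's outputs."""
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())


def optimize() -> None:
    """One optimization step on a sampled mini-batch (standard Q-learning target)."""
    if len(replay_buffer) < BATCH_SIZE:
        return
    batch = random.sample(replay_buffer, BATCH_SIZE)
    states, actions, rewards, next_states, dones = map(torch.stack, zip(*batch))
    q_values = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        targets = rewards + GAMMA * (1 - dones) * q_net(next_states).max(dim=1).values
    loss = loss_fn(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Training loop with random transitions standing in for the RDDLsim simulator.
step_count = 0
for episode in range(5):
    state = torch.randn(STATE_DIM)            # simulator reset to the initial state
    for t in range(HORIZON):
        action = select_action(state)
        next_state = torch.randn(STATE_DIM)   # placeholder for a simulator step
        reward = torch.tensor(random.random())
        done = torch.tensor(float(t == HORIZON - 1))
        replay_buffer.append((state, torch.tensor(action), reward, next_state, done))
        state = next_state
        step_count += 1
        # Every 32 time steps, run 25 optimization steps, as reported.
        if step_count % TRAIN_INTERVAL == 0:
            for _ in range(OPT_STEPS_PER_INTERVAL):
                optimize()
```

For the other domains, γ and α would be swapped per the table (γ = 1.0, α = 0.3 for Academic Advising and Wildfire); whether α is the optimizer learning rate or a separate Q-learning step size is not stated in the quoted text, so the mapping above is an assumption.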