Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems

Authors: Rushang Karia, Siddharth Srivastava

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects. Our empirical results show that our approach can outperform existing approaches for zero-shot transfer. We performed an empirical evaluation on four different tasks and our results show that GRL outperforms the baseline in zero-shot transfer performance.
Researcher Affiliation | Academia | Rushang Karia, Siddharth Srivastava, School of Computing and Augmented Intelligence, Arizona State University, U.S.A. {Rushang.Karia, siddharths}@asu.edu
Pseudocode | Yes | Algorithm 1: Generalized Reinforcement Learning (GRL)
Open Source Code | Yes | Our code is available at: https://github.com/AAIR-lab/GHN
Open Datasets | Yes | We consider tasks used in the International Probabilistic Planning Competition (IPPC) [Sanner, 2011; Sanner, 2014], some of which have been used by SymNet and MBRRL as benchmarks for evaluating transfer performance. We used SysAdmin(n) with n ∈ {3, 4, 6}, Academic Advising(n, n, n) with n ∈ {2, 3, 4}, Game of Life(n, n) with n ∈ {2, 3}, and Wildfire(n, n) with n ∈ {2, 3, 4} for generating the problems used for training each domain, respectively. (These instance sizes are restated as a small mapping after the table.)
Dataset Splits | No | The paper mentions training and test problems and their state space sizes, but it does not provide specific percentages or counts for training, validation, and test splits, nor does it explicitly mention a validation set.
Hardware Specification | Yes | We ran our experiments utilizing a single core and 16 GiB of memory on an Intel Xeon E5-2680 v4 CPU containing 28 cores and 128 GiB of RAM.
Software Dependencies | No | Our system is implemented in Python and we used PyTorch [Paszke et al., 2019] with default implementations of mean squared error (MSE) as the loss function and Adam [Kingma and Ba, 2015] as the optimization algorithm for training each domain-specific QGRL network. Our system uses RDDLsim as the simulator. Specific version numbers for Python, PyTorch, and RDDLsim are not provided.
Experiment Setup | Yes | Hyperparameters: We used the IPPC horizon H of 40 time steps for each episode, after which the simulator was reset to the initial state. To train a QGRL network, we used a replay buffer of size 20000, a mini-batch size of 32, and a training interval of 32 time steps with 25 steps of optimization per interval. For our test setup, we used Q-learning with ϵ = 0.1 for GRL and MBRRL. We used γ = 0.9 and α = 0.05 for the SysAdmin and Game of Life domains and γ = 1.0 and α = 0.3 for Academic Advising and Wildfire. (A hedged training-setup sketch follows the table.)
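
For quick reference, the training instance sizes quoted in the Open Datasets row can be restated as a small mapping. The `TRAINING_INSTANCES` name and structure below are illustrative only and are not taken from the authors' repository.

```python
# Illustrative restatement of the quoted IPPC training instances per domain.
# The variable name and structure are assumptions, not the authors' config.
TRAINING_INSTANCES = {
    "SysAdmin": [(n,) for n in (3, 4, 6)],                # SysAdmin(n)
    "Academic Advising": [(n, n, n) for n in (2, 3, 4)],  # Academic Advising(n, n, n)
    "Game of Life": [(n, n) for n in (2, 3)],             # Game of Life(n, n)
    "Wildfire": [(n, n) for n in (2, 3, 4)],              # Wildfire(n, n)
}
```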
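
To make the training setup in the Software Dependencies and Experiment Setup rows concrete, here is a minimal PyTorch sketch of Q-learning with a replay buffer, ε-greedy action selection, an MSE loss, and Adam, wired to the quoted hyperparameters. The `QNet` architecture, the flat state encoding, and the `env` interface are hypothetical placeholders rather than the paper's relational QGRL network or the RDDLsim API, and interpreting α as the Adam learning rate is an assumption.

```python
# Hedged sketch of the quoted training setup: Q-learning with a replay
# buffer, epsilon-greedy exploration, MSE loss, and Adam in PyTorch.
# QNet, the flat state encoding, and `env` are placeholders, not the
# paper's QGRL network or the RDDLsim interface.
import random
from collections import deque

import torch
import torch.nn as nn

HORIZON = 40          # IPPC episode horizon H
BUFFER_SIZE = 20000   # replay buffer size
BATCH_SIZE = 32       # mini-batch size
TRAIN_INTERVAL = 32   # environment steps between training rounds
OPT_STEPS = 25        # optimization steps per training round
EPSILON = 0.1         # epsilon-greedy exploration (quoted for the test setup)
GAMMA = 0.9           # 0.9 for SysAdmin/Game of Life, 1.0 for the other domains
ALPHA = 0.05          # 0.05 or 0.3 per domain; used here as the Adam learning rate


class QNet(nn.Module):
    """Placeholder Q-value network (not the paper's QGRL architecture)."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def train(env, state_dim: int, num_actions: int, episodes: int = 100) -> QNet:
    """Train a Q-network with the hyperparameters quoted above.

    `env` is assumed to expose reset() -> state and step(a) -> (state, reward, done),
    with states given as flat lists of floats.
    """
    q_net = QNet(state_dim, num_actions)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=ALPHA)
    loss_fn = nn.MSELoss()
    buffer = deque(maxlen=BUFFER_SIZE)
    total_steps = 0

    for _ in range(episodes):
        state = env.reset()  # simulator reset to the initial state
        for _ in range(HORIZON):
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(num_actions)
            else:
                with torch.no_grad():
                    q_values = q_net(torch.tensor(state, dtype=torch.float32))
                    action = int(q_values.argmax())

            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            total_steps += 1

            # Every TRAIN_INTERVAL steps, run OPT_STEPS optimization steps
            # on mini-batches sampled from the replay buffer.
            if total_steps % TRAIN_INTERVAL == 0 and len(buffer) >= BATCH_SIZE:
                for _ in range(OPT_STEPS):
                    batch = random.sample(buffer, BATCH_SIZE)
                    states, actions, rewards, next_states, dones = zip(*batch)
                    s = torch.tensor(states, dtype=torch.float32)
                    a = torch.tensor(actions, dtype=torch.int64)
                    r = torch.tensor(rewards, dtype=torch.float32)
                    s2 = torch.tensor(next_states, dtype=torch.float32)
                    d = torch.tensor(dones, dtype=torch.float32)

                    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                    with torch.no_grad():
                        target = r + GAMMA * q_net(s2).max(dim=1).values * (1.0 - d)

                    loss = loss_fn(q_sa, target)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

            if done:
                break
    return q_net
```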