Symbolic Network: Generalized Neural Policies for Relational MDPs

Authors: Sankalp Garg, Aniket Bajpai, Mausam

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on nine RDDL domains from IPPC demonstrate that SYMNET policies are significantly better than random and sometimes even more effective than training a state-of-the-art deep reactive policy from scratch. We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014).
Researcher Affiliation | Academia | Indian Institute of Technology Delhi. Correspondence to: Sankalp Garg <sankalp2621998@gmail.com>, Aniket Bajpai <quantum.computing96@gmail.com>, Mausam <mausam@cse.iitd.ac.in>.
Pseudocode | No | The paper describes the SYMNET framework and its learning process in detail in Section 4, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We release the code of SYMNET for future research.
Open Datasets | Yes | We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014).
Dataset Splits | No | The paper states it uses IPPC problem instances 1, 2, and 3 for multi-task training and instances 5-10 for testing. While it defines training and test sets, it does not explicitly describe a separate validation set, its size, or how it was used in the experimental setup.
Hardware Specification | Yes | We train the network using RMSProp (Ruder, 2016) on a single Nvidia K40 GPU.
Software Dependencies | No | The paper mentions specific algorithms and activation functions such as A3C (Mnih et al., 2016), RMSProp (Ruder, 2016), and Leaky ReLU (Xu et al., 2015), but it does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The embedding module for GAT uses a neighborhood of 1 and an output feature size of 6. We then use a fully connected layer of output 20 dimensions to get an embedding from each of the tuple embedding outputs by GAT. All layers use a Leaky ReLU activation and a learning rate of 10^-3. SYMNET is trained for each domain for twelve hours (4 hours for each instance). (A minimal sketch of this setup appears below the table.)
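To make the reported setup concrete, here is a minimal sketch, assuming PyTorch with PyTorch Geometric's GATConv; it is not the authors' implementation. The 1-hop GAT with 6 output features, the 20-dimensional fully connected embedding layer, the Leaky ReLU activations, the 10^-3 learning rate with RMSProp, and the train/test instance split are taken from the paper's text as quoted above; the input feature size, module and variable names, and the overall wiring are illustrative assumptions.

```python
# Hedged sketch of the reported SYMNET hyperparameters, not the authors' code.
# Layer sizes, learning rate, optimizer, and instance split follow the paper's text;
# NODE_FEATURES, StateEmbedder, and the use of PyTorch Geometric are assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

NODE_FEATURES = 10                    # hypothetical per-node input feature size
GAT_OUT_FEATURES = 6                  # "output feature size of 6"
EMBEDDING_DIM = 20                    # fully connected layer of output 20 dimensions
LEARNING_RATE = 1e-3                  # "learning rate of 10^-3"
TRAIN_INSTANCES = [1, 2, 3]           # IPPC instances used for multi-task training
TEST_INSTANCES = list(range(5, 11))   # instances 5-10 held out for testing

class StateEmbedder(nn.Module):
    """Embeds each node/tuple of the grounded instance graph."""
    def __init__(self, in_features: int = NODE_FEATURES):
        super().__init__()
        # A single GATConv layer aggregates 1-hop neighbour information
        # (one reading of the paper's "neighborhood of 1").
        self.gat = GATConv(in_features, GAT_OUT_FEATURES)
        self.fc = nn.Linear(GAT_OUT_FEATURES, EMBEDDING_DIM)
        self.act = nn.LeakyReLU()

    def forward(self, x, edge_index):
        h = self.act(self.gat(x, edge_index))
        return self.act(self.fc(h))   # one 20-d embedding per node/tuple

model = StateEmbedder()
# The paper trains with A3C; only the reported optimizer choice is shown here.
optimizer = torch.optim.RMSprop(model.parameters(), lr=LEARNING_RATE)
```

The A3C policy/value heads and the twelve-hour per-domain training schedule (four hours per training instance) are not reproduced here, since the paper's text quoted in the table does not specify their exact configuration.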