Symbolic Network: Generalized Neural Policies for Relational MDPs
Authors: Sankalp Garg, Aniket Bajpai, Mausam
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on nine RDDL domains from IPPC demonstrate that SYMNET policies are significantly better than random and sometimes even more effective than training a state-of-the-art deep reactive policy from scratch. We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014). |
| Researcher Affiliation | Academia | 1Indian Institute of Technology Delhi. Correspondence to: Sankalp Garg <sankalp2621998@gmail.com>, Aniket Bajpai <quantum.computing96@gmail.com>, Mausam <mausam@cse.iitd.ac.in>. |
| Pseudocode | No | The paper describes the SYMNET framework and its learning process in detail within Section 4, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We release the code of SYMNET for future research. |
| Open Datasets | Yes | We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014). |
| Dataset Splits | No | The paper states it uses IPPC problem instances 1, 2, and 3 for multi-task training and instances 5-10 for testing. While it defines training and test sets, it does not explicitly describe a separate validation set, its size, or how it was used in the experimental setup. (The reported split is written out as a short snippet after the table.) |
| Hardware Specification | Yes | We train the network using RMSProp (Ruder, 2016) on a single Nvidia K40 GPU. |
| Software Dependencies | No | The paper mentions using specific algorithms and activation functions like A3C (Mnih et al., 2016), RMSProp (Ruder, 2016), and Leaky ReLU (Xu et al., 2015), but it does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The embedding module for GAT uses a neighborhood of 1 and an output feature size of 6. We then use a fully connected layer of output 20 dimensions to get an embedding from each of the tuple embedding outputs by GAT. All layers use a leaky ReLU activation and a learning rate of 10^-3. SYMNET is trained for each domain for twelve hours (4 hours for each instance). (A minimal sketch of this module follows the table.) |
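For concreteness, the instance split reported under Dataset Splits can be written as a trivial configuration. This is our own illustrative encoding, not code from the paper, and the variable names are hypothetical.

```python
# Our illustrative encoding of the split reported in the paper (names are ours).
TRAIN_INSTANCES = [1, 2, 3]           # IPPC instances used for multi-task training
TEST_INSTANCES = list(range(5, 11))   # instances 5-10 held out for testing
# Instance 4 is not mentioned, and the paper describes no separate validation set.
```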
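The Experiment Setup row pins down the embedding architecture and optimizer well enough to sketch. Below is a minimal sketch assuming PyTorch and PyTorch Geometric; the paper does not state its framework, and the class name `SymNetEmbedding`, the input feature size, and the exact module structure are our assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed

class SymNetEmbedding(nn.Module):
    """Sketch of the described embedding module (our naming, not the authors')."""

    def __init__(self, in_features: int, gat_out: int = 6, emb_dim: int = 20):
        super().__init__()
        # We read "neighborhood of 1" as a single round of attention-weighted
        # message passing: one GAT layer with 6 output features per node.
        self.gat = GATConv(in_features, gat_out)
        # Fully connected layer producing the 20-dimensional embedding.
        self.fc = nn.Linear(gat_out, emb_dim)
        # All layers use a leaky ReLU activation, per the paper.
        self.act = nn.LeakyReLU()

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = self.act(self.gat(x, edge_index))  # per-node 6-dim features
        return self.act(self.fc(h))            # per-node 20-dim embeddings

# Optimizer matching the reported setup: RMSProp with a learning rate of 1e-3.
model = SymNetEmbedding(in_features=8)  # input feature size is illustrative
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
```

The single `GATConv` layer reflects the stated 1-hop neighborhood, followed by the fully connected projection to 20 dimensions; the actual SYMNET policy adds decoder and action-selection components on top of this embedding, which the table does not specify and the sketch therefore omits.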