Symbolic Network: Generalized Neural Policies for Relational MDPs

Authors: Sankalp Garg, Aniket Bajpai, Mausam

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on nine RDDL domains from IPPC demonstrate that SYMNET policies are significantly better than random and sometimes even more effective than training a state-of-the-art deep reactive policy from scratch. We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014).
Researcher Affiliation | Academia | Indian Institute of Technology Delhi. Correspondence to: Sankalp Garg <sankalp2621998@gmail.com>, Aniket Bajpai <quantum.computing96@gmail.com>, Mausam <mausam@cse.iitd.ac.in>.
Pseudocode | No | The paper describes the SYMNET framework and its learning process in detail in Section 4, but it does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We release the code of SYMNET for future research.
Open Datasets | Yes | We perform experiments on nine RDDL domains from IPPC 2014 (Grzes et al., 2014).
Dataset Splits | No | The paper states it uses IPPC problem instances 1, 2, and 3 for multi-task training and instances 5-10 for testing. While it defines training and test sets, it does not explicitly describe a separate validation set, its size, or how it was used in the experimental setup.
Hardware Specification | Yes | We train the network using RMSProp (Ruder, 2016) on a single Nvidia K40 GPU.
Software Dependencies | No | The paper mentions specific algorithms and activation functions such as A3C (Mnih et al., 2016), RMSProp (Ruder, 2016), and Leaky ReLU (Xu et al., 2015), but it does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The embedding module for GAT uses a neighborhood of 1 and an output feature size of 6. We then use a fully connected layer of output 20 dimensions to get an embedding from each of the tuple embedding outputs by GAT. All layers use a Leaky ReLU activation and a learning rate of 10^-3. SYMNET is trained for each domain for twelve hours (4 hours for each instance). (A minimal sketch of this setup appears below the table.)
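To make the reported setup concrete, here is a minimal sketch, assuming PyTorch with PyTorch Geometric's GATConv; it is not the authors' implementation. The 1-hop GAT with 6 output features, the 20-dimensional fully connected embedding layer, the Leaky ReLU activations, the 10^-3 learning rate with RMSProp, and the train/test instance split are taken from the paper's text as quoted above; the input feature size, module and variable names, and the overall wiring are illustrative assumptions.

```python
# Hedged sketch of the reported SYMNET hyperparameters, not the authors' code.
# Layer sizes, learning rate, optimizer, and instance split follow the paper's text;
# NODE_FEATURES, StateEmbedder, and the use of PyTorch Geometric are assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

NODE_FEATURES = 10                    # hypothetical per-node input feature size
GAT_OUT_FEATURES = 6                  # "output feature size of 6"
EMBEDDING_DIM = 20                    # fully connected layer of output 20 dimensions
LEARNING_RATE = 1e-3                  # "learning rate of 10^-3"
TRAIN_INSTANCES = [1, 2, 3]           # IPPC instances used for multi-task training
TEST_INSTANCES = list(range(5, 11))   # instances 5-10 held out for testing

class StateEmbedder(nn.Module):
    """Embeds each node/tuple of the grounded instance graph."""
    def __init__(self, in_features: int = NODE_FEATURES):
        super().__init__()
        # A single GATConv layer aggregates 1-hop neighbour information
        # (one reading of the paper's "neighborhood of 1").
        self.gat = GATConv(in_features, GAT_OUT_FEATURES)
        self.fc = nn.Linear(GAT_OUT_FEATURES, EMBEDDING_DIM)
        self.act = nn.LeakyReLU()

    def forward(self, x, edge_index):
        h = self.act(self.gat(x, edge_index))
        return self.act(self.fc(h))   # one 20-d embedding per node/tuple

model = StateEmbedder()
# The paper trains with A3C; only the reported optimizer choice is shown here.
optimizer = torch.optim.RMSprop(model.parameters(), lr=LEARNING_RATE)
```

The A3C policy/value heads and the twelve-hour per-domain training schedule (four hours per training instance) are not reproduced here, since the paper's text quoted in the table does not specify their exact configuration.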