Neural Logic Reinforcement Learning
Authors: Zhengyao Jiang, Shan Luo
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while showing good generalisability to environments of different initial states and problem sizes. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Liverpool, Liverpool, United Kingdom. Correspondence to: Zhengyao Jiang <z.jiang22@student.liverpool.ac.uk>, Shan Luo <shan.luo@liverpool.ac.uk>. |
| Pseudocode | No | The paper describes the DRLM architecture and MDP with logic interpretation, but it does not include any labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at the paper's homepage: https://github.com/ZhengyaoJiang/NLRL |
| Open Datasets | No | The paper describes the setup for custom-defined simulated environments (block manipulation, cliff-walking) for training, rather than using or providing access to a static, publicly available dataset in the conventional sense (e.g., MNIST, ImageNet). |
| Dataset Splits | No | The paper describes “training environment” and “test environment” but does not explicitly mention or detail a separate “validation” split or set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components such as RMSProp and the ReLU activation function, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Similar to ∂ILP, we use RMSProp to train the agent, whose learning rate is set as 0.001. The generalized advantages (λ = 0.95) are applied to the value network where the value is estimated by a neural network with one 20-units hidden layer. [...] we use the same rule templates for invented predicates across all the tasks, each with only 1 clause, i.e., (1, 1, True), (1, 2, False), (2, 1, True), (2, 1, True). The templates of action predicates vary in different tasks but it is easy to find a good one by exhaustive search. For the UNSTACK and STACK tasks, the action predicate template is (2, 1, True). For the ON task, the action predicate templates are (2, 1, True) and (2, 0, True). There are four action predicates in the cliff-walking task; we give all these predicates the same template (3, 1, True). (A hedged code sketch of this training setup appears below the table.) |
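
The Experiment Setup row quotes the paper's hyperparameters: RMSProp with learning rate 0.001, generalized advantage estimation with λ = 0.95, and a value network with a single 20-unit hidden layer. The sketch below illustrates how those reported settings could be wired up. It is not the authors' implementation: the use of PyTorch, the `obs_dim` placeholder, the discount factor `gamma`, and the function name `generalized_advantages` are assumptions made here for illustration only.

```python
# Minimal sketch of the reported experiment setup. PyTorch, `obs_dim`, and
# `gamma` are assumptions; only the quoted hyperparameters (RMSProp,
# lr = 0.001, lambda = 0.95, one 20-unit hidden layer) come from the paper.
import torch
import torch.nn as nn

obs_dim = 4  # placeholder input size; the real state encoding is task-specific

# Value network with one 20-unit hidden layer. ReLU is mentioned in the paper;
# its exact placement in the network is assumed here.
value_net = nn.Sequential(
    nn.Linear(obs_dim, 20),
    nn.ReLU(),
    nn.Linear(20, 1),
)

# RMSProp optimizer with the reported learning rate of 0.001.
optimizer = torch.optim.RMSprop(value_net.parameters(), lr=1e-3)


def generalized_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation with lambda = 0.95 (as in the paper).

    `rewards` is a list of per-step rewards for one episode; `values` holds
    the value estimates for each state plus one bootstrap value at the end.
    `gamma` is an assumed discount factor, not reported in the excerpt above.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


# Example call with dummy data: a 3-step episode with a terminal bootstrap of 0.
print(generalized_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5, 0.0]))
```

The rule templates quoted in the same row (e.g., (1, 1, True) for invented predicates, (3, 1, True) for the cliff-walking action predicates) configure the ∂ILP-style program search and are not reproduced in this sketch.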