Neural Logic Reinforcement Learning
Authors: Zhengyao Jiang, Shan Luo
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on cliff-walking and blocks manipulation tasks demonstrate that NLRL can induce interpretable policies achieving near-optimal performance while showing good generalisability to environments of different initial states and problem sizes. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Liverpool, Liverpool, United Kingdom. Correspondence to: Zhengyao Jiang <z.jiang22@student.liverpool.ac.uk>, Shan Luo <shan.luo@liverpool.ac.uk>. |
| Pseudocode | No | The paper describes the DRLM architecture and MDP with logic interpretation, but it does not include any labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at the paper's homepage: https://github.com/ZhengyaoJiang/NLRL |
| Open Datasets | No | The paper describes the setup for custom-defined simulated environments (block manipulation, cliff-walking) for training, rather than using or providing access to a static, publicly available dataset in the conventional sense (e.g., MNIST, ImageNet). |
| Dataset Splits | No | The paper describes “training environment” and “test environment” but does not explicitly mention or detail a separate “validation” split or set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components such as RMSProp and the ReLU activation function, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Similar to ∂ILP, we use RMSProp to train the agent, whose learning rate is set as 0.001. The generalized advantages (λ = 0.95) are applied to the value network where the value is estimated by a neural network with one 20-units hidden layer. [...] we use the same rule templates for invented predicates across all the tasks, each with only 1 clause, i.e., (1, 1, True), (1, 2, False), (2, 1, True), (2, 1, True). The templates of action predicates vary in different tasks but it is easy to find a good one by exhaustive search. For the UNSTACK and STACK tasks, the action predicate template is (2, 1, True). For the ON task, the action predicate templates are (2, 1, True) and (2, 0, True). There are four action predicates in the cliff-walking task; we give all these predicates the same template (3, 1, True). (A hedged code sketch of this training setup appears below the table.) |
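
The Experiment Setup row quotes the paper's hyperparameters: RMSProp with learning rate 0.001, generalized advantage estimation with λ = 0.95, and a value network with a single 20-unit hidden layer. The sketch below illustrates how those reported settings could be wired up. It is not the authors' implementation: the use of PyTorch, the `obs_dim` placeholder, the discount factor `gamma`, and the function name `generalized_advantages` are assumptions made here for illustration only.

```python
# Minimal sketch of the reported experiment setup. PyTorch, `obs_dim`, and
# `gamma` are assumptions; only the quoted hyperparameters (RMSProp,
# lr = 0.001, lambda = 0.95, one 20-unit hidden layer) come from the paper.
import torch
import torch.nn as nn

obs_dim = 4  # placeholder input size; the real state encoding is task-specific

# Value network with one 20-unit hidden layer. ReLU is mentioned in the paper;
# its exact placement in the network is assumed here.
value_net = nn.Sequential(
    nn.Linear(obs_dim, 20),
    nn.ReLU(),
    nn.Linear(20, 1),
)

# RMSProp optimizer with the reported learning rate of 0.001.
optimizer = torch.optim.RMSprop(value_net.parameters(), lr=1e-3)


def generalized_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation with lambda = 0.95 (as in the paper).

    `rewards` is a list of per-step rewards for one episode; `values` holds
    the value estimates for each state plus one bootstrap value at the end.
    `gamma` is an assumed discount factor, not reported in the excerpt above.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


# Example call with dummy data: a 3-step episode with a terminal bootstrap of 0.
print(generalized_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5, 0.0]))
```

The rule templates quoted in the same row (e.g., (1, 1, True) for invented predicates, (3, 1, True) for the cliff-walking action predicates) configure the ∂ILP-style program search and are not reproduced in this sketch.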