GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis
Authors: Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of GALOIS, we study the following research questions (RQs): RQ1 (Performance): How effective is GALOIS regarding the performance and learning speed? RQ2 (Generalizability): How is the generalizability of GALOIS across environments? RQ3 (Reusability): Does GALOIS show great knowledge reusability across different environments? |
| Researcher Affiliation | Academia | 1. College of Intelligence and Computing, Tianjin University, Tianjin, China; 2. Nanyang Technological University, Singapore; 3. University of Alberta, Canada |
| Pseudocode | No | The paper describes its methods using text and diagrams (e.g., Figure 2, Figure 3, Figure 4) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The implementation is available at: https://sites.google.com/view/galois-drl |
| Open Datasets | Yes | Environments: We adopt the Mini Grid environments [7], which contains various tasks that require different abilities (i.e., navigation and multistep logical reasoning) to accomplish. We consider four representative tasks with incremental levels of logical difficulties as shown in Figure 5. [7] Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018. |
| Dataset Splits | No | The paper mentions 'training environment' and 'test environments' but does not specify explicit training/validation/test dataset splits or their percentages, nor does it describe a cross-validation setup. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix |
| Software Dependencies | No | The paper mentions various baseline DRL algorithms (DQN [23], PPO [31], SAC [13], h-DQN [20], MPPS [44]) and states 'To avoid unfair comparison, we use the same training settings for all methods (see Appendix B for more details)', but it does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix |