GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis
Authors: Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of GALOIS, we study the following research questions (RQs): RQ1 (Performance): How effective is GALOIS regarding the performance and learning speed? RQ2 (Generalizability): How is the generalizability of GALOIS across environments? RQ3 (Reusability): Does GALOIS show great knowledge reusability across different environments? |
| Researcher Affiliation | Academia | 1. College of Intelligence and Computing, Tianjin University, Tianjin, China; 2. Nanyang Technological University, Singapore; 3. University of Alberta, Canada |
| Pseudocode | No | The paper describes its methods using text and diagrams (e.g., Figure 2, Figure 3, Figure 4) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The implementation is available at: https://sites.google.com/view/galois-drl |
| Open Datasets | Yes | Environments: We adopt the Mini Grid environments [7], which contains various tasks that require different abilities (i.e., navigation and multistep logical reasoning) to accomplish. We consider four representative tasks with incremental levels of logical difficulties as shown in Figure 5. [7] Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018. |
| Dataset Splits | No | The paper mentions 'training environment' and 'test environments' but does not specify explicit training/validation/test dataset splits or their percentages, nor does it describe a cross-validation setup. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix |
| Software Dependencies | No | The paper mentions various baseline DRL algorithms (DQN [23], PPO [31], SAC [13], h-DQN [20], MPPS [44]) and states 'To avoid unfair comparison, we use the same training settings for all methods (see Appendix B for more details)', but it does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix |