Adversarial Counterfactual Environment Model Learning

Authors: Xiong-Hui Chen, Yang Yu, Zhengmao Zhu, Zhihua Yu, Zhenjun Chen, Chenghe Wang, Yinan Wu, Rong-Jun Qin, Hongqiu Wu, Ruijin Ding, Fangsheng Huang

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted in two synthetic tasks, three continuous-control tasks, and a real-world application. We first verify that GALILEO can make accurate predictions on counterfactual data queried by other policies compared with baselines. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University; 2 School of Artificial Intelligence, Nanjing University; 3 Meituan; 4 Polixir.ai; 5 Tsinghua University |
| Pseudocode | Yes | Algorithm 1: Pseudocode for GALILEO |
| Open Source Code | Yes | Code: https://github.com/xionghuichen/galileo |
| Open Datasets | Yes | We select 3 MuJoCo environments from D4RL [17] to construct our model learning tasks. The Cancer Genome Atlas (TCGA) is a project that has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein, and epigenetic levels. |
| Dataset Splits | Yes | In the D4RL benchmark, only the medium tasks are collected with a fixed policy... So we train models on the HalfCheetah-medium, Walker2d-medium, and Hopper-medium datasets. GALILEO is also deployed in a real-world large-scale food-delivery platform. |
| Hardware Specification | Yes | We use one Tesla V100 PCIe 32GB GPU and a 32-core Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz to train all of our models. |
| Software Dependencies | No | The paper mentions optimization algorithms such as TRPO and PPO, but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.9, scikit-learn 0.24). |
| Experiment Setup | Yes | Table 6: Table of hyper-parameters for all of the tasks. This includes specific values for hidden layers, hidden units, batch size, learning rate, and other training parameters. |