Adversarial Counterfactual Environment Model Learning
Authors: Xiong-Hui Chen, Yang Yu, Zhengmao Zhu, ZhiHua Yu, Chen Zhenjun, Chenghe Wang, Yinan Wu, Rong-Jun Qin, Hongqiu Wu, Ruijin Ding, Huang Fangsheng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted on two synthetic tasks, three continuous-control tasks, and a real-world application. We first verify that GALILEO makes accurate predictions on counterfactual data queried by other policies, compared with baselines. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) School of Artificial Intelligence, Nanjing University; (3) Meituan; (4) Polixir.ai; (5) Tsinghua University |
| Pseudocode | Yes | Algorithm 1 Pseudocode for GALILEO |
| Open Source Code | Yes | Code: https://github.com/xionghuichen/galileo |
| Open Datasets | Yes | We select 3 MuJoCo environments from D4RL [17] to construct our model learning tasks. The Cancer Genome Atlas (TCGA) is a project that has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein, and epigenetic levels. |
| Dataset Splits | Yes | In the D4RL benchmark, only the medium tasks are collected with a fixed policy... so we train models on the HalfCheetah-medium, Walker2d-medium, and Hopper-medium datasets. GALILEO is also deployed in a real-world large-scale food-delivery platform. |
| Hardware Specification | Yes | We use one Tesla V100 PCIe 32GB GPU and a 32-core Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz to train all of our models. |
| Software Dependencies | No | The paper mentions optimization algorithms such as TRPO and PPO, but does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, scikit-learn 0.24). |
| Experiment Setup | Yes | Table 6: Table of hyper-parameters for all of the tasks. This includes specific values for hidden layers, hidden units, batch size, learning rate, and other training parameters. |