Multi-Agent Domain Calibration with a Handful of Offline Data
Authors: Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang, Yang Yu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation on 21 offline locomotion tasks in D4RL and NeoRL benchmarks showcases the superior performance of our method compared to strong existing offline model-based RL, offline domain calibration, and hybrid offline-and-online RL baselines. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China 2School of Artificial Intelligence, Nanjing University, Nanjing, China 3Polixir Technologies, Nanjing, China |
| Pseudocode | Yes | The pseudo-code of Madoc is presented in Alg. 1; we utilize SAC [59] and DOP [49] as our backbone algorithms for domain calibration. |
| Open Source Code | Yes | The source code is available at https://github.com/LAMDA-RL/Madoc. |
| Open Datasets | Yes | On the popular D4RL benchmark [60], we choose four locomotion tasks (HalfCheetah, Hopper, Walker2d, Ant), each with three types of datasets (medium, medium-replay, medium-expert), to evaluate different algorithms' performance when faced with datasets of varying quality. Considering more challenging scenarios, three environments (HalfCheetah, Hopper, Walker2d) along with three levels of datasets (low, medium, high) from NeoRL benchmark [61] are also selected. |
| Dataset Splits | Yes | As illustrated in Fig. 4(a), the algorithms access datasets of different magnitudes, $5 \times 10^4$ (small), $2 \times 10^5$ (medium), and $1 \times 10^6$ (large), to reflect a spectrum of data availability. |
| Hardware Specification | Yes | Most experiments were conducted on a server outfitted with a 13th Gen Intel(R) Core(TM) i9-13900K CPU, 2 NVIDIA RTX A5000 GPUs, and 125GB of RAM, running Ubuntu 22.04. |
| Software Dependencies | No | The paper mentions “Ubuntu 22.04” as the operating system and refers to various algorithms and frameworks like SAC and DOP, but it does not specify version numbers for other key software libraries or dependencies (e.g., PyTorch, TensorFlow, NumPy). |
| Experiment Setup | Yes | We list the default hyper-parameter settings for Madoc in Tab. 5. |
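
The task count quoted under Research Type can be cross-checked against the environments and dataset levels listed under Open Datasets. The following is a minimal sketch of that bookkeeping only: the names `task_grid`, `D4RL_ENVS`, and the other constants are hypothetical labels chosen for illustration, and nothing here reflects how the Madoc repository actually organizes or loads its datasets.

```python
from itertools import product

# Environment and dataset-level names as quoted in the Open Datasets row.
# The constant names themselves are illustrative assumptions.
D4RL_ENVS = ["HalfCheetah", "Hopper", "Walker2d", "Ant"]
D4RL_LEVELS = ["medium", "medium-replay", "medium-expert"]
NEORL_ENVS = ["HalfCheetah", "Hopper", "Walker2d"]
NEORL_LEVELS = ["low", "medium", "high"]


def task_grid():
    """Return one (benchmark, environment, level) triple per evaluated task."""
    d4rl = [("D4RL", env, lvl) for env, lvl in product(D4RL_ENVS, D4RL_LEVELS)]
    neorl = [("NeoRL", env, lvl) for env, lvl in product(NEORL_ENVS, NEORL_LEVELS)]
    return d4rl + neorl


tasks = task_grid()
# 4 environments x 3 levels (D4RL) plus 3 environments x 3 levels (NeoRL).
print(len(tasks))
```

Enumerating the grid this way gives 12 D4RL tasks and 9 NeoRL tasks, which agrees with the 21 tasks stated in the table.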