Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-Agent Domain Calibration with a Handful of Offline Data
Authors: Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang, Yang Yu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation on 21 offline locomotion tasks in the D4RL and NeoRL benchmarks showcases the superior performance of our method compared to strong existing offline model-based RL, offline domain calibration, and hybrid offline-and-online RL baselines. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China 2School of Artificial Intelligence, Nanjing University, Nanjing, China 3Polixir Technologies, Nanjing, China |
| Pseudocode | Yes | The pseudo-code of Madoc is presented in Alg. 1; we utilize SAC [59] and DOP [49] as our backbone algorithms for domain calibration. |
| Open Source Code | Yes | The source code is available at https://github.com/LAMDA-RL/Madoc. |
| Open Datasets | Yes | On the popular D4RL benchmark [60], we choose four locomotion tasks (Half Cheetah, Hopper, Walker2d, Ant), each with three types of datasets (medium, medium-replay, medium-expert), to evaluate different algorithms' performance when faced with datasets of varying quality. Considering more challenging scenarios, three environments (Half Cheetah, Hopper, Walker2d) along with three levels of datasets (low, medium, high) from the NeoRL benchmark [61] are also selected. |
| Dataset Splits | Yes | As illustrated in Fig. 4(a), the algorithms access datasets of different magnitudes, 5×10^4 (small), 2×10^5 (medium), and 1×10^6 (large), to reflect a spectrum of data availability. |
| Hardware Specification | Yes | Most experiments were conducted on a server outfitted with a 13th Gen Intel(R) Core(TM) i9-13900K CPU, 2 NVIDIA RTX A5000 GPUs, and 125GB of RAM, running Ubuntu 22.04. |
| Software Dependencies | No | The paper mentions "Ubuntu 22.04" as the operating system and refers to various algorithms and frameworks like SAC and DOP, but it does not specify version numbers for other key software libraries or dependencies (e.g., PyTorch, TensorFlow, NumPy). |
| Experiment Setup | Yes | We list the default hyper-parameter settings for Madoc in Tab. 5. |