Multi-Agent Domain Calibration with a Handful of Offline Data

Authors: Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang, Yang Yu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation on 21 offline locomotion tasks in D4RL and NeoRL benchmarks showcases the superior performance of our method compared to strong existing offline model-based RL, offline domain calibration, and hybrid offline-and-online RL baselines.
Researcher Affiliation | Collaboration | (1) National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; (2) School of Artificial Intelligence, Nanjing University, Nanjing, China; (3) Polixir Technologies, Nanjing, China
Pseudocode | Yes | The pseudo-code of Madoc is presented in Alg. 1; we utilize SAC [59] and DOP [49] as our backbone algorithms for domain calibration.
Open Source Code | Yes | The source code is available at https://github.com/LAMDA-RL/Madoc.
Open Datasets | Yes | On the popular D4RL benchmark [60], we choose four locomotion tasks (HalfCheetah, Hopper, Walker2d, Ant), each with three types of datasets (medium, medium-replay, medium-expert), to evaluate different algorithms' performance when faced with datasets of varying quality. Considering more challenging scenarios, three environments (HalfCheetah, Hopper, Walker2d) along with three levels of datasets (low, medium, high) from the NeoRL benchmark [61] are also selected.
Dataset Splits | Yes | As illustrated in Fig. 4(a), the algorithms access datasets of different magnitudes, 5 × 10^4 (small), 2 × 10^5 (medium), and 1 × 10^6 (large), to reflect a spectrum of data availability.
Hardware Specification | Yes | Most experiments were conducted on a server outfitted with a 13th Gen Intel(R) Core(TM) i9-13900K CPU, 2 NVIDIA RTX A5000 GPUs, and 125GB of RAM, running Ubuntu 22.04.
Software Dependencies | No | The paper mentions “Ubuntu 22.04” as the operating system and refers to algorithms such as SAC and DOP, but it does not specify version numbers for key software libraries or dependencies (e.g., PyTorch, TensorFlow, NumPy).
Experiment Setup | Yes | We list the default hyper-parameter settings for Madoc in Tab. 5.
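
The figure of 21 tasks in the Research Type row follows from the Open Datasets row: 4 environments × 3 dataset types in D4RL plus 3 environments × 3 dataset levels in NeoRL. The Python sketch below enumerates that grid as a sanity check. The tuple labels are illustrative only and are not the benchmarks' registered dataset identifiers, and the dataset sizes assume the magnitudes in the Dataset Splits row read as 5 × 10^4, 2 × 10^5, and 1 × 10^6.

```python
from itertools import product

# Environment and dataset names as listed in the Open Datasets row.
D4RL_ENVS = ["HalfCheetah", "Hopper", "Walker2d", "Ant"]
D4RL_LEVELS = ["medium", "medium-replay", "medium-expert"]
NEORL_ENVS = ["HalfCheetah", "Hopper", "Walker2d"]
NEORL_LEVELS = ["low", "medium", "high"]

# Assumed reading of the dataset magnitudes quoted for Fig. 4(a).
DATASET_SIZES = {"small": 50_000, "medium": 200_000, "large": 1_000_000}


def task_grid():
    """Return (benchmark, environment, level) triples for every task.

    The triples are hypothetical labels for counting purposes, not the
    identifiers used by the D4RL or NeoRL packages.
    """
    d4rl = [("D4RL", env, lvl) for env, lvl in product(D4RL_ENVS, D4RL_LEVELS)]
    neorl = [("NeoRL", env, lvl) for env, lvl in product(NEORL_ENVS, NEORL_LEVELS)]
    return d4rl + neorl


tasks = task_grid()
print(len(tasks))  # 12 D4RL tasks + 9 NeoRL tasks = 21
```

Keeping the grid as data rather than hard-coding a count makes it straightforward to check that an evaluation script covers every environment and dataset level named in the paper.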