Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning

Authors: Jiechuan Jiang, Zongqing Lu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Empirically, OTC outperforms baselines across a variety of tasks. The authors construct decentralized datasets from D4RL (Fu et al. 2020), MPE (Lowe et al. 2017), and SMAC (Samvelyan et al. 2019), and ablation studies demonstrate the effectiveness of the two distance measures, the practicability of rank-based prioritization, and the improvement from adaptive prioritization. The experiments are divided into two parts. First, the effectiveness of each module, including the distance metrics d_q and d_e, is fully verified on the standard D4RL benchmark (Fu et al. 2020) with two representative base algorithms: BCQ (Fujimoto, Meger, and Precup 2019) and AWAC (Nair et al. 2020), an important offline-to-online baseline. Second, OTC is verified in more diverse settings, covering different tasks (SMAC (Samvelyan et al. 2019), MPE (Lowe et al. 2017)) and more base algorithms (BREMEN (Matsushima et al. 2021), CQL (Kumar et al. 2020), and TD3+BC (Fujimoto and Gu 2021)); in these experiments only one distance metric is used.
Researcher Affiliation: Academia. Jiechuan Jiang, Zongqing Lu; School of Computer Science, Peking University ({jiechuan.jiang, zongqing.lu}@pku.edu.cn).
Pseudocode: Yes. Algorithm 1: OTC for Agent i.
Open Source Code: No. The paper does not provide any explicit statements or links indicating that its source code is available.
Open Datasets: Yes. "We construct decentralized datasets from a variety of tasks, including D4RL (Fu et al. 2020), MPE (Lowe et al. 2017), and SMAC (Samvelyan et al. 2019)."
Dataset Splits: No. The paper discusses how offline and online datasets are merged and used for finetuning (e.g., |D_i^k| = 1% |B_i^0|), but it does not provide train/validation/test splits in the traditional sense, nor how the D4RL, MPE, and SMAC datasets were partitioned for these purposes within the experiments.
Hardware Specification: Yes. "The experiments are carried out on Intel i7-8700 CPU and NVIDIA GTX 1080Ti GPU."
Software Dependencies: No. The paper does not specify version numbers for any software components, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup: Yes. "The hyperparameters are summarized in Table 3." Per Table 3 (base algorithms BCQ, AWAC, BREMEN, CQL, TD3+BC):
- discount (γ): 0.99
- |B|: 512
- |D|: 2000
- batch size: 128
- hidden sizes: (64, 64) / (256, 256)
- activation: ReLU
- actor lr (×10^-4): 1 / 1 / 1 / 1 / 3 (BCQ / AWAC / BREMEN / CQL / TD3+BC)
- critic lr (×10^-4): 1 / 5 / 5 / 1 / 3
- embedding dimension: 10
- finetuning updates (L): 4000 / 2000
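The rank-based prioritization mentioned in the ablations can be illustrated with a minimal sketch: transitions are ranked by a priority score and sampled with probability proportional to 1/rank, which makes sampling insensitive to the score's absolute scale. The function name, the toy scores, and the 1/rank weighting here are illustrative assumptions, not the paper's exact formulation.

```python
import random

def rank_based_sample(transitions, scores, batch_size, rng=None):
    """Sample transitions with probability proportional to 1/rank,
    where rank 1 is the transition with the highest score
    (a generic rank-based prioritization sketch)."""
    rng = rng or random.Random(0)
    # Sort indices by descending score; position 0 holds the top-ranked item.
    order = sorted(range(len(transitions)), key=lambda i: -scores[i])
    # Weight the item at rank r (1-indexed) by 1/r.
    weights = [1.0 / (pos + 1) for pos in range(len(order))]
    sampled = rng.choices(order, weights=weights, k=batch_size)
    return [transitions[i] for i in sampled]

# Toy usage: transitions are labels, scores are arbitrary priorities.
batch = rank_based_sample(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.3], batch_size=8)
```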
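The ratio quoted under Dataset Splits, |D_i^k| = 1% |B_i^0|, can be read as: the online dataset collected each iteration is one percent of agent i's offline buffer size before the two are merged for finetuning. A minimal sketch under that reading; the function name and the "keep the most recent transitions" choice are assumptions for illustration.

```python
def merge_buffers(offline_buffer, online_data):
    """Append a small online dataset to the offline buffer for finetuning,
    capping the online portion at 1% of the offline buffer size
    (an illustrative reading of |D| = 1% |B|)."""
    max_online = max(1, len(offline_buffer) // 100)
    return offline_buffer + online_data[-max_online:]

# Toy usage: a 2000-transition offline buffer admits 20 online transitions.
merged = merge_buffers(list(range(2000)), list(range(30)))
```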
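The Table 3 hyperparameters can also be transcribed as a configuration sketch. The dictionary keys are our own naming, and values the flattened table leaves ambiguous (hidden sizes, finetuning updates) are noted in comments rather than guessed.

```python
# Hyperparameters transcribed from Table 3; key names are illustrative.
COMMON = {
    "discount_gamma": 0.99,
    "size_B": 512,        # |B|
    "size_D": 2000,       # |D|
    "batch_size": 128,
    "activation": "ReLU",
    "embedding_dim": 10,
    # The table also lists hidden sizes (64, 64) and (256, 256) and
    # finetuning updates L of 4000 and 2000 without an unambiguous
    # per-algorithm mapping in this extraction, so they are omitted.
}

# Learning rates are given in units of 1e-4 in the table.
PER_ALGO = {
    "BCQ":    {"actor_lr": 1e-4, "critic_lr": 1e-4},
    "AWAC":   {"actor_lr": 1e-4, "critic_lr": 5e-4},
    "BREMEN": {"actor_lr": 1e-4, "critic_lr": 5e-4},
    "CQL":    {"actor_lr": 1e-4, "critic_lr": 1e-4},
    "TD3+BC": {"actor_lr": 3e-4, "critic_lr": 3e-4},
}
```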