Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
Authors: Da Wang, Lin Li, Wei Wei, Qixian Yu, Jianye Hao, Jiye Liang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness with extensive experiments. Code is available at https://github.com/DkING-lv6/ADS. Theoretical analysis and extensive experiments demonstrate the effectiveness of ADS in improving generalization. We apply ADS to related widely-used baselines, with the best performance surpassing some state-of-the-art offline algorithms. In this section, we conduct experiments to validate the effectiveness of ADS by answering the following questions: (i) How does ADS perform on the benchmarks by applying it to existing widely-used algorithms? |
| Researcher Affiliation | Academia | 1Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, China. 2College of Intelligence and Computing, Tianjin University, Tianjin, China. Correspondence to: Wei Wei <weiwei@sxu.edu.cn>. |
| Pseudocode | Yes | Appendix A provides the detailed algorithm. Our algorithm consists of the following two components. Algorithm 1 is the process of using Equation (8) to update the model. Algorithm 2 solves problem (9) to find the hardest train/validation split (see the second sketch after this table). |
| Open Source Code | Yes | Code is available at https://github.com/DkING-lv6/ADS. |
| Open Datasets | Yes | First, we apply our ADS framework to existing widely-used algorithms, CQL (Kumar et al., 2020), TD3+BC (Fujimoto & Gu, 2021), and MCQ (Lyu et al., 2022), and conduct experiments on several D4RL (Fu et al., 2020) gym MuJoCo-v2 datasets. Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. D4RL: Datasets for deep data-driven reinforcement learning, 2020. |
| Dataset Splits | Yes | We split the offline dataset into train/validation (Dt/Dv) subsets that have distribution discrepancies. Figure 2: ADS framework for offline RL. We split the offline dataset into the train/validation subsets. Figure 3: Illustration of actor-critic implementation with the ADS framework. Step 1: We cluster state-action pairs with GMM to form strata, then split the offline dataset into train/validation subsets (see the first sketch after this table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions that baseline algorithms come from a code library but does not provide specific version numbers for any software components (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | No | The paper discusses the general types of hyperparameters that influence ADS (ratio ζ, number of clusters K, step size α) and their impact, and it describes the practical implementation of the method (e.g., Bellman backup, value discrepancy calculation). However, it does not provide concrete values for the hyperparameters (e.g., learning rates, batch sizes, epochs) used to produce the main experimental results reported in its tables. |
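
To make the "Dataset Splits" evidence concrete, here is a minimal sketch of the stratify-then-split step it quotes. This is not the authors' code: it assumes scikit-learn's `GaussianMixture` for the GMM strata, and the cluster count, validation ratio, and uniform per-stratum sampling rule are placeholder choices rather than the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def stratify_and_split(states, actions, n_clusters=10, val_ratio=0.2, seed=0):
    """Cluster state-action pairs with a GMM to form strata, then draw a
    train/validation split inside every stratum.

    Illustrative only: n_clusters (K), val_ratio, and the uniform per-stratum
    sampling are placeholders, not the paper's settings."""
    rng = np.random.default_rng(seed)
    sa = np.concatenate([states, actions], axis=1)
    strata = GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(sa)

    train_idx, val_idx = [], []
    for k in range(n_clusters):
        idx = np.flatnonzero(strata == k)
        rng.shuffle(idx)
        n_val = int(val_ratio * len(idx))
        val_idx.append(idx[:n_val])
        train_idx.append(idx[n_val:])
    return np.concatenate(train_idx), np.concatenate(val_idx), strata
```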
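The "Pseudocode" row describes an alternation between a model update (Equation (8)) and a search for the hardest train/validation split (problem (9)). The second sketch below mirrors that loop under stated assumptions: `agent.update` and `score_fn` are hypothetical interfaces, and the greedy per-stratum selection of the highest-scoring samples is a stand-in for the paper's inner maximization, not its actual procedure.

```python
import numpy as np

def train_with_ads(dataset, agent, score_fn, strata, n_iters=1000,
                   resplit_every=100, val_ratio=0.2, batch_size=256, seed=0):
    """Alternate between updating the agent on the current train subset and
    re-splitting the data so the validation subset holds the hardest samples.

    Hypothetical interfaces: agent.update(batch) performs one gradient step
    (stand-in for Equation (8)); score_fn(dataset) returns a per-sample
    difficulty score (stand-in for the value discrepancy in problem (9))."""
    rng = np.random.default_rng(seed)
    train_idx = np.arange(len(dataset["states"]))

    for it in range(n_iters):
        if it % resplit_every == 0:
            scores = score_fn(dataset)                 # per-sample difficulty proxy
            train_parts = []
            for c in np.unique(strata):
                idx = np.flatnonzero(strata == c)
                order = idx[np.argsort(-scores[idx])]  # hardest first
                n_val = int(val_ratio * len(idx))
                train_parts.append(order[n_val:])      # keep the rest for training
            train_idx = np.concatenate(train_parts)

        batch = rng.choice(train_idx, size=min(batch_size, len(train_idx)),
                           replace=False)
        agent.update({k: v[batch] for k, v in dataset.items()})
```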