Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting
Authors: Da Wang, Lin Li, Wei Wei, Qixian Yu, Jianye Hao, Jiye Liang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness with extensive experiments. Code is available at https://github.com/DkING-lv6/ADS. Theoretical analysis and extensive experiments demonstrate the effectiveness of ADS in improving generalization. We apply ADS to related widely-used baselines, with the best performance surpassing some state-of-the-art offline algorithms. In this section, we conduct experiments to validate the effectiveness of ADS by answering the following questions: (i) How does ADS perform on the benchmarks by applying it to existing widely-used algorithms? |
| Researcher Affiliation | Academia | 1Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, China. 2College of Intelligence and Computing, Tianjin University, Tianjin, China. Correspondence to: Wei Wei <weiwei@sxu.edu.cn>. |
| Pseudocode | Yes | Appendix A provides the detailed algorithm. Our algorithm consists of the following two components. Algorithm 1 is the process of using Equation (8) to update the model. Algorithm 2 solves problem (9) to find the hardest train/validation split (see the second sketch after this table). |
| Open Source Code | Yes | Code is available at https://github.com/DkING-lv6/ADS. |
| Open Datasets | Yes | First, we apply our ADS framework to existing widely-used algorithms, CQL (Kumar et al., 2020), TD3+BC (Fujimoto & Gu, 2021), and MCQ (Lyu et al., 2022), and conduct experiments on several D4RL (Fu et al., 2020) gym MuJoCo-v2 datasets. Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. D4RL: Datasets for deep data-driven reinforcement learning, 2020. |
| Dataset Splits | Yes | We split the offline dataset into train/validation (Dt/Dv) subsets that have distribution discrepancies. Figure 2: ADS framework for offline RL. We split the offline dataset into the train/validation subsets. Figure 3: Illustration of actor-critic implementation with the ADS framework. Step 1: We cluster state-action pairs with GMM to form strata, then split the offline dataset into train/validation subsets (see the first sketch after this table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions that baseline algorithms come from a code library but does not provide specific version numbers for any software components (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | No | The paper discusses the general types of hyperparameters that influence ADS (ratio ζ, number of clusters K, step size α) and their impact, and it describes the practical implementation of the method (e.g., Bellman backup, value discrepancy calculation). However, it does not provide concrete values for the hyperparameters (e.g., learning rates, batch sizes, epochs) used to produce the main experimental results reported in its tables. |
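
To make the "Dataset Splits" evidence concrete, here is a minimal sketch of the stratify-then-split step it quotes. This is not the authors' code: it assumes scikit-learn's `GaussianMixture` for the GMM strata, and the cluster count, validation ratio, and uniform per-stratum sampling rule are placeholder choices rather than the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def stratify_and_split(states, actions, n_clusters=10, val_ratio=0.2, seed=0):
    """Cluster state-action pairs with a GMM to form strata, then draw a
    train/validation split inside every stratum.

    Illustrative only: n_clusters (K), val_ratio, and the uniform per-stratum
    sampling are placeholders, not the paper's settings."""
    rng = np.random.default_rng(seed)
    sa = np.concatenate([states, actions], axis=1)
    strata = GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(sa)

    train_idx, val_idx = [], []
    for k in range(n_clusters):
        idx = np.flatnonzero(strata == k)
        rng.shuffle(idx)
        n_val = int(val_ratio * len(idx))
        val_idx.append(idx[:n_val])
        train_idx.append(idx[n_val:])
    return np.concatenate(train_idx), np.concatenate(val_idx), strata
```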
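The "Pseudocode" row describes an alternation between a model update (Equation (8)) and a search for the hardest train/validation split (problem (9)). The second sketch below mirrors that loop under stated assumptions: `agent.update` and `score_fn` are hypothetical interfaces, and the greedy per-stratum selection of the highest-scoring samples is a stand-in for the paper's inner maximization, not its actual procedure.

```python
import numpy as np

def train_with_ads(dataset, agent, score_fn, strata, n_iters=1000,
                   resplit_every=100, val_ratio=0.2, batch_size=256, seed=0):
    """Alternate between updating the agent on the current train subset and
    re-splitting the data so the validation subset holds the hardest samples.

    Hypothetical interfaces: agent.update(batch) performs one gradient step
    (stand-in for Equation (8)); score_fn(dataset) returns a per-sample
    difficulty score (stand-in for the value discrepancy in problem (9))."""
    rng = np.random.default_rng(seed)
    train_idx = np.arange(len(dataset["states"]))

    for it in range(n_iters):
        if it % resplit_every == 0:
            scores = score_fn(dataset)                 # per-sample difficulty proxy
            train_parts = []
            for c in np.unique(strata):
                idx = np.flatnonzero(strata == c)
                order = idx[np.argsort(-scores[idx])]  # hardest first
                n_val = int(val_ratio * len(idx))
                train_parts.append(order[n_val:])      # keep the rest for training
            train_idx = np.concatenate(train_parts)

        batch = rng.choice(train_idx, size=min(batch_size, len(train_idx)),
                           replace=False)
        agent.update({k: v[batch] for k, v in dataset.items()})
```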