Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL
Authors: Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Youfang Lin, Shoucheng Song, Han Wang, Li Jiang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability. |
| Researcher Affiliation | Academia | Peng Cheng 1,3, Xianyuan Zhan 2,4, Zhihao Wu 1,3, Wenjia Zhang 2, Shoucheng Song 1,3, Han Wang 1,3, Youfang Lin 1,3, Li Jiang 2. 1 Beijing Jiaotong University, Beijing, China; 2 Tsinghua University, Beijing, China; 3 Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China; 4 Shanghai Artificial Intelligence Laboratory, Shanghai, China |
| Pseudocode | Yes | Algorithm 1 T-Symmetry Regularized Offline RL (TSRL) |
| Open Source Code | Yes | Code is available at: https://github.com/pcheng2/TSRL |
| Open Datasets | Yes | We evaluate TSRL on the D4RL MuJoCo-v2 and Adroit-v1 benchmark datasets [5] |
| Dataset Splits | Yes | We compare the performance of TSRL and the baseline methods on both the full D4RL datasets and their reduced-size datasets with only 5k–10k samples, which are constructed by randomly sampling a given fraction of trajectories in the full datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'ReLU activation', 'PyTorch', 'Functorch', and 'Jax' but does not specify their version numbers. |
| Experiment Setup | Yes | Table 3: Hyperparameter details for TDM and TSRL |