State Deviation Correction for Offline Reinforcement Learning
Authors: Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Guanwen Zhang, Xiangyang Ji
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our proposed method is competitive with state-of-the-art methods in a Grid World setup, the offline Mujoco control suite, and a modified offline Mujoco dataset with a finite number of valuable samples. |
| Researcher Affiliation | Academia | Hongchang Zhang¹, Jianzhun Shao¹, Yuhang Jiang¹, Shuncheng He¹, Guanwen Zhang², Xiangyang Ji¹* (¹ Tsinghua University, ² Northwestern Polytechnical University); hc-zhang19@mails.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: State Deviation Correction |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For the dataset, we use a Grid World setting and the Mujoco datasets in the D4RL benchmarks (Fu et al. 2020). Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv preprint arXiv:2004.07219. |
| Dataset Splits | No | The paper uses D4RL datasets but does not explicitly provide specific details about the training, validation, and test splits (e.g., percentages, sample counts, or explicit references to predefined splits used for reproduction). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, memory, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., soft actor-critic, CVAE) and tools (t-SNE), but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn versions) required to replicate the experiment. |
| Experiment Setup | Yes | We choose η = 0.05 in our experiment. In our implementation, we use Gaussian kernels and set n = m = 4. SDC first adds a small-magnitude noise ϵ to the state and formulates a noisy state as ŝ = s + βϵ, where ϵ is sampled from a Gaussian distribution N(0, 1) and β is a small constant. For each task, we train a CQL agent, a BEAR agent, and an SDC agent for 1,000,000 updates. |
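
The Experiment Setup row quotes two concrete pieces of the method: the noisy-state construction ŝ = s + βϵ with ϵ ~ N(0, 1), and the use of Gaussian kernels with n = m = 4 samples. The sketch below is a minimal illustration of those pieces in NumPy, not the authors' implementation (no source code is released); the MMD-style use of the Gaussian kernels, the function names, β = 0.01, and the state/action dimensions are all assumptions made for the example.

```python
import numpy as np

def perturb_state(state, beta=0.01, rng=None):
    """Noisy state from the quoted setup: s_hat = s + beta * eps, eps ~ N(0, 1).
    beta = 0.01 is an illustrative value; the paper only says it is a small constant."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(state.shape)
    return state + beta * eps

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    between two sample sets x (n, d) and y (m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(samples_p, samples_q, sigma=1.0):
    """Biased squared-MMD estimate with Gaussian kernels; using n = m = 4 samples
    mirrors the quoted setting, though the exact role of the kernels is an assumption."""
    k_pp = gaussian_kernel(samples_p, samples_p, sigma).mean()
    k_qq = gaussian_kernel(samples_q, samples_q, sigma).mean()
    k_pq = gaussian_kernel(samples_p, samples_q, sigma).mean()
    return k_pp + k_qq - 2.0 * k_pq

# Usage: perturb a 17-dim Mujoco-style state and compare n = m = 4 sampled
# actions from two hypothetical distributions via the MMD estimate.
rng = np.random.default_rng(0)
s = rng.standard_normal(17)
s_hat = perturb_state(s, beta=0.01, rng=rng)
actions_a = rng.standard_normal((4, 6))  # n = 4 action samples (illustrative)
actions_b = rng.standard_normal((4, 6))  # m = 4 action samples (illustrative)
print(mmd_squared(actions_a, actions_b))
```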