State Deviation Correction for Offline Reinforcement Learning

Authors: Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Guanwen Zhang, Xiangyang Ji

AAAI 2022, pp. 9022-9030

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our proposed method is competitive with the state-of-the-art methods in a Grid World setup, offline Mujoco control suite, and a modified offline Mujoco dataset with a finite number of valuable samples.
Researcher Affiliation | Academia | Hongchang Zhang (1), Jianzhun Shao (1), Yuhang Jiang (1), Shuncheng He (1), Guanwen Zhang (2), Xiangyang Ji (1)*; (1) Tsinghua University, (2) Northwestern Polytechnical University; hc-zhang19@mails.tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: State Deviation Correction
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | For the dataset, we use a Grid World setting and the Mujoco datasets in the D4RL benchmarks (Fu et al. 2020). Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv preprint arXiv:2004.07219. (See the dataset-loading sketch after this table.)
Dataset Splits | No | The paper uses D4RL datasets but does not explicitly provide specific details about the training, validation, and test splits (e.g., percentages, sample counts, or explicit references to predefined splits used for reproduction).
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various algorithms and models (e.g., soft actor-critic, CVAE) and tools (t-SNE), but it does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn versions) required to replicate the experiment.
Experiment Setup | Yes | We choose η = 0.05 in our experiment. In our implementation, we use Gaussian kernels and set n = m = 4. SDC first adds noise ϵ with small magnitude to the state and formulates a noisy state as ŝ = s + βϵ, where ϵ is sampled from a Gaussian distribution N(0, 1) and β is a small constant. For each task, we train a CQL agent, a BEAR agent, and an SDC agent for 1,000,000 updates.
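The Experiment Setup row above gives two concrete pieces: the noisy-state formulation ŝ = s + βϵ with ϵ ~ N(0, 1), and Gaussian kernels with n = m = 4 samples per side. Below is a minimal sketch of how these could look in code, assuming PyTorch, an illustrative bandwidth sigma and β, and that the Gaussian kernels are used for an MMD-style distance between sampled next states; how such a term enters the SDC training objective is not described in the rows above, so the surrounding training loop is omitted.

```python
import torch


def perturb_state(state: torch.Tensor, beta: float = 0.01) -> torch.Tensor:
    """Form a noisy state s_hat = s + beta * eps with eps ~ N(0, 1) (beta is assumed)."""
    eps = torch.randn_like(state)
    return state + beta * eps


def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise Gaussian (RBF) kernel: x is (B, n, D), y is (B, m, D) -> (B, n, m)."""
    diff = x.unsqueeze(2) - y.unsqueeze(1)      # (B, n, m, D)
    sq_dist = diff.pow(2).sum(dim=-1)           # (B, n, m)
    return torch.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_gaussian(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between sample sets x (B, n, D) and y (B, m, D)."""
    k_xx = gaussian_kernel(x, x, sigma).mean(dim=(1, 2))
    k_yy = gaussian_kernel(y, y, sigma).mean(dim=(1, 2))
    k_xy = gaussian_kernel(x, y, sigma).mean(dim=(1, 2))
    return k_xx + k_yy - 2.0 * k_xy             # (B,)


if __name__ == "__main__":
    # n = m = 4 samples per distribution, as stated in the setup row;
    # batch size and state dimension are placeholders.
    batch, n_samples, state_dim = 32, 4, 17
    s = torch.randn(batch, state_dim)
    s_hat = perturb_state(s, beta=0.01)                       # noisy states
    pred_next = torch.randn(batch, n_samples, state_dim)      # placeholder next-state samples
    data_next = torch.randn(batch, n_samples, state_dim)      # placeholder next-state samples
    penalty = mmd_gaussian(pred_next, data_next)              # (32,)
```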
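For the Open Datasets row, which cites the D4RL Mujoco benchmarks (Fu et al. 2020), the following is a minimal loading sketch assuming the standard d4rl Python package and its Gym registration; the specific task name is illustrative and not taken from the paper.

```python
import gym
import d4rl  # registers the D4RL environments with Gym

# Hypothetical task choice; any D4RL Mujoco dataset id works the same way.
env = gym.make("halfcheetah-medium-v0")
dataset = d4rl.qlearning_dataset(env)  # dict of offline transitions

observations = dataset["observations"]            # (N, obs_dim)
actions = dataset["actions"]                      # (N, act_dim)
rewards = dataset["rewards"]                      # (N,)
next_observations = dataset["next_observations"]  # (N, obs_dim)
terminals = dataset["terminals"]                  # (N,)
```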