reproducibilityindex.ai

Doubly Mild Generalization for Offline Reinforcement Learning

Authors: Yixiu Mao, Qi Wang, Yun Qu, Yuhang Jiang, Xiangyang Ji

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, DMG achieves state-of-the-art performance across Gym-MuJoCo locomotion tasks and challenging AntMaze tasks.
Researcher Affiliation	Academia	Yixiu Mao, Qi Wang, Yun Qu, Yuhang Jiang, Xiangyang Ji (Department of Automation, Tsinghua University); contact: myx21@mails.tsinghua.edu.cn, xyji@tsinghua.edu.cn
Pseudocode	Yes	Algorithm 1 DMG. 1: Initialize πϕ, πϕ′, Qθ, Qθ′, and Vψ. 2: for each gradient step do 3: Update ψ by minimizing Eq. (15). 4: Update θ by minimizing Eq. (16). 5: Update ϕ by maximizing Eq. (14). 6: Update target networks: θ′ ← (1 − ξ)θ′ + ξθ, ϕ′ ← (1 − ξ)ϕ′ + ξϕ. 7: end for
Open Source Code	Yes	Our code is available at https://github.com/maoyixiu/DMG.
Open Datasets	Yes	We evaluate the proposed approach on Gym-MuJoCo locomotion domains and challenging AntMaze domains in D4RL [16]. [16] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
Dataset Splits	No	The paper describes how evaluation is performed (e.g., averaging returns over evaluation trajectories and random seeds) but does not explicitly provide percentages or counts for training, validation, and test dataset splits.
Hardware Specification	Yes	We test the runtime of DMG and other baselines on a GeForce RTX 3090.
Software Dependencies	No	The paper mentions several algorithms and optimizers (e.g., TD3, IQL, XQL, SQL, Adam) and their specific parameters. However, it does not provide specific version numbers for underlying software dependencies like Python, PyTorch/TensorFlow, CUDA, or other libraries.
Experiment Setup	Yes	Table 5: Hyperparameters of DMG. Includes Optimizer, Critic learning rate, Actor learning rate, Batch size, Discount factor, Number of iterations, Target update rate, Number of Critics, Penalty coefficient, Expectile, Inverse temperature, and Architecture details.
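The Pseudocode row lists the control flow of Algorithm 1 but not the contents of Eqs. (14) to (16), so the sketch below reproduces only the loop structure and the soft target update of step 6. It assumes a toy representation where each network is a dict of flat float lists. The three loss updates are caller-supplied placeholders. The names `train_dmg_skeleton`, `soft_update`, `update_value`, `update_critic`, and `update_actor` are hypothetical and do not come from the authors' repository. The value of ξ used in the usage lines is arbitrary and is not the paper's target update rate.

```python
from typing import Callable, Dict, List

# Hypothetical representation: each network is a dict of named flat parameter lists.
Params = Dict[str, List[float]]
State = Dict[str, Params]
Update = Callable[[State], State]


def soft_update(target: Params, online: Params, xi: float) -> Params:
    """Polyak averaging (step 6 of Algorithm 1): target <- (1 - xi) * target + xi * online."""
    return {
        name: [(1.0 - xi) * t + xi * o for t, o in zip(target[name], online[name])]
        for name in target
    }


def train_dmg_skeleton(
    state: State,
    num_steps: int,
    xi: float,
    update_value: Update,   # placeholder: minimize Eq. (15) over psi
    update_critic: Update,  # placeholder: minimize Eq. (16) over theta
    update_actor: Update,   # placeholder: maximize Eq. (14) over phi
) -> State:
    """Hypothetical outer loop mirroring steps 2 to 7 of Algorithm 1 (mutates `state`)."""
    for _ in range(num_steps):
        state = update_value(state)
        state = update_critic(state)
        state = update_actor(state)
        # Step 6: move each target network a fraction xi toward its online copy.
        state["theta_target"] = soft_update(state["theta_target"], state["theta"], xi)
        state["phi_target"] = soft_update(state["phi_target"], state["phi"], xi)
    return state


# Usage with no-op loss updates, to show only the target-network tracking.
identity: Update = lambda s: s
init: State = {
    "theta": {"w": [1.0]}, "theta_target": {"w": [0.0]},
    "phi": {"w": [1.0]}, "phi_target": {"w": [0.0]},
}
out = train_dmg_skeleton(init, num_steps=2, xi=0.5, update_value=identity,
                         update_critic=identity, update_actor=identity)
print(out["theta_target"]["w"])  # [0.75]
```

With ξ = 0.5, the target moves from 0.0 to 0.5 and then to 0.75. A small ξ therefore keeps the target networks slowly trailing the online networks.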
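The Dataset Splits row notes that evaluation averages returns over evaluation trajectories and random seeds. The sketch below shows that aggregation and assumes the usual D4RL score normalization, where 0 corresponds to a random policy and 100 to an expert policy. The reference returns, the helper names `normalized_score` and `aggregate_over_seeds`, and the choice of sample standard deviation across seeds are assumptions for illustration. The numbers are made up and are not results from the paper.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def normalized_score(ret: float, random_ret: float, expert_ret: float) -> float:
    """Assumed D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (ret - random_ret) / (expert_ret - random_ret)


def aggregate_over_seeds(
    returns_per_seed: Sequence[Sequence[float]], random_ret: float, expert_ret: float
) -> Tuple[float, float]:
    """Average normalized returns over trajectories within each seed,
    then report the mean and sample standard deviation across seeds."""
    per_seed: List[float] = [
        mean(normalized_score(r, random_ret, expert_ret) for r in episode_returns)
        for episode_returns in returns_per_seed
    ]
    spread = stdev(per_seed) if len(per_seed) > 1 else 0.0
    return mean(per_seed), spread


# Illustrative numbers only (two seeds, two evaluation trajectories each).
avg, std = aggregate_over_seeds([[50.0, 50.0], [70.0, 90.0]], random_ret=0.0, expert_ret=100.0)
print(avg)  # 65.0
```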