Weighted Policy Constraints for Offline Reinforcement Learning
Authors: Zhiyong Peng, Changlin Han, Yadong Liu, Zongtan Zhou
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline gym datasets. |
| Researcher Affiliation | Academia | College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China |
| Pseudocode | Yes | Algorithm 1: Weighted Policy Constraints |
| Open Source Code | Yes | The source code is available at https://github.com/qsa-fox/wPC. |
| Open Datasets | Yes | D4RL (Fu et al. 2020) is one of the main evaluation environments for offline RL, which consists of a wide range of tasks and diverse datasets. |
| Dataset Splits | Yes | We run 1 million steps for training, evaluate the policy every 5 thousand steps, and report the average normalized returns of 10 evaluation episodes as the score. (The evaluation protocol is sketched below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | The only hyper-parameter added on top of standard online RL is α, which regulates the constraint strength. We set α to 0.1 for medium-expert datasets and 2.5 for others. ... Other hyper-parameters for TD3 components are presented in Table 3, and the neural network architectures are presented in Table 4. (See the actor-objective sketch below the table.) |
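
The reported evaluation protocol (1 million training steps, an evaluation every 5 thousand steps, average normalized return over 10 episodes) corresponds to a short loop like the one below. This is a minimal sketch, not code from the authors' repository: the environment name and the `policy.select_action` interface are illustrative assumptions, while `get_normalized_score` is D4RL's standard normalization helper.

```python
import gym
import d4rl  # registers the D4RL offline environments  # noqa: F401
import numpy as np

def evaluate(policy, env_name="halfcheetah-medium-v2", episodes=10):
    """Average D4RL-normalized return over 10 evaluation episodes.
    `policy.select_action` is a hypothetical interface, not the repo's API."""
    env = gym.make(env_name)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy.select_action(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
        # D4RL normalizes episode return to a 0-100 scale.
        returns.append(env.get_normalized_score(total) * 100)
    return float(np.mean(returns))

# During training: evaluate every 5,000 of the 1,000,000 gradient steps.
# for step in range(1, 1_000_001):
#     train_step(...)
#     if step % 5_000 == 0:
#         score = evaluate(policy)
```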
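For context on the α hyper-parameter, the following is a hypothetical sketch of a TD3+BC-style actor objective with a per-sample weighted behavior-cloning constraint. It is not the paper's verified wPC update (Algorithm 1 and the released code define that): the role of α here follows the TD3+BC convention, which is only suggested by the reported default of 2.5, and the `weights` tensor is a placeholder for whatever weighting wPC computes.

```python
import torch

def actor_loss(actor, critic, states, dataset_actions, weights, alpha=2.5):
    """Hypothetical TD3+BC-style actor objective with a weighted policy
    constraint; the weighting and the exact role of alpha are assumptions."""
    pi = actor(states)              # actions proposed by the current policy
    q = critic(states, pi)          # critic's value of those actions
    # TD3+BC-style scaling: alpha trades off the Q term against the constraint
    # (the paper reports alpha = 0.1 for medium-expert datasets, 2.5 otherwise).
    lam = alpha / q.abs().mean().detach()
    # Per-sample weighted behavior-cloning penalty toward dataset actions;
    # `weights` stands in for wPC's per-sample weighting scheme.
    bc = (weights * ((pi - dataset_actions) ** 2).sum(dim=-1)).mean()
    return -lam * q.mean() + bc
```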