DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning
Authors: Hongchang Zhang, Jianzhun Shao, Shuncheng He, Yuhang Jiang, Xiangyang Ji
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our proposed DARL method is competitive to the state-of-the-art methods in offline evaluation tasks. |
| Researcher Affiliation | Academia | Tsinghua University hc-zhang19@mails.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: DARL. Input: offline dataset D, update iterations tmax, k for the k-nearest search, dataset size m. Parameters: randomized neural network h, policy network π, an ensemble of critics Z1, ..., ZNC. Output: learned policy network π. 1: Initialize the policy network, the distributional critics, and the randomized neural network. 2: Project the whole dataset through h to get the features {h(si, ai)}, i = 1, ..., m. 3: Build a KD-tree over all the features. 4: for t = 1, 2, ..., tmax do 5: Sample a mini-batch (s, a, r, s′) from D. 6: Compute the uncertainty for the target, U(s′, π(s′)). 7: Formulate the truncated target. 8: Update the distributional critics according to Eq. 4. 9: Update the policy network according to Eq. 1. 10: end for |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | D4RL benchmark (Fu et al. 2020), including the MuJoCo control suites in D4RL's -v2 benchmarks. |
| Dataset Splits | No | The paper describes using datasets from D4RL but does not explicitly specify the training/validation/test splits or their proportions for the experiments conducted. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the distance-aware uncertainty estimator, we use a two-layer randomized neural network to project a state-action pair into the feature space. The output dimension is set to be less than 10 to accelerate the k-nearest search. The quantile drop function f(U(s′, a′)) is a truncated linear function: f(U(s′, a′)) = clip(η·U(s′, a′), Cmin, Cmax), where η is a hyperparameter and Cmin and Cmax are the minimum and maximum numbers of clipped quantiles, respectively. In Fig. 4, we test different values of the hyperparameter η in {1, 10, 100, 1000} on halfcheetah-medium and hopper-medium-replay. |
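To make the pipeline in Algorithm 1 and the experiment setup concrete, the following is a minimal Python sketch of the distance-aware uncertainty estimator: a fixed (untrained) two-layer random projection, a KD-tree over the projected dataset, a k-nearest-neighbor distance as the uncertainty, and the truncated linear quantile drop f(U) = clip(η·U, Cmin, Cmax). All names, dimensions, and hyperparameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def make_random_projection(dim_in, dim_out=8):
    """Fixed two-layer randomized network h; weights are never trained.
    dim_out < 10, matching the paper's note about fast k-nearest search."""
    W1 = rng.normal(size=(dim_in, 64))
    W2 = rng.normal(size=(64, dim_out))
    def h(x):
        return np.maximum(x @ W1, 0.0) @ W2  # ReLU hidden layer
    return h

# Hypothetical offline dataset of m state-action pairs (dimensions are made up).
m, state_dim, action_dim = 1000, 17, 6
dataset_sa = rng.normal(size=(m, state_dim + action_dim))

# Steps 2-3 of Algorithm 1: project the whole dataset, build a KD-tree.
h = make_random_projection(state_dim + action_dim)
features = h(dataset_sa)
tree = cKDTree(features)

def uncertainty(sa, k=5):
    """Step 6: mean distance from h(s, a) to its k nearest dataset features."""
    dists, _ = tree.query(h(sa[None, :]), k=k)
    return float(np.mean(dists))

def quantile_drop(u, eta=10.0, c_min=0, c_max=50):
    """Truncated linear drop: f(U) = clip(eta * U, Cmin, Cmax),
    rounded to an integer number of dropped quantiles."""
    return int(np.clip(eta * u, c_min, c_max))

# A point that is in the dataset has (near-)zero nearest-neighbor distance,
# so few quantiles are dropped; an out-of-distribution point drops more.
u_in = uncertainty(dataset_sa[0])
u_out = uncertainty(rng.normal(size=state_dim + action_dim) * 10.0)
```

The clipped drop count would then truncate the upper quantiles of the distributional critics' target, which is the conservatism mechanism the table's Experiment Setup row describes.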