DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

Authors: Hongchang Zhang, Jianzhun Shao, Shuncheng He, Yuhang Jiang, Xiangyang Ji

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our proposed DARL method is competitive to the state-of-the-art methods in offline evaluation tasks.
Researcher Affiliation | Academia | Tsinghua University (hc-zhang19@mails.tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1: DARL
Input: offline dataset D, update iterations tmax, k for the k-nearest search, the size of the dataset m
Parameter: randomized neural network h, policy network π, an ensemble of critics Z1, ..., ZNc
Output: learned policy network π
1: Initialize the policy network, the distributional critics, and the randomized neural network
2: Project the whole dataset through h to get the features {h(s_i, a_i)}, i = 1, ..., m
3: Build a KD-tree over all the features
4: for t = 1, 2, ..., tmax do
5:   Sample a mini-batch of transitions (s, a, r, s′) from D
6:   Calculate the uncertainty for the target, U(s′, π(s′))
7:   Formulate the truncated target
8:   Update the distributional critics according to Eq. 4
9:   Update the policy network according to Eq. 1
10: end for
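The distance-aware uncertainty machinery in steps 2, 3, and 6 of Algorithm 1 can be sketched as below. This is a minimal illustration, not the paper's implementation: the state/action/hidden dimensions, the synthetic dataset, and the use of the mean k-nearest-neighbor distance as the uncertainty aggregate are all assumptions; the paper only fixes the design of a frozen randomized projection network with feature dimension below 10 and a KD-tree over dataset features.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Assumed dimensions for illustration; the paper only requires the feature
# dimension to be below 10 to accelerate the k-nearest search.
STATE_DIM, ACTION_DIM, HIDDEN_DIM, FEAT_DIM = 17, 6, 64, 8

# Frozen, randomly initialized two-layer network h (never trained).
W1 = rng.standard_normal((STATE_DIM + ACTION_DIM, HIDDEN_DIM))
W2 = rng.standard_normal((HIDDEN_DIM, FEAT_DIM))

def h(sa):
    """Project state-action pairs into the low-dimensional feature space."""
    return np.maximum(sa @ W1, 0.0) @ W2  # ReLU hidden layer, linear output

# Steps 2-3: project the whole offline dataset once, then build a KD-tree.
dataset_sa = rng.standard_normal((1000, STATE_DIM + ACTION_DIM))
tree = cKDTree(h(dataset_sa))

def uncertainty(sa, k=5):
    """Step 6: U(s, a) from distances to the k nearest dataset features.
    Mean distance is one plausible aggregation, assumed here for illustration."""
    dists, _ = tree.query(h(np.atleast_2d(sa)), k=k)
    return dists.mean(axis=1)

u_in = uncertainty(dataset_sa[:3])          # in-distribution pairs: small U
u_out = uncertainty(dataset_sa[:3] + 50.0)  # far OOD pairs: much larger U
```

Because the projection is built once and the tree is queried per mini-batch, out-of-distribution state-action pairs land far from every stored feature and receive a large U, which is what drives the truncated target in steps 6-7.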
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | D4RL benchmark (Fu et al. 2020), specifically the MuJoCo control suites from D4RL's -v2 benchmarks.
Dataset Splits | No | The paper describes using datasets from D4RL but does not explicitly specify the training/validation/test splits or their proportions for the experiments conducted.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | For the distance-aware uncertainty estimator, we use a two-layer randomized neural network to project a state-action pair into the feature space. The output dimension is set to be less than 10 to accelerate the k-nearest search. The quantile drop function f(U(s′, a′)) is a truncated linear function: f(U(s′, a′)) = clip(η · U(s′, a′), Cmin, Cmax), where η is a hyperparameter and Cmin and Cmax are the minimum and maximum numbers of clipped quantiles, respectively. In Fig. 4, we test different values of the hyperparameter η in {1, 10, 100, 1000} on halfcheetah-medium and hopper-medium-replay.
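The truncated linear quantile drop described above can be sketched in a few lines. The specific values chosen for η, Cmin, and Cmax are illustrative assumptions (the paper sweeps η over {1, 10, 100, 1000} and does not tie Cmin/Cmax to these numbers):

```python
import numpy as np

# Illustrative hyperparameters, not the paper's tuned values.
ETA, C_MIN, C_MAX = 10.0, 0, 20

def quantile_drop(u):
    """f(U(s', a')) = clip(eta * U(s', a'), Cmin, Cmax): the number of
    quantiles to drop from the target, growing linearly with uncertainty."""
    return int(np.clip(ETA * u, C_MIN, C_MAX))

low = quantile_drop(0.0)     # certain pair: drop the minimum, Cmin = 0
mid = quantile_drop(1.0)     # linear region: eta * U = 10 quantiles dropped
high = quantile_drop(100.0)  # very uncertain pair: capped at Cmax = 20
```

The clipping keeps the target well defined at both extremes: near-dataset pairs keep almost all quantiles, while far out-of-distribution pairs never drop more than Cmax.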