Bi-Level Offline Policy Optimization with Limited Exploration
Authors: Wenzhuo Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model using a blend of synthetic, benchmark, and real-world datasets for offline RL, showing that it performs competitively with state-of-the-art methods. |
| Researcher Affiliation | Academia | Wenzhuo Zhou, Department of Statistics, University of California, Irvine, wenzhuz3@uci.edu |
| Pseudocode | Yes | Algorithm 1 Adversarial proximal-mapping algorithm |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code for its methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our proposed approach on the D4RL benchmark of OpenAI Gym locomotion (walker2d, hopper, halfcheetah) and Maze2D tasks [15]... The Ohio Type 1 Diabetes (Ohio T1DM) dataset [33]... |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, specific split percentages, or cross-validation setups. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | We use γ = 0.95 with sample size n = 1500 in all experiments. Tuning parameter selection is an open problem in offline policy optimization. Fortunately, Theorem 5.2 suggests an offline selection rule for the hyperparameters λ and c. In the following experiments, we set the hyperparameters to satisfy the condition $O(n^{1/4} d \log(Vn))$. We vary the value of α to evaluate the algorithm's performance in low-, medium-, and relatively high-exploration offline data scenarios. |
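The setup values quoted in the Experiment Setup row can be recorded in a small, hypothetical configuration object. This is a minimal sketch, not the author's code, which is not released. The class name, field names, and task identifiers are illustrative assumptions. The helper `hyperparameter_scale` evaluates $n^{1/4} d \log(Vn)$ under a literal reading of the extracted condition. It drops the unknown constants hidden by the $O(\cdot)$, and `d` and `V` must be supplied by the user because their values are not given here. The specific α values for each exploration level are also not reported in this excerpt, so only their labels are stored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical record of the setup values stated in the quoted text."""
    gamma: float = 0.95  # discount factor used in all experiments
    n: int = 1500        # sample size used in all experiments
    # D4RL task families named in the Open Datasets row (identifiers assumed)
    d4rl_tasks: tuple = ("walker2d", "hopper", "halfcheetah", "maze2d")
    # exploration scenarios varied through alpha; actual alpha values not reported
    exploration_levels: tuple = ("low", "medium", "high")


def hyperparameter_scale(n: int, d: int, V: float) -> float:
    """Return n^{1/4} * d * log(V * n), a literal reading of the O(.) condition.

    Constants are unknown, so this gives an order of magnitude only.
    d and V are user-supplied assumptions.
    """
    if n <= 0 or d <= 0 or V <= 0:
        raise ValueError("n, d and V must be positive")
    return n ** 0.25 * d * math.log(V * n)


if __name__ == "__main__":
    setup = ExperimentSetup()
    # d=10 and V=1.0 are placeholder values chosen only for illustration
    print(setup.gamma, setup.n, hyperparameter_scale(setup.n, d=10, V=1.0))
```

A frozen dataclass keeps the reported values immutable. Any change to γ or n then has to be made explicitly rather than by accident.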