Bi-Level Offline Policy Optimization with Limited Exploration

Authors: Wenzhuo Zhou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model using a blend of synthetic, benchmark, and real-world datasets for offline RL, showing that it performs competitively with state-of-the-art methods.
Researcher Affiliation | Academia | Wenzhuo Zhou, Department of Statistics, University of California, Irvine, wenzhuz3@uci.edu
Pseudocode | Yes | Algorithm 1: Adversarial proximal-mapping algorithm (an illustrative sketch of this kind of loop appears after this table).
Open Source Code | No | The paper does not contain any explicit statement about making the source code for its methodology publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our proposed approach on the D4RL benchmark of OpenAI Gym locomotion (walker2d, hopper, halfcheetah) and Maze2D tasks [15]... The Ohio Type 1 Diabetes (Ohio T1DM) dataset [33]... (A D4RL loading sketch appears after this table.)
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, specific split percentages, or cross-validation setups.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup | Yes | We use γ = 0.95 with the sample size n = 1500 in all experiments. Tuning parameter selection is an open problem in offline policy optimization. Fortunately, Theorem 5.2 suggests an offline selection rule for hyperparameters λ and c. In the following experiments, we set the hyperparameters satisfying the condition O(n^{1/4} d log(V n)). We vary different values of α to evaluate algorithm performance in low, medium, and relatively high offline data exploration scenarios. (A back-of-the-envelope evaluation of the selection-rule scale appears after this table.)
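The paper's Algorithm 1 ("Adversarial proximal-mapping algorithm") is not reproduced on this page. Purely for orientation, the sketch below shows a generic tabular loop of this flavor: a value estimate is pushed adversarially toward the worst-case Bellman error under the current policy, and the policy is then updated through a proximal (KL-regularized, exponentiated-gradient) step. The loss forms, weightings, step sizes, and the toy dataset are assumptions and should not be read as the paper's exact Algorithm 1.

```python
import numpy as np

# Minimal tabular sketch (NOT the paper's Algorithm 1): an adversarial critic
# step followed by a proximal-mapping (mirror-descent-style) policy step.
def adversarial_proximal_mapping(transitions, n_states, n_actions,
                                 gamma=0.95, lam=1.0, eta=0.1, iters=200):
    """transitions: list of (s, a, r, s_next) tuples from an offline dataset."""
    v = np.zeros(n_states)                                     # value estimate
    policy = np.full((n_states, n_actions), 1.0 / n_actions)   # uniform start

    for _ in range(iters):
        # --- inner (adversarial) step: ascend the policy-weighted Bellman error in v ---
        grad_v = np.zeros_like(v)
        adv = np.zeros((n_states, n_actions))   # per-(s, a) TD signal for the policy step
        for s, a, r, s_next in transitions:
            td = r + gamma * v[s_next] - v[s]
            grad_v[s] -= policy[s, a] * td
            grad_v[s_next] += gamma * policy[s, a] * td
            adv[s, a] += td
        v += lam * grad_v

        # --- outer (proximal-mapping) step: KL-regularized policy improvement ---
        # exponentiated-gradient update; subtracting the row max keeps exp() stable
        policy = policy * np.exp(eta * (adv - adv.max(axis=1, keepdims=True)))
        policy /= policy.sum(axis=1, keepdims=True)

    return policy, v

# toy usage with a hypothetical 2-state, 2-action offline dataset
data = [(0, 1, 1.0, 1), (1, 0, 0.0, 0), (0, 0, 0.0, 0), (1, 1, 1.0, 1)]
pi, v = adversarial_proximal_mapping(data, n_states=2, n_actions=2)
```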
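The D4RL Gym locomotion and Maze2D tasks referenced in the Open Datasets row can be obtained through the standard d4rl package. The loading sketch below is illustrative only: the specific dataset version strings and quality levels (e.g., "medium") are assumptions, since the excerpt above does not name them.

```python
import gym
import d4rl  # registers the D4RL environments with gym on import

# Hypothetical dataset names; the paper excerpt does not state which quality levels were used.
for name in ["hopper-medium-v2", "walker2d-medium-v2",
             "halfcheetah-medium-v2", "maze2d-umaze-v1"]:
    env = gym.make(name)
    data = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards, terminals
    print(name, data["observations"].shape, data["actions"].shape)
```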
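For the Experiment Setup row, the reported selection rule sets λ and c on the order of n^{1/4} d log(V n) (as the condition is read here from the paper excerpt). The snippet below simply evaluates that scale for the stated sample size n = 1500; the dimension d and function-class size V are hypothetical placeholder values, not taken from the paper.

```python
import numpy as np

# Illustrative only: evaluates the stated O(n^{1/4} * d * log(V * n)) scale
# for the reported sample size n = 1500. d and V are placeholders.
def hyperparam_scale(n, d, V):
    return n ** 0.25 * d * np.log(V * n)

n = 1500  # sample size reported in the experiment setup
for d, V in [(4, 100), (8, 1000)]:  # hypothetical (dimension, class-size) pairs
    print(f"d={d}, V={V}: lambda/c scale ~ {hyperparam_scale(n, d, V):.1f}")
```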