Bi-Level Offline Policy Optimization with Limited Exploration

Authors: Wenzhuo Zhou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model using a blend of synthetic, benchmark, and real-world datasets for offline RL, showing that it performs competitively with state-of-the-art methods.
Researcher Affiliation | Academia | Wenzhuo Zhou, Department of Statistics, University of California, Irvine, wenzhuz3@uci.edu
Pseudocode | Yes | Algorithm 1: Adversarial proximal-mapping algorithm (an illustrative sketch of this kind of loop appears after this table).
Open Source Code | No | The paper does not contain any explicit statement about making the source code for its methodology publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our proposed approach on the D4RL benchmark of OpenAI Gym locomotion (walker2d, hopper, halfcheetah) and Maze2D tasks [15]... The Ohio Type 1 Diabetes (Ohio T1DM) dataset [33]... (A D4RL loading sketch appears after this table.)
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits, specific split percentages, or cross-validation setups.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup | Yes | We use γ = 0.95 with the sample size n = 1500 in all experiments. Tuning parameter selection is an open problem in offline policy optimization. Fortunately, Theorem 5.2 suggests an offline selection rule for hyperparameters λ and c. In the following experiments, we set the hyperparameters satisfying the condition O(n^{1/4} d log(V n)). We vary different values of α to evaluate algorithm performance in low, medium, and relatively high offline data exploration scenarios. (A back-of-the-envelope evaluation of the selection-rule scale appears after this table.)
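The paper's Algorithm 1 ("Adversarial proximal-mapping algorithm") is not reproduced on this page. Purely for orientation, the sketch below shows a generic tabular loop of this flavor: a value estimate is pushed adversarially toward the worst-case Bellman error under the current policy, and the policy is then updated through a proximal (KL-regularized, exponentiated-gradient) step. The loss forms, weightings, step sizes, and the toy dataset are assumptions and should not be read as the paper's exact Algorithm 1.

```python
import numpy as np

# Minimal tabular sketch (NOT the paper's Algorithm 1): an adversarial critic
# step followed by a proximal-mapping (mirror-descent-style) policy step.
def adversarial_proximal_mapping(transitions, n_states, n_actions,
                                 gamma=0.95, lam=1.0, eta=0.1, iters=200):
    """transitions: list of (s, a, r, s_next) tuples from an offline dataset."""
    v = np.zeros(n_states)                                     # value estimate
    policy = np.full((n_states, n_actions), 1.0 / n_actions)   # uniform start

    for _ in range(iters):
        # --- inner (adversarial) step: ascend the policy-weighted Bellman error in v ---
        grad_v = np.zeros_like(v)
        adv = np.zeros((n_states, n_actions))   # per-(s, a) TD signal for the policy step
        for s, a, r, s_next in transitions:
            td = r + gamma * v[s_next] - v[s]
            grad_v[s] -= policy[s, a] * td
            grad_v[s_next] += gamma * policy[s, a] * td
            adv[s, a] += td
        v += lam * grad_v

        # --- outer (proximal-mapping) step: KL-regularized policy improvement ---
        # exponentiated-gradient update; subtracting the row max keeps exp() stable
        policy = policy * np.exp(eta * (adv - adv.max(axis=1, keepdims=True)))
        policy /= policy.sum(axis=1, keepdims=True)

    return policy, v

# toy usage with a hypothetical 2-state, 2-action offline dataset
data = [(0, 1, 1.0, 1), (1, 0, 0.0, 0), (0, 0, 0.0, 0), (1, 1, 1.0, 1)]
pi, v = adversarial_proximal_mapping(data, n_states=2, n_actions=2)
```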
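The D4RL Gym locomotion and Maze2D tasks referenced in the Open Datasets row can be obtained through the standard d4rl package. The loading sketch below is illustrative only: the specific dataset version strings and quality levels (e.g., "medium") are assumptions, since the excerpt above does not name them.

```python
import gym
import d4rl  # registers the D4RL environments with gym on import

# Hypothetical dataset names; the paper excerpt does not state which quality levels were used.
for name in ["hopper-medium-v2", "walker2d-medium-v2",
             "halfcheetah-medium-v2", "maze2d-umaze-v1"]:
    env = gym.make(name)
    data = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards, terminals
    print(name, data["observations"].shape, data["actions"].shape)
```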
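For the Experiment Setup row, the reported selection rule sets λ and c on the order of n^{1/4} d log(V n) (as the condition is read here from the paper excerpt). The snippet below simply evaluates that scale for the stated sample size n = 1500; the dimension d and function-class size V are hypothetical placeholder values, not taken from the paper.

```python
import numpy as np

# Illustrative only: evaluates the stated O(n^{1/4} * d * log(V * n)) scale
# for the reported sample size n = 1500. d and V are placeholders.
def hyperparam_scale(n, d, V):
    return n ** 0.25 * d * np.log(V * n)

n = 1500  # sample size reported in the experiment setup
for d, V in [(4, 100), (8, 1000)]:  # hypothetical (dimension, class-size) pairs
    print(f"d={d}, V={V}: lambda/c scale ~ {hyperparam_scale(n, d, V):.1f}")
```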