Safe Policy Optimization with Local Generalized Linear Function Approximations
Authors: Akifumi Wachi, Yunyue Wei, Yanan Sui
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our SPO-LF in two experiments. |
| Researcher Affiliation | Collaboration | Akifumi Wachi (IBM Research, akifumi.wachi@ibm.com); Yunyue Wei (Tsinghua University, weiyy20@mails.tsinghua.edu.cn); Yanan Sui (Tsinghua University, ysui@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1 SPO-LF with ETSE |
| Open Source Code | Yes | For future research, our code is open-sourced: https://github.com/akifumi-wachi-4/spolf |
| Open Datasets | Yes | We constructed a simulation environment based on Gym-MiniGrid [12]. |
| Dataset Splits | No | The paper describes providing initial samples and discretizing the environment, but it does not specify explicit train, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper discusses computational cost and efficiency but does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions environments like Gym-MiniGrid and Safety-Gym, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) that are crucial for replication. |
| Experiment Setup | Yes | Settings. We considered a 25 × 25 grid in which each grid cell was associated with a randomly generated feature vector of dimension d = 5... Finally, we set γ = 0.999, δ_r = δ_g = 0.05, and h = 0.1, and optimized a policy with policy iteration. (A minimal setup sketch follows the table.) |
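For concreteness, below is a minimal sketch of the grid construction implied by the Experiment Setup row. All variable names, the feature distribution, and the random seed are illustrative assumptions on our part; this is not the authors' implementation (see the linked repository for that).

```python
import numpy as np

# Hypothetical sketch of the experiment setup reported in the paper:
# a 25 x 25 grid, each cell carrying a random feature vector of
# dimension d = 5, with the stated hyperparameters.
GRID_SIZE = 25     # 25 x 25 grid
FEATURE_DIM = 5    # d = 5
GAMMA = 0.999      # discount factor gamma
DELTA_R = 0.05     # confidence parameter for the reward model (delta_r)
DELTA_G = 0.05     # confidence parameter for the safety model (delta_g)
H = 0.1            # safety threshold h

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

# Associate each grid cell with a randomly generated feature vector.
# The Gaussian distribution here is an assumption, not from the paper.
features = rng.normal(size=(GRID_SIZE, GRID_SIZE, FEATURE_DIM))

print(features.shape)  # (25, 25, 5)
```

A sketch like this only pins down the environment's shape and the reported hyperparameters; the reward/safety models, the ETSE exploration step, and the policy-iteration loop live in the authors' open-sourced code.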