Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Safe Policy Optimization with Local Generalized Linear Function Approximations
Authors: Akifumi Wachi, Yunyue Wei, Yanan Sui
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our SPO-LF in two experiments. |
| Researcher Affiliation | Collaboration | Akifumi Wachi (IBM Research), Yunyue Wei (Tsinghua University), Yanan Sui (Tsinghua University) |
| Pseudocode | Yes | Algorithm 1 SPO-LF with ETSE |
| Open Source Code | Yes | For future research, our code is open-sourced. https://github.com/akifumi-wachi-4/spolf |
| Open Datasets | Yes | We constructed a simulation environment based on Gym-MiniGrid [12]. |
| Dataset Splits | No | The paper describes providing initial samples and discretizing the environment, but it does not specify explicit train, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper discusses computational cost and efficiency but does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions environments like Gym-MiniGrid and Safety-Gym, but it does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions) that are crucial for replication. |
| Experiment Setup | Yes | Settings. We considered a 25 × 25 grid in which each grid was associated with a randomly generated feature vector with the dimension d = 5... Finally, we set γ = 0.999, δr = δg = 0.05, and h = 0.1, and optimized a policy with policy iteration. |
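
To make the quoted settings concrete, the following is a minimal sketch of a 25 × 25 grid with random d = 5 feature vectors, solved by policy iteration at γ = 0.999. It is not the authors' SPO-LF implementation: the sigmoid-of-linear reward, the weight vector `theta`, the five-action move set, and the deterministic transitions are all assumptions made for illustration, and the safety thresholds (δr, δg, h) are not modeled.

```python
import numpy as np

GRID = 25       # 25 x 25 grid (from the reported settings)
D = 5           # feature dimension d = 5 (from the reported settings)
GAMMA = 0.999   # discount factor (from the reported settings)

# Hypothetical action set: stay, up, down, left, right.
MOVES = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def build_env(seed=0):
    """Return per-state rewards and a deterministic next-state table."""
    rng = np.random.default_rng(seed)
    n = GRID * GRID
    features = rng.normal(size=(n, D))   # one random feature vector per cell
    theta = rng.normal(size=D)           # hypothetical reward weights
    # Assumed generalized linear reward: sigmoid of a linear function.
    reward = 1.0 / (1.0 + np.exp(-features @ theta))

    next_state = np.empty((n, len(MOVES)), dtype=int)
    for s in range(n):
        r, c = divmod(s, GRID)
        for a, (dr, dc) in enumerate(MOVES):
            nr = min(max(r + dr, 0), GRID - 1)   # moves are clipped at walls
            nc = min(max(c + dc, 0), GRID - 1)
            next_state[s, a] = nr * GRID + nc
    return reward, next_state


def policy_iteration(reward, next_state, gamma=GAMMA, max_iters=1000, tol=1e-8):
    """Plain (unconstrained) policy iteration with exact policy evaluation."""
    n = next_state.shape[0]
    idx = np.arange(n)
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) V = r.
        P = np.zeros((n, n))
        P[idx, next_state[idx, policy]] = 1.0
        V = np.linalg.solve(np.eye(n) - gamma * P, reward)
        # Policy improvement: switch only where an action is strictly better.
        Q = reward[:, None] + gamma * V[next_state]
        best = Q.argmax(axis=1)
        improved = Q[idx, best] > Q[idx, policy] + tol
        if not improved.any():
            break
        policy = np.where(improved, best, policy)
    return policy, V
```

Calling `policy_iteration(*build_env())` returns a greedy policy and its value function over the 625 cells. Exact policy evaluation by a linear solve is affordable at this grid size; a larger state space would likely call for iterative evaluation instead.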