Feasible Reachable Policy Iteration
Authors: Shentao Qin, Yujie Yang, Yao Mu, Jie Li, Wenjun Zou, Jingliang Duan, Shengbo Eben Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results verify the effectiveness of the proposed FR function in both improving the convergence speed of better or comparable performance without sacrificing safety and identifying a smaller policy space with higher sample efficiency. We test our algorithm on the frozen lake (gym) environment, two classical control tasks, and the safety gym benchmark. |
| Researcher Affiliation | Academia | (1) School of Vehicle and Mobility, Tsinghua University, Beijing, China; (2) Department of Computer Science, The University of Hong Kong, Hong Kong, China; (3) School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Feasible Reachable Region Identification and Algorithm 2 Feasible Reachable Policy Iteration (FRPI) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a direct link to a code repository. |
| Open Datasets | Yes | We compare the algorithms on four high-dimensional robot navigation tasks in Safety Gym (Ray et al., 2019) |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits by percentage, sample counts, or a specific splitting methodology. |
| Hardware Specification | Yes | We conducted training on an NVIDIA GPU 3090 using JAX, setting `XLA_PYTHON_CLIENT_MEM_FRACTION` to 0.1, which allocates 2720 MB of GPU memory. |
| Software Dependencies | No | The paper mentions 'JAX' but does not specify a version number. Other software or library dependencies are not listed with specific version numbers. |
| Experiment Setup | Yes | The hyperparameters used in the experiments are listed in Tab. 4. |
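
The hardware row above reports that training ran with `XLA_PYTHON_CLIENT_MEM_FRACTION` set to 0.1. Since no code was released, the following is only a minimal sketch of how such a setting is commonly applied from Python, assuming the variable is set in-process before JAX is first imported. The helper name `configured_memory_fraction` is hypothetical and is not taken from the paper.

```python
import os

# Assumption: this runs before the first `import jax`, since JAX reads the
# variable when its GPU backend is initialised. The value 0.1 is the one
# quoted in the hardware row of the table.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.1"


def configured_memory_fraction() -> float:
    """Return the GPU memory fraction currently requested via the environment.

    Hypothetical helper for illustration; it only reads the environment
    variable and does not query the GPU or JAX.
    """
    return float(os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"])


print(configured_memory_fraction())

# A training script would import JAX only after this point, e.g.:
# import jax
```

The same effect can be had by exporting the variable in the shell before launching the script, which avoids any dependence on import order.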