Probabilistic Constrained Reinforcement Learning with Formal Interpretability
Authors: Yanran Wang, Qiuchen Qian, David Boyle
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark): Acrobot, Cartpole, Walker and Drone. We use four appropriate constrained RL benchmarks: PaETS (Okada & Taniguchi, 2020), i.e., a Bayesian RL combining with variational inference, TRPO-IPO (Liu et al., 2020), i.e., an enhanced variant of TRPO-Lagrangian (Bohez et al., 2019), PCPO (Yang et al., 2020), i.e., an advanced variant of CPO (Achiam et al., 2017) and CRPO (Xu et al., 2021), i.e., a primal constrained RL approach. |
| Researcher Affiliation | Academia | Systems and Algorithms Laboratory, Imperial College London, South Kensington, London. |
| Pseudocode | Yes | Algorithm 1 ORPO-DR: Optimality-Rectified Policy Optimization using Distributional Representation; Algorithm 2 AWaVO: Adaptive Sliced Wasserstein Variational Optimization |
| Open Source Code | Yes | The practical hardware implementation and additional demonstrations are showcased in a video (footnote 1: https://github.com/Alex-yanranwang/AWaVO). |
| Open Datasets | Yes | We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark) |
| Dataset Splits | No | The paper mentions using the OpenAI Gym and GUARD benchmarks, and discusses training performance and convergence, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Table 3. Technical Specification of Hardware: QAV250, Intel RealSense D435i, Holybro ST VL53L1X, Pixhawk 4, T-Motor F60 Pro IV 1750KV, BLHeli-32bit 45A 3-6s, DJI Manifold 2-C (CPU Model: Intel Core i7-8550U) |
| Software Dependencies | No | The paper mentions software environments, benchmarks, and baseline methods such as "OpenAI Gym", "GUARD", "TRPO-IPO", "PCPO", "CRPO", and "PaETS", but it does not provide specific version numbers for any of these or for other libraries used. |
| Experiment Setup | Yes | The AWaVO parameter settings given in Table 2 of Appendix E.1 are based on selected benchmarks, i.e., CRPO (Xu et al., 2021) and GUARD (Zhao et al., 2023). |