Probabilistic Constrained Reinforcement Learning with Formal Interpretability

Authors: Yanran Wang, Qiuchen Qian, David Boyle

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark): Acrobot, Cartpole, Walker and Drone. We use four appropriate constrained RL benchmarks: PaETS (Okada & Taniguchi, 2020), i.e., a Bayesian RL combining with variational inference, TRPO-IPO (Liu et al., 2020), i.e., an enhanced variant of TRPO-Lagrangian (Bohez et al., 2019), PCPO (Yang et al., 2020), i.e., an advanced variant of CPO (Achiam et al., 2017) and CRPO (Xu et al., 2021), i.e., a primal constrained RL approach. |
| Researcher Affiliation | Academia | Systems and Algorithms Laboratory, Imperial College London, South Kensington, London. |
| Pseudocode | Yes | Algorithm 1 ORPO-DR: Optimality-Rectified Policy Optimization using Distributional Representation; Algorithm 2 AWaVO: Adaptive Sliced Wasserstein Variational Optimization |
| Open Source Code | Yes | The practical hardware implementation and additional demonstrations are showcased in a video (footnote 1: https://github.com/Alex-yanranwang/AWaVO). |
| Open Datasets | Yes | We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark) |
| Dataset Splits | No | The paper mentions using the OpenAI Gym and GUARD benchmarks, and discusses training performance and convergence, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Table 3. Technical Specification of Hardware: QAV250, Intel RealSense D435i, Holybro ST VL53L1X, Pixhawk 4, T-Motor F60 Pro IV 1750KV, BLHeli-32bit 45A 3-6s, DJI Manifold 2-c (CPU Model: Intel Core i7-8550U) |
| Software Dependencies | No | The paper mentions software environments and benchmarks such as "OpenAI Gym", "GUARD", "TRPO-IPO", "PCPO", "CRPO", and "PaETS", but it does not provide specific version numbers for any of these software dependencies or other libraries used. |
| Experiment Setup | Yes | The AWaVO parameter settings given in Table 2 of Appendix E.1 are based on selected benchmarks, i.e., CRPO (Xu et al., 2021) and GUARD (Zhao et al., 2023). |