reproducibilityindex.ai

Probabilistic Constrained Reinforcement Learning with Formal Interpretability

Authors: Yanran Wang, Qiuchen Qian, David Boyle

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark): Acrobot, Cartpole, Walker and Drone. We use four appropriate constrained RL benchmarks: PaETS (Okada & Taniguchi, 2020), i.e., a Bayesian RL combining with variational inference, TRPO-IPO (Liu et al., 2020), i.e., an enhanced variant of TRPO-Lagrangian (Bohez et al., 2019), PCPO (Yang et al., 2020), i.e., an advanced variant of CPO (Achiam et al., 2017) and CRPO (Xu et al., 2021), i.e., a primal constrained RL approach.
Researcher Affiliation	Academia	Systems and Algorithms Laboratory, Imperial College London, South Kensington, London.
Pseudocode	Yes	Algorithm 1 ORPO-DR: Optimality-Rectified Policy Optimization using Distributional Representation; Algorithm 2 AWaVO: Adaptive Sliced Wasserstein Variational Optimization
Open Source Code	Yes	The practical hardware implementation and additional demonstrations are showcased in a video [1]. [1] https://github.com/Alex-yanranwang/AWaVO
Open Datasets	Yes	We conduct tasks with multiple constraints in OpenAI Gym (Brockman et al., 2016) and GUARD (Zhao et al., 2023) (a constrained RL benchmark)
Dataset Splits	No	The paper mentions using OpenAI Gym and GUARD benchmarks, and discusses training performance and convergence, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification	Yes	Table 3. Technical Specification of Hardware: QAV250, Intel RealSense D435i, Holybro ST VL53L1X, Pixhawk 4, T-Motor F60 Pro IV 1750KV, BLHeli-32bit 45A 3-6s, DJI Manifold 2-c (CPU Model: Intel Core i7-8550U)
Software Dependencies	No	The paper mentions software environments and benchmarks such as "OpenAI Gym", "GUARD", "TRPO-IPO", "PCPO", "CRPO", and "PaETS", but it does not provide specific version numbers for any of these software dependencies or other libraries used.
Experiment Setup	Yes	The AWaVO parameter settings given in Table 2 of Appendix E.1 are based on selected benchmarks, i.e., CRPO (Xu et al., 2021) and GUARD (Zhao et al., 2023).