Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Authors: Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas Henzinger
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment. |
| Researcher Affiliation | Academia | Đorđe Žikelić, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, djordje.zikelic@ist.ac.at; Mathias Lechner, Massachusetts Institute of Technology, Cambridge, MA, USA, mlechner@mit.edu; Abhinav Verma, The Pennsylvania State University, University Park, PA, USA, verma@psu.edu; Krishnendu Chatterjee, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, krishnendu.chatterjee@ist.ac.at; Thomas A. Henzinger, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, tah@ist.ac.at |
| Pseudocode | Yes | The algorithm pseudocode is presented in Algorithm 1. |
| Open Source Code | Yes | Our code is available at https://github.com/mlech26l/neural_martingales |
| Open Datasets | No | The paper mentions the 'Stochastic Nine Rooms environment', which is obtained by injecting stochastic disturbances into the environment of [33]. However, no link or access information is provided for this customized environment, so it cannot be considered publicly available. |
| Dataset Splits | No | The paper describes an RL environment but does not specify explicit train/validation/test splits for data or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'proximal policy optimization (PPO) [50]' but does not provide version numbers for any software dependencies, such as the PPO implementation, Python, or machine learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | No | The paper states that PPO was used to initialize policy parameters but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations. |
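The last two rows flag missing dependency versions and PPO training hyperparameters. Purely as an illustration of the kind of specification those rows say is absent, the sketch below pins a hypothetical dependency set and spells out explicit PPO hyperparameters via stable-baselines3; the library choice, the placeholder environment, and every numeric value are assumptions, not details taken from the paper or its repository.

```python
# Illustrative sketch only: the paper reports using PPO to initialize policy
# parameters but gives no hyperparameters or dependency versions. Everything
# below is an assumption made for illustration.
#
# Hypothetical pinned dependencies (requirements.txt):
#   gymnasium==0.29.1
#   stable-baselines3==2.3.0
#   torch==2.2.0

import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder task: the paper's Stochastic Nine Rooms environment is custom and
# not publicly released, so a standard Gymnasium environment stands in here.
env = gym.make("CartPole-v1")

# Explicit (assumed) PPO hyperparameters of the kind a reproducible setup would state.
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    seed=0,
    verbose=0,
)
model.learn(total_timesteps=100_000)  # training budget is also an assumption
```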