Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

Authors: Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas Henzinger

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment."
Researcher Affiliation | Academia | Đorđe Žikelić, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria (djordje.zikelic@ist.ac.at); Mathias Lechner, Massachusetts Institute of Technology, Cambridge, MA, USA (mlechner@mit.edu); Abhinav Verma, The Pennsylvania State University, University Park, PA, USA (verma@psu.edu); Krishnendu Chatterjee, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria (krishnendu.chatterjee@ist.ac.at); Thomas A. Henzinger, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria (tah@ist.ac.at)
Pseudocode | Yes | The algorithm's pseudocode is presented in Algorithm 1.
Open Source Code | Yes | The code is available at https://github.com/mlech26l/neural_martingales
Open Datasets | No | The paper uses the Stochastic Nine Rooms environment, obtained by injecting stochastic disturbances into the environment of [33], but it provides no link or other access information for this customized environment, so it cannot be considered publicly available.
Dataset Splits | No | The paper describes an RL environment but does not specify explicit train/validation/test splits for data or evaluation.
Hardware Specification | No | The paper does not specify the hardware used, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using proximal policy optimization (PPO) [50] but does not give version numbers for any software dependencies, such as Python or machine learning frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | No | The paper states that PPO was used to initialize policy parameters but does not report hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations.