Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Authors: Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas Henzinger
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment. |
| Researcher Affiliation | Academia | Đorđe Žikelić, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, djordje.zikelic@ist.ac.at; Mathias Lechner, Massachusetts Institute of Technology, Cambridge, MA, USA, mlechner@mit.edu; Abhinav Verma, The Pennsylvania State University, University Park, PA, USA, verma@psu.edu; Krishnendu Chatterjee, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, krishnendu.chatterjee@ist.ac.at; Thomas A. Henzinger, Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria, tah@ist.ac.at |
| Pseudocode | Yes | The algorithm pseudocode is presented in Algorithm 1. |
| Open Source Code | Yes | Our code is available at https://github.com/mlech26l/neural_martingales |
| Open Datasets | No | The paper mentions the 'Stochastic Nine Rooms environment', which is obtained by injecting stochastic disturbances into the environment of [33]. However, no link or access information is provided for this customized environment, so it cannot be considered publicly available. |
| Dataset Splits | No | The paper describes an RL environment but does not specify explicit train/validation/test splits for data or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'proximal policy optimization (PPO) [50]' but does not provide version numbers for any software dependencies, such as the PPO implementation, Python, or machine learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | No | The paper states that PPO was used to initialize policy parameters but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations. |
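The last two rows flag missing dependency versions and PPO training hyperparameters. Purely as an illustration of the kind of specification those rows say is absent, the sketch below pins a hypothetical dependency set and spells out explicit PPO hyperparameters via stable-baselines3; the library choice, the placeholder environment, and every numeric value are assumptions, not details taken from the paper or its repository.

```python
# Illustrative sketch only: the paper reports using PPO to initialize policy
# parameters but gives no hyperparameters or dependency versions. Everything
# below is an assumption made for illustration.
#
# Hypothetical pinned dependencies (requirements.txt):
#   gymnasium==0.29.1
#   stable-baselines3==2.3.0
#   torch==2.2.0

import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder task: the paper's Stochastic Nine Rooms environment is custom and
# not publicly released, so a standard Gymnasium environment stands in here.
env = gym.make("CartPole-v1")

# Explicit (assumed) PPO hyperparameters of the kind a reproducible setup would state.
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    seed=0,
    verbose=0,
)
model.learn(total_timesteps=100_000)  # training budget is also an assumption
```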