Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely data-driven DRL and solely model-based design while offering remarkably fewer learning parameters and fast training towards safety guarantee.
Researcher Affiliation | Academia | Hongpeng Cao (TUM, Germany), Yanbing Mao (WSU, United States), Lui Sha (UIUC, United States) & Marco Caccamo (TUM, Germany)
Pseudocode | Yes | Algorithm 1: NN Input Augmentation; Algorithm 2: Physics-Model-Guided Neural Network Editing
Open Source Code | Yes | The code to reproduce our experimental results and supplementary materials are available at https://github.com/HP-CAO/phy_rl.
Open Datasets | Yes | We take the cart-pole simulator provided in OpenAI Gym (Brockman et al., 2016).
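Loading the cited simulator is a one-liner; the sketch below is a minimal illustration under Gym 0.20 (the version listed under Software Dependencies below) and assumes the stock CartPole-v1 environment ID and a random placeholder policy, whereas the released Phy-DRL code may wrap or modify the simulator (e.g., to expose a continuous force input).

```python
# Minimal sketch: loading the OpenAI Gym cart-pole simulator (Brockman et al., 2016).
# The environment ID and the random placeholder policy are assumptions for
# illustration; see the linked repository for the actual environment setup.
import gym

env = gym.make("CartPole-v1")
state = env.reset()
for _ in range(1000):                    # 1000 matches the per-episode step cap reported below
    action = env.action_space.sample()   # placeholder policy
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
env.close()
```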
Dataset Splits | No | No explicit validation set or split information is provided. The paper mentions training steps and resetting episodes, but not a separate validation split for hyperparameter tuning or model selection.
Hardware Specification | No | No specific hardware details (such as CPU/GPU models or memory) are provided. The paper only mentions the operating system (Ubuntu) and the software frameworks used.
Software Dependencies | Yes | For the code, we use the Python API for the TensorFlow framework (Abadi et al.) and the Adam optimizer (Kingma & Ba) for training. This project uses the settings: 1) Ubuntu 20.04, 2) Python 3.7, 3) TensorFlow 2.5.0, 4) NumPy 1.19.5, and 5) Gym 0.20. ... This project uses the settings: 1) Ubuntu 22.04, 2) Python 3.7, 3) TensorFlow 2.5.0, 4) NumPy 1.19.5, and 5) PyBullet.
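As a quick way to confirm a machine matches the first reported settings block (the one with Gym 0.20, presumably the cart-pole experiments), the hypothetical check below pins the versions quoted above; only the version numbers come from the paper, the assertion script itself does not.

```python
# Hypothetical version check mirroring the reported settings: Python 3.7,
# TensorFlow 2.5.0, NumPy 1.19.5, Gym 0.20 (the second settings block swaps
# Gym for PyBullet). Only the version numbers are taken from the paper.
import sys
import numpy as np
import tensorflow as tf
import gym

assert sys.version_info[:2] == (3, 7), "paper reports Python 3.7"
assert tf.__version__ == "2.5.0", "paper reports TensorFlow 2.5.0"
assert np.__version__ == "1.19.5", "paper reports NumPy 1.19.5"
assert gym.__version__.startswith("0.20"), "paper reports Gym 0.20"
print("Environment matches the reported settings.")
```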
Experiment Setup | Yes | The actor and critic networks in the DDPG algorithm are implemented as Multi-Layer Perceptrons (MLPs) with four fully connected layers. The output dimensions of the four layers are 256, 128, 64, and 1, respectively. The activation functions of the first three layers are ReLU, while the last layer uses the Tanh function for the actor network and a linear function for the critic network. The input of the critic network is [s; a], while the input of the actor network is s. ... For Phy-DRL, we let the discount factor γ = 0.4, and the learning rates of the critic and actor networks are both 0.0003. We set the batch size to 200. The total number of training steps is 10^6, and the maximum number of steps per episode is 1000.
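For concreteness, here is a minimal sketch of the described actor and critic networks under TensorFlow 2.5 / Keras. The layer widths (256, 128, 64, 1), activations (ReLU for the first three layers, Tanh for the actor output, linear for the critic output), critic input [s; a], Adam learning rate 0.0003, discount factor 0.4, and batch size 200 follow the quoted setup; the state/action dimensions and the plain Keras functional style are assumptions for illustration rather than the released implementation.

```python
# Sketch of the reported DDPG actor/critic MLPs, assuming TensorFlow 2.5 / Keras.
# STATE_DIM and ACTION_DIM are assumed cart-pole dimensions; everything not quoted
# in the setup description above is illustrative.
import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM = 4, 1   # assumption: cart-pole state and single force input

def build_actor():
    s = layers.Input(shape=(STATE_DIM,))
    x = layers.Dense(256, activation="relu")(s)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    a = layers.Dense(ACTION_DIM, activation="tanh")(x)   # bounded action output
    return tf.keras.Model(s, a)

def build_critic():
    s = layers.Input(shape=(STATE_DIM,))
    a = layers.Input(shape=(ACTION_DIM,))
    x = layers.Concatenate()([s, a])                      # critic input is [s; a]
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    q = layers.Dense(1)(x)                                # linear Q-value output
    return tf.keras.Model([s, a], q)

actor, critic = build_actor(), build_critic()
actor_opt = tf.keras.optimizers.Adam(learning_rate=3e-4)
critic_opt = tf.keras.optimizers.Adam(learning_rate=3e-4)
GAMMA, BATCH_SIZE = 0.4, 200
TOTAL_STEPS, MAX_EPISODE_STEPS = 10**6, 1000
```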