Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings
Authors: Hongpeng Cao, Yanbing Mao, Lui Sha, Marco Caccamo
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely data-driven DRL and solely model-based design, while offering remarkably fewer learning parameters and fast training toward the safety guarantee. |
| Researcher Affiliation | Academia | Hongpeng Cao (TUM, Germany), Yanbing Mao (WSU, United States), Lui Sha (UIUC, United States), Marco Caccamo (TUM, Germany) |
| Pseudocode | Yes | Algorithm 1: NN Input Augmentation; Algorithm 2: Physics-Model-Guided Neural Network Editing |
| Open Source Code | Yes | The code to reproduce our experimental results and supplementary materials are available at https://github.com/HP-CAO/phy_rl. |
| Open Datasets | Yes | We take the cart-pole simulator provided in OpenAI Gym (Brockman et al., 2016). (See the environment-loading sketch after the table.) |
| Dataset Splits | No | No explicit validation set or split information is provided. The paper mentions training steps and resetting episodes, but not a separate validation split for hyperparameter tuning or model selection. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models or memory) are provided. The paper only mentions the operating system (Ubuntu) and software frameworks used. |
| Software Dependencies | Yes | For the code, we use the Python API for the TensorFlow framework (Abadi et al.) and the Adam optimizer (Kingma & Ba) for training. One reported configuration: 1) Ubuntu 20.04, 2) Python 3.7, 3) TensorFlow 2.5.0, 4) NumPy 1.19.5, and 5) Gym 0.20. A second reported configuration: 1) Ubuntu 22.04, 2) Python 3.7, 3) TensorFlow 2.5.0, 4) NumPy 1.19.5, and 5) PyBullet. (See the version-check sketch after the table.) |
| Experiment Setup | Yes | The actor and critic networks in the DDPG algorithm are implemented as multi-layer perceptrons (MLPs) with four fully connected layers whose output dimensions are 256, 128, 64, and 1, respectively. The first three layers use ReLU activations; the last layer uses Tanh for the actor network and a linear activation for the critic network. The input of the critic network is [s; a], while the input of the actor network is s. ... For Phy-DRL, we let the discount factor γ = 0.4, and the learning rates of the critic and actor networks are both 0.0003. We set the batch size to 200. The total number of training steps is 10^6, and the maximum number of steps per episode is 1000. (See the architecture sketch after the table.) |
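
For orientation, the sketch below instantiates the OpenAI Gym cart-pole simulator referenced in the Open Datasets row, using the Gym 0.20 API listed under Software Dependencies. The environment ID and the random placeholder policy are assumptions for illustration; the paper uses a modified cart-pole driven by the Phy-DRL controller, not a random agent.

```python
# Minimal sketch: loading the OpenAI Gym cart-pole simulator (Gym 0.20 API).
# The environment ID and random policy are illustrative assumptions only.
import gym

env = gym.make("CartPole-v1")
obs = env.reset()                        # Gym 0.20: reset() returns the observation only
for _ in range(1000):                    # matches the stated 1000-step episode cap
    action = env.action_space.sample()   # placeholder; Phy-DRL supplies the real policy
    obs, reward, done, info = env.step(action)  # Gym 0.20: step() returns a 4-tuple
    if done:
        obs = env.reset()
env.close()
```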
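The pinned dependency versions can be sanity-checked at runtime. The snippet below assumes the first reported configuration (TensorFlow 2.5.0, NumPy 1.19.5, Gym 0.20) and is merely an illustrative check, not part of the authors' code.

```python
# Minimal sketch: verifying the pinned package versions from the Software Dependencies row.
import numpy as np
import tensorflow as tf
import gym

assert tf.__version__.startswith("2.5"), tf.__version__
assert np.__version__.startswith("1.19"), np.__version__
assert gym.__version__.startswith("0.20"), gym.__version__
print("Dependency versions match the reported configuration.")
```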
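The experiment-setup description translates naturally into a Keras model definition. The sketch below follows the stated layer widths, activations, inputs, and hyperparameters; the state/action dimensions, variable names, and the omitted DDPG training loop are assumptions, not the authors' implementation.

```python
# Minimal sketch of the described actor/critic MLPs, assuming TensorFlow 2.5 / Keras.
# Layer widths (256, 128, 64, 1), ReLU hidden activations, Tanh actor output, linear
# critic output, and the hyperparameters below follow the paper's description; the
# state/action dimensions are placeholders and the DDPG training loop is omitted.
import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM = 4, 1  # illustrative; set to the controlled system's dimensions

def build_actor():
    s = layers.Input(shape=(STATE_DIM,))
    x = layers.Dense(256, activation="relu")(s)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    a = layers.Dense(ACTION_DIM, activation="tanh")(x)   # actor output bounded in [-1, 1]
    return tf.keras.Model(s, a)

def build_critic():
    s = layers.Input(shape=(STATE_DIM,))
    a = layers.Input(shape=(ACTION_DIM,))
    x = layers.Concatenate()([s, a])                      # critic input is [s; a]
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    q = layers.Dense(1, activation="linear")(x)           # scalar Q-value
    return tf.keras.Model([s, a], q)

# Hyperparameters reported in the paper.
GAMMA = 0.4
ACTOR_LR = CRITIC_LR = 3e-4
BATCH_SIZE = 200
TOTAL_STEPS = 10**6
MAX_EPISODE_STEPS = 1000

actor, critic = build_actor(), build_critic()
actor_opt = tf.keras.optimizers.Adam(ACTOR_LR)
critic_opt = tf.keras.optimizers.Adam(CRITIC_LR)
```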