Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation
Authors: Yannick Hogewind, Thiago D. Simão, Tal Kachman, Nils Jansen
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Safe SLAC using a set of benchmark environments introduced by Ray et al. (2019). The empirical evaluation shows competitive results compared with complex state-of-the-art approaches. |
| Researcher Affiliation | Academia | Yannick Hogewind, Thiago D. Simão, Tal Kachman & Nils Jansen, Radboud University, Nijmegen; {yannick.hogewind,thiago.simao,nils.jansen}@ru.nl; tal.kachman@donders.ru.nl |
| Pseudocode | Yes | Algorithm 1 Safe SLAC |
| Open Source Code | Yes | The code for our implementation of Safe SLAC is available on GitHub at https://github.com/lava-lab/safe-slac. The repository contains implementations of the Safe SLAC actor, critics, model and training procedure. |
| Open Datasets | Yes | We evaluate our approach on a set of six Safety Gym benchmark environments introduced by Ray et al. (2019) as SG6, shown in Figure 1. |
| Dataset Splits | No | The paper describes using evaluation episodes during training but does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts). |
| Hardware Specification | No | The paper mentions 'measured on the same hardware' when comparing computational times but does not specify the type of hardware used (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software like 'PyTorch' and 'Adam optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 1: Hyperparameters used for Safe SLAC (parameter: value). Action repeat: 2; image size: 64 × 64 × 3; image reconstruction noise: 0.4; length of sequences sampled from replay buffer: 10; discount factor: 0.99; cost discount factor: 0.995; z1 size: 32; z2 size: 200; replay buffer size: 2 × 10^5; latent model update batch size: 32; actor-critic update batch size: 64; latent model learning rate: 1 × 10^-4; actor-critic learning rate: 2 × 10^-4; safety Lagrange multiplier learning rate: 2 × 10^-4; initial value for α: 4 × 10^-3; initial value for λ: 2 × 10^-2; warmup environment steps: 60 × 10^3; warmup latent model training steps: 30 × 10^3; gradient clipping max norm: 40; target network update exponential factor: 5 × 10^-3. |
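For anyone reproducing the experiment setup, the Table 1 values can be gathered into a single configuration object. The sketch below is a minimal illustration and is not taken from the lava-lab/safe-slac repository: the class name `SafeSLACConfig`, all of its field names, and the `soft_update` and `effective_horizon` helpers are hypothetical. It reads the learning rates, the initial values of α and λ, and the target network update factor with negative exponents (e.g. 1 × 10^-4), since positive exponents would not be meaningful for these quantities. It also assumes that α is the soft actor-critic entropy temperature and that the target network follows the usual soft (Polyak) update with the listed factor as τ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SafeSLACConfig:
    """Table 1 hyperparameters; class and field names are hypothetical."""
    action_repeat: int = 2
    image_shape: Tuple[int, int, int] = (64, 64, 3)  # height, width, channels
    image_reconstruction_noise: float = 0.4
    sequence_length: int = 10        # sequences sampled from the replay buffer
    discount: float = 0.99           # reward discount factor
    cost_discount: float = 0.995     # cost discount factor
    z1_size: int = 32
    z2_size: int = 200
    replay_buffer_size: int = 2 * 10**5
    latent_batch_size: int = 32
    actor_critic_batch_size: int = 64
    latent_lr: float = 1e-4
    actor_critic_lr: float = 2e-4
    lagrange_lr: float = 2e-4        # safety Lagrange multiplier learning rate
    initial_alpha: float = 4e-3      # assumed: SAC entropy temperature
    initial_lambda: float = 2e-2     # safety Lagrange multiplier
    warmup_env_steps: int = 60 * 10**3
    warmup_latent_steps: int = 30 * 10**3
    grad_clip_norm: float = 40.0
    tau: float = 5e-3                # target network update exponential factor

    @staticmethod
    def effective_horizon(gamma: float) -> float:
        """Rule-of-thumb horizon 1 / (1 - gamma) implied by a discount factor."""
        return 1.0 / (1.0 - gamma)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Assumed Polyak update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    cfg = SafeSLACConfig()
    print(cfg.effective_horizon(cfg.discount))       # roughly 100
    print(cfg.effective_horizon(cfg.cost_discount))  # roughly 200
    # Toy "parameters" as plain floats; a real agent would update tensors.
    print(soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau))
```

The `effective_horizon` helper applies the common 1/(1 - γ) rule of thumb: the cost discount factor of 0.995 corresponds to roughly twice the horizon (about 200 steps) of the reward discount factor of 0.99 (about 100 steps). Freezing the dataclass keeps the configuration immutable once an experiment starts.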