Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation

Authors: Yannick Hogewind, Thiago D. Simão, Tal Kachman, Nils Jansen

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Safe SLAC using a set of benchmark environments introduced by Ray et al. (2019). The empirical evaluation shows competitive results compared with complex state-of-the-art approaches.
Researcher Affiliation | Academia | Yannick Hogewind, Thiago D. Simão, Tal Kachman & Nils Jansen, Radboud University, Nijmegen. {yannick.hogewind,thiago.simao,nils.jansen}@ru.nl, tal.kachman@donders.ru.nl
Pseudocode | Yes | Algorithm 1 Safe SLAC (a hedged training-loop sketch appears after the table).
Open Source Code | Yes | The code for our implementation of Safe SLAC is available on GitHub at https://github.com/lava-lab/safe-slac. The repository contains implementations for the Safe SLAC actor, critics, model and training procedure.
Open Datasets | Yes | We evaluate our approach on a set of six Safety Gym benchmark environments introduced by Ray et al. (2019) as SG6, shown in Figure 1. (See the environment sketch after the table.)
Dataset Splits | No | The paper describes using evaluation episodes during training but does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts).
Hardware Specification | No | The paper mentions "measured on the same hardware" when comparing computational times but does not specify the type of hardware used (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions software like PyTorch and the Adam optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Table 1: Hyperparameters used for Safe SLAC (restated as a configuration sketch after the table):
    Action repeat: 2
    Image size: 64 × 64 × 3
    Image reconstruction noise: 0.4
    Length of sequences sampled from replay buffer: 10
    Discount factor: 0.99
    Cost discount factor: 0.995
    z1 size: 32
    z2 size: 200
    Replay buffer size: 2 × 10^5
    Latent model update batch size: 32
    Actor-critic update batch size: 64
    Latent model learning rate: 1 × 10^-4
    Actor-critic learning rate: 2 × 10^-4
    Safety Lagrange multiplier learning rate: 2 × 10^-4
    Initial value for α: 4 × 10^-3
    Initial value for λ: 2 × 10^-2
    Warmup environment steps: 60 × 10^3
    Warmup latent model training steps: 30 × 10^3
    Gradient clipping max norm: 40
    Target network update exponential factor: 5 × 10^-3
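For the pseudocode row, the following is a minimal sketch of the kind of training loop Algorithm 1 of the paper describes: warm-up data collection, pretraining of the stochastic latent variable model, and then interleaved environment steps, latent-model updates, and Lagrangian-constrained actor-critic updates. All class and method names (LatentModel, SafeActorCritic, ReplayBuffer, etc.) are hypothetical placeholders, not the repository's actual API; consult https://github.com/lava-lab/safe-slac for the real implementation.

```python
# Hedged sketch of a Safe SLAC-style training loop (all component names are hypothetical).
def train_safe_slac(env, latent_model, actor_critic, replay_buffer,
                    warmup_env_steps=60_000, warmup_model_steps=30_000,
                    total_env_steps=1_000_000):
    # 1) Warm-up: collect experience with a random policy.
    obs = env.reset()
    for _ in range(warmup_env_steps):
        action = env.action_space.sample()
        next_obs, reward, done, info = env.step(action)
        replay_buffer.add(obs, action, reward, info.get("cost", 0.0), done)
        obs = env.reset() if done else next_obs

    # 2) Pretrain the stochastic latent variable model on the warm-up data.
    for _ in range(warmup_model_steps):
        batch = replay_buffer.sample_sequences(batch_size=32, length=10)
        latent_model.update(batch)  # reconstruction / ELBO-style objective

    # 3) Main loop: act, then update the model, the critics, the actor,
    #    and the safety Lagrange multiplier.
    for step in range(total_env_steps):
        action = actor_critic.act(obs)  # policy conditioned on the latent belief
        next_obs, reward, done, info = env.step(action)
        replay_buffer.add(obs, action, reward, info.get("cost", 0.0), done)
        obs = env.reset() if done else next_obs

        latent_model.update(replay_buffer.sample_sequences(batch_size=32, length=10))
        batch = replay_buffer.sample_sequences(batch_size=64, length=10)
        latents = latent_model.infer(batch)               # sample latent states z1, z2
        actor_critic.update_critics(latents, batch)       # reward and cost critics
        actor_critic.update_actor(latents)                # entropy- and safety-regularised
        actor_critic.update_lagrange_multiplier(latents)  # enforce the cost budget
```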
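For the open-datasets row, the sketch below shows how a Safety Gym task can be instantiated and how the per-step safety cost is exposed through the info dictionary. The environment id is one of the standard Safety Gym ids from Ray et al. (2019); the random-action rollout is purely illustrative and does not reflect the paper's pixel-observation setup.

```python
import gym
import safety_gym  # registers the Safexp-* environments (Ray et al., 2019)

# One of the Safety Gym benchmark tasks used in the SG6 suite.
env = gym.make("Safexp-PointGoal1-v0")

obs = env.reset()
episode_cost = 0.0
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_cost += info.get("cost", 0.0)  # safety violations are reported per step
    if done:
        break
print("Accumulated cost over the rollout:", episode_cost)
```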
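Finally, the experiment-setup row can be restated as a single configuration object. This is only a hedged restatement of Table 1 as Python: the field names are my own, and the exponent signs for the small quantities (learning rates, initial α and λ, target update factor) follow their conventional magnitudes and should be checked against the paper's Table 1 or the repository defaults.

```python
from dataclasses import dataclass

@dataclass
class SafeSLACConfig:
    # Values taken from Table 1 of the paper; field names are hypothetical.
    action_repeat: int = 2
    image_size: tuple = (64, 64, 3)
    image_reconstruction_noise: float = 0.4
    sequence_length: int = 10                   # sequences sampled from the replay buffer
    discount_factor: float = 0.99
    cost_discount_factor: float = 0.995
    z1_size: int = 32
    z2_size: int = 200
    replay_buffer_size: int = int(2e5)
    latent_model_batch_size: int = 32
    actor_critic_batch_size: int = 64
    latent_model_lr: float = 1e-4
    actor_critic_lr: float = 2e-4
    safety_lagrange_lr: float = 2e-4
    initial_alpha: float = 4e-3                 # entropy temperature
    initial_lambda: float = 2e-2                # safety Lagrange multiplier
    warmup_env_steps: int = 60_000
    warmup_latent_model_steps: int = 30_000
    gradient_clip_max_norm: float = 40.0
    target_network_update_factor: float = 5e-3  # exponential moving-average factor
```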