Risk-sensitive control as inference with Rényi divergence

Authors: Kaito Ito, Kenji Kashima

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The behavior of the risk-sensitive soft actor-critic is examined via an experiment.
Researcher Affiliation | Academia | Kaito Ito, The University of Tokyo, kaito@g.ecc.u-tokyo.ac.jp; Kenji Kashima, Kyoto University, kk@i.kyoto-u.ac.jp
Pseudocode | No | The paper describes algorithms but does not provide them in a structured pseudocode or algorithm block.
Open Source Code | Yes | The code is available at https://github.com/kaito-1111/risk-sensitive-sac.git.
Open Datasets | Yes | The environment is Pendulum-v1 in OpenAI Gymnasium.
Dataset Splits | No | The paper mentions training and testing but does not provide specific percentages or absolute counts for dataset splits (train/validation/test).
Hardware Specification | Yes | For the training, we used an Ubuntu 20.04 server (GPU: NVIDIA GeForce RTX 2080Ti).
Software Dependencies | No | The implementation of the risk-sensitive SAC (RSAC) algorithm follows the stable-baselines3 [50] version of the SAC algorithm... optimizer Adam [51]. No specific version numbers for these or other software are provided.
Experiment Setup | Yes | Now, we introduce a series of hyperparameters listed in Table 1 shared for both SAC and RSAC algorithms.

Table 1: SAC and RSAC Hyperparameters
Parameter | Value
optimizer | Adam [51]
learning rate | 10^-3
discount factor | 0.99
regularization coefficient | 0.1
target smoothing coefficient | 0.005
replay buffer size | 10^5
number of critic networks | 2
number of hidden layers (all networks) | 2
number of hidden units per layer | 256
activation function | ReLU
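
The Table 1 values can be written down as a configuration for the standard stable-baselines3 SAC class, which the paper's implementation is said to follow. The sketch below is only an illustration under stated assumptions: it sets up the plain SAC baseline on Pendulum-v1, not the authors' RSAC code, and the mapping of the "regularization coefficient" to the `ent_coef` argument is a guess rather than something the summary confirms. The names `TABLE_1`, `NETWORK`, and `build_sac` are hypothetical.

```python
# Table 1 values keyed by stable-baselines3 SAC constructor arguments.
TABLE_1 = {
    "learning_rate": 1e-3,   # learning rate
    "gamma": 0.99,           # discount factor
    "ent_coef": 0.1,         # regularization coefficient (assumed mapping)
    "tau": 0.005,            # target smoothing coefficient
    "buffer_size": 100_000,  # replay buffer size
}

# Network settings passed through policy_kwargs.
NETWORK = {
    "n_critics": 2,          # number of critic networks
    "net_arch": [256, 256],  # 2 hidden layers with 256 units each
}


def build_sac(env_id="Pendulum-v1"):
    """Build a baseline SAC agent (not RSAC) with the Table 1 settings."""
    # Imported lazily so the configuration can be inspected without
    # gymnasium, torch, or stable-baselines3 installed.
    import gymnasium as gym
    import torch
    from stable_baselines3 import SAC

    policy_kwargs = dict(
        NETWORK,
        activation_fn=torch.nn.ReLU,       # activation function
        optimizer_class=torch.optim.Adam,  # optimizer
    )
    return SAC("MlpPolicy", gym.make(env_id),
               policy_kwargs=policy_kwargs, **TABLE_1)
```

Calling `build_sac()` and then `.learn(total_timesteps=...)` on the result would train the baseline. The number of training steps is not given in this summary, so it is left to the reader.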