Risk-sensitive control as inference with Rényi divergence
Authors: Kaito Ito, Kenji Kashima
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The behavior of the risk-sensitive soft actor-critic is examined via an experiment. |
| Researcher Affiliation | Academia | Kaito Ito, The University of Tokyo (kaito@g.ecc.u-tokyo.ac.jp); Kenji Kashima, Kyoto University (kk@i.kyoto-u.ac.jp) |
| Pseudocode | No | The paper describes algorithms but does not provide them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/kaito-1111/risk-sensitive-sac.git. |
| Open Datasets | Yes | The environment is Pendulum-v1 in OpenAI Gymnasium. |
| Dataset Splits | No | The paper mentions training and testing but does not provide specific percentages or absolute counts for dataset splits (train/validation/test). |
| Hardware Specification | Yes | For the training, we used an Ubuntu 20.04 server (GPU: NVIDIA GeForce RTX 2080Ti). |
| Software Dependencies | No | The implementation of the risk-sensitive SAC (RSAC) algorithm follows the stable-baselines3 [50] version of the SAC algorithm... optimizer Adam [51]. No specific version numbers for these or other software are provided. |
| Experiment Setup | Yes | Now, we introduce a series of hyperparameters listed in Table 1 shared for both SAC and RSAC algorithms. Table 1: SAC and RSAC Hyperparameters (parameter: value). optimizer: Adam [51]; learning rate: $10^{-3}$; discount factor: 0.99; regularization coefficient: 0.1; target smoothing coefficient: 0.005; replay buffer size: $10^{5}$; number of critic networks: 2; number of hidden layers (all networks): 2; number of hidden units per layer: 256; activation function: ReLU |
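
The hyperparameters in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: it reads the learning rate as 10^-3 and the replay buffer size as 10^5, and the class name `SharedHyperparameters` and helper `to_sb3_sac_kwargs` are hypothetical. The mapping onto stable-baselines3 `SAC` keyword arguments is also an assumption, in particular treating the regularization coefficient as `ent_coef`. The activation function is recorded only as a string to keep the block free of a PyTorch import.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedHyperparameters:
    """Values reported in Table 1, shared by SAC and RSAC (hypothetical container)."""
    learning_rate: float = 1e-3            # assumed reading of "10 3"
    discount_factor: float = 0.99
    regularization_coefficient: float = 0.1
    target_smoothing_coefficient: float = 0.005
    replay_buffer_size: int = 10**5        # assumed reading of "105"
    n_critics: int = 2
    n_hidden_layers: int = 2
    hidden_units: int = 256
    activation: str = "ReLU"               # kept as a label only


def to_sb3_sac_kwargs(hp: SharedHyperparameters) -> dict:
    """Map the table onto stable-baselines3 SAC keyword names (assumed mapping)."""
    return {
        "learning_rate": hp.learning_rate,
        "gamma": hp.discount_factor,
        "ent_coef": hp.regularization_coefficient,  # assumption: fixed coefficient
        "tau": hp.target_smoothing_coefficient,
        "buffer_size": hp.replay_buffer_size,
        "policy_kwargs": {
            # two hidden layers of 256 units for every network
            "net_arch": [hp.hidden_units] * hp.n_hidden_layers,
            "n_critics": hp.n_critics,
        },
    }


sac_kwargs = to_sb3_sac_kwargs(SharedHyperparameters())
```

With stable-baselines3 installed, a dictionary like this could be unpacked as `SAC("MlpPolicy", "Pendulum-v1", **sac_kwargs)` for the baseline. The risk-sensitivity parameter of RSAC is not listed in this row, so it is deliberately left out here.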