Information-Theoretic State Space Model for Multi-View Reinforcement Learning
Authors: Hyeongjoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an extensive set of experiments in various control tasks showing that our method is highly effective in aggregating task-relevant information across many views, that it scales linearly with the number of views, and that it remains robust to arbitrary missing-view scenarios. |
| Researcher Affiliation | Collaboration | ¹Kim Jaechul Graduate School of AI, KAIST, Daejeon, South Korea; ²LG AI Research, Seoul, South Korea; ³School of Computing, KAIST, Daejeon, South Korea. |
| Pseudocode | Yes | Algorithm 1 summarizes the overall optimization process. |
| Open Source Code | Yes | The code is available at: https://github.com/gr8joo/F2C |
| Open Datasets | Yes | Bipedal Walker: https://zenodo.org/record/6583263, https://zenodo.org/record/6583291; SUMO: https://zenodo.org/record/7568625 [...] we employed 8 complex robotic arm manipulation tasks in Metaworld (Yu et al., 2020) [see the download sketch after the table] |
| Dataset Splits | Yes | The dataset collected in Step 1 was split into train (0.8) and validation (0.2) sets [see the split sketch after the table] |
| Hardware Specification | Yes | For Bipedal Walker, we used 40 CPU instances (n1-highcpu-32) from Google Cloud Platform (GCP). For SUMO and Metaworld, we used 10 systems with the following devices. CPU: Intel(R) Core(TM) i7-9700K @ 3.60GHz; Memory: 32 GB; GPU: Nvidia TITAN V (Driver version: 440.44, CUDA version: 10.2) |
| Software Dependencies | Yes | GPU: Nvidia TITAN V (Driver version: 440.44 & CUDA version: 10.2). [...] PPO implementation in Bipedal Walker from Barhate (2021) [PyTorch implicitly mentioned]. |
| Experiment Setup | Yes | The per-environment hyperparameter table from the paper is reconstructed below; rows where the source reports only a single value are placed under the environment they most plausibly apply to [see also the PPO config sketch after the table]. |

| Hyperparameter | Bipedal Walker | Metaworld |
|---|---|---|
| Number of Views (N) | 5 | 3 |
| Policy | PPO | PPO |
| PPO batch size | 4,000 | 6,400 |
| Rollout buffer size | 4,000 | 100,000 |
| # Epochs per update | 80 | 8 |
| Gamma | 0.99 | 0.99 |
| GAE lambda | | 0.95 |
| Clip range (ε) | 0.2 | 0.2 |
| Entropy coefficient | 0.005 | 0.0 |
| Value function coefficient | | 0.5 |
| Gradient clip | | 0.5 |
| Target KL | | 0.12 |
| Learning rate (actor) | 3e-4 | 3e-4 |
| Learning rate (critic) | 1e-3 | 3e-4 |
| Learning rate (representation) | 3e-4 | 3e-4 |
| Observation buffer size | | 100,000 |
| # Unsupervised learning steps | | 400 |
| Subsequence length (H) | 8 | 4 |
| Size of the latent state (representation) | 24 | 128 |
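The Open Datasets row points at Zenodo records. Rather than hard-coding file names (which the table does not list), the attached files can be discovered through Zenodo's public records API. A minimal sketch, assuming only the record IDs quoted above and the classic API response shape (a `files` list whose entries carry a `key` and a direct `links.self` download URL); the output directory layout is illustrative:

```python
"""Sketch: fetch the released Bipedal Walker and SUMO datasets from Zenodo."""
import pathlib

import requests

RECORD_IDS = ["6583263", "6583291", "7568625"]  # Bipedal Walker (x2), SUMO


def download_record(record_id: str, out_dir: str = "data") -> None:
    """Download every file attached to one Zenodo record."""
    meta = requests.get(f"https://zenodo.org/api/records/{record_id}", timeout=30)
    meta.raise_for_status()
    target = pathlib.Path(out_dir) / record_id
    target.mkdir(parents=True, exist_ok=True)
    # Assumed response shape: classic Zenodo records API with a "files" list.
    for entry in meta.json()["files"]:
        with requests.get(entry["links"]["self"], stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(target / entry["key"], "wb") as fh:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    fh.write(chunk)


if __name__ == "__main__":
    for rid in RECORD_IDS:
        download_record(rid)
```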
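The 0.8/0.2 split in the Dataset Splits row is straightforward to reproduce. A minimal sketch, assuming the collected data is held as a sequence of trajectories (the on-disk format is defined by the released code and not restated here); the fixed seed is an illustrative choice for determinism:

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    trajectories: Sequence, train_frac: float = 0.8, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle trajectory indices and split them into train/validation sets."""
    indices = list(range(len(trajectories)))
    random.Random(seed).shuffle(indices)  # local RNG: no global side effects
    n_train = int(train_frac * len(indices))
    train = [trajectories[i] for i in indices[:n_train]]
    validation = [trajectories[i] for i in indices[n_train:]]
    return train, validation
```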
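The Metaworld column of the hyperparameter table lines up closely with Stable-Baselines3 PPO conventions (entropy coefficient 0.0, value function coefficient 0.5, gradient clip 0.5, GAE lambda 0.95), which is why the single-valued rows are placed under Metaworld above. The paper does not state that Stable-Baselines3 was used, so the mapping below is an assumption for illustration, and `make_metaworld_env` is a hypothetical environment factory. The representation-learning rows (representation learning rate, observation buffer, unsupervised steps, subsequence length, latent size) belong to the separate unsupervised stage of Algorithm 1 and are not covered by this sketch.

```python
# Sketch: mapping the Metaworld column onto a Stable-Baselines3 PPO
# constructor. This is an illustrative reading of the table, not the
# authors' confirmed implementation; `make_metaworld_env` is hypothetical.
from stable_baselines3 import PPO

env = make_metaworld_env()  # hypothetical: one of the 8 Metaworld tasks

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # actor and critic (table: 3e-4 / 3e-4)
    n_steps=100_000,      # rollout buffer size
    batch_size=6_400,     # PPO batch size
    n_epochs=8,           # epochs per update
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.0,         # entropy coefficient
    vf_coef=0.5,          # value function coefficient
    max_grad_norm=0.5,    # gradient clip
    target_kl=0.12,
)
model.learn(total_timesteps=1_000_000)  # training budget not given in the table
```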