Information-Theoretic State Space Model for Multi-View Reinforcement Learning

Authors: Hyeongjoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive set of experiments in various control tasks showing that our method is highly effective in aggregating task-relevant information across many views, that scales linearly with the number of views while retaining robustness to arbitrary missing view scenarios.
Researcher Affiliation | Collaboration | (1) Kim Jaechul Graduate School of AI, KAIST, Daejeon, South Korea; (2) LG AI Research, Seoul, South Korea; (3) School of Computing, KAIST, Daejeon, South Korea.
Pseudocode | Yes | Algorithm 1 summarizes the overall optimization process.
Open Source Code | Yes | The code is available at: https://github.com/gr8joo/F2C
Open Datasets | Yes | Bipedal Walker: https://zenodo.org/record/6583263, https://zenodo.org/record/6583291; SUMO: https://zenodo.org/record/7568625#.Y9EtznZByHs [...] we employed 8 complex robotic arm manipulation tasks in Metaworld (Yu et al., 2020). (See the download sketch below.)
Dataset Splits | Yes | The dataset collected in Step 1 was split into train (0.8) and validation (0.2) sets. (See the split sketch below.)
Hardware Specification | Yes | For Bipedal Walker, we used 40 CPU instances (n1-highcpu-32) from Google Cloud Platform (GCP). For SUMO and Metaworld, we used 10 systems with the following specifications: CPU: Intel(R) Core(TM) i7-9700K @ 3.60 GHz; Memory: 32 GB; GPU: Nvidia TITAN V (driver version 440.44, CUDA version 10.2).
Software Dependencies | Yes | GPU: Nvidia TITAN V (driver version 440.44, CUDA version 10.2). [...] PPO implementation in Bipedal Walker from Barhate (2021) [PyTorch implicitly mentioned].
Experiment Setup | Yes | Hyperparameters reported for Bipedal Walker / Metaworld (see the configuration sketch below):
  Number of Views (N): 5 / 3
  Policy: PPO / PPO
  PPO batch size: 4,000 / 6,400
  Rollout buffer size: 4,000 / 100,000
  # Epochs per update: 80 / 8
  Gamma: 0.99 / 0.99
  GAE lambda: 0.95
  Clip range (ε): 0.2 / 0.2
  Entropy coefficient: 0.005 / 0.0
  Value function coefficient: 0.5
  Gradient clip: 0.5
  Target KL: 0.12
  Learning rate (actor): 3e-4 / 3e-4
  Learning rate (critic): 1e-3 / 3e-4
  Learning rate (representation): 3e-4 / 3e-4
  Observation buffer size: 100,000
  # Unsupervised learning steps: 400
  Subsequence length (H): 8 / 4
  Size of the latent state (representation): 24 / 128
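The Zenodo records listed under Open Datasets can be fetched programmatically. Below is a minimal sketch, assuming Zenodo's public REST API record layout ("files" entries carrying "key" and "links.self"); the record IDs come from the paper, while the JSON field names and the choice to save files into the current directory are assumptions, not something the paper specifies.

```python
# Hypothetical download helper for the Bipedal Walker / SUMO datasets on Zenodo.
# Record IDs are from the paper; the JSON field names ("files", "key",
# "links"/"self") are assumed to follow Zenodo's public REST API.
import requests

RECORD_IDS = ["6583263", "6583291", "7568625"]  # Bipedal Walker (x2), SUMO

for record_id in RECORD_IDS:
    record = requests.get(f"https://zenodo.org/api/records/{record_id}").json()
    for file_info in record.get("files", []):
        name = file_info["key"]              # file name as stored on Zenodo
        url = file_info["links"]["self"]     # direct download link
        print(f"downloading {name} from {url}")
        with requests.get(url, stream=True) as resp:
            resp.raise_for_status()
            with open(name, "wb") as f:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
```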
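The reported 0.8/0.2 train-validation split can be expressed as a small helper. This is a minimal sketch, assuming the split is done over whole trajectories with a fixed random seed; the actual procedure in the released code may differ.

```python
# Illustrative 0.8 / 0.2 train-validation split over collected trajectories.
import numpy as np

def split_trajectories(trajectories, train_frac=0.8, seed=0):
    """Shuffle trajectory indices and split them into train/validation sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(trajectories))
    n_train = int(train_frac * len(trajectories))
    train = [trajectories[i] for i in indices[:n_train]]
    valid = [trajectories[i] for i in indices[n_train:]]
    return train, valid
```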
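For reference, the hyperparameters from the Experiment Setup row collected into plain Python dictionaries. The key names are illustrative and need not match the authors' repository or Barhate's PPO implementation; values the extracted table reports only once, without a clear per-environment column, are kept in a separate dictionary rather than guessed.

```python
# Reported hyperparameters, grouped per environment (values from the table above).
PPO_CONFIG = {
    "bipedal_walker": {
        "num_views": 5,
        "batch_size": 4_000,
        "rollout_buffer_size": 4_000,
        "epochs_per_update": 80,
        "gamma": 0.99,
        "clip_range": 0.2,
        "entropy_coef": 0.005,
        "lr_actor": 3e-4,
        "lr_critic": 1e-3,
        "lr_representation": 3e-4,
        "subsequence_length": 8,
        "latent_state_size": 24,
    },
    "metaworld": {
        "num_views": 3,
        "batch_size": 6_400,
        "rollout_buffer_size": 100_000,
        "epochs_per_update": 8,
        "gamma": 0.99,
        "clip_range": 0.2,
        "entropy_coef": 0.0,
        "lr_actor": 3e-4,
        "lr_critic": 3e-4,
        "lr_representation": 3e-4,
        "subsequence_length": 4,
        "latent_state_size": 128,
    },
}

# Values the table lists only once, with no clear per-environment assignment;
# kept separate here rather than attributed to a specific environment.
SINGLE_VALUED = {
    "gae_lambda": 0.95,
    "value_coef": 0.5,
    "grad_clip": 0.5,
    "target_kl": 0.12,
    "observation_buffer_size": 100_000,
    "unsupervised_learning_steps": 400,
}
```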