Information-Theoretic State Space Model for Multi-View Reinforcement Learning

Authors: Hyeongjoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive set of experiments in various control tasks showing that our method is highly effective in aggregating task-relevant information across many views, that scales linearly with the number of views while retaining robustness to arbitrary missing view scenarios.
Researcher Affiliation | Collaboration | (1) Kim Jaechul Graduate School of AI, KAIST, Daejeon, South Korea; (2) LG AI Research, Seoul, South Korea; (3) School of Computing, KAIST, Daejeon, South Korea.
Pseudocode | Yes | Algorithm 1 summarizes the overall optimization process.
Open Source Code | Yes | The code is available at: https://github.com/gr8joo/F2C
Open Datasets | Yes | Bipedal Walker: https://zenodo.org/record/6583263, https://zenodo.org/record/6583291; SUMO: https://zenodo.org/record/7568625#.Y9EtznZByHs [...] we employed 8 complex robotic arm manipulation tasks in Metaworld (Yu et al., 2020). (See the download sketch below.)
Dataset Splits | Yes | The dataset collected in Step 1 was split into train (0.8) and validation (0.2) sets. (See the split sketch below.)
Hardware Specification | Yes | For Bipedal Walker, we used 40 CPU instances (n1-highcpu-32) from Google Cloud Platform (GCP). For SUMO and Metaworld, we used 10 systems with the following specifications: CPU: Intel(R) Core(TM) i7-9700K @ 3.60 GHz; Memory: 32 GB; GPU: Nvidia TITAN V (driver version 440.44, CUDA version 10.2).
Software Dependencies | Yes | GPU: Nvidia TITAN V (driver version 440.44, CUDA version 10.2). [...] PPO implementation in Bipedal Walker from Barhate (2021) [PyTorch implicitly mentioned].
Experiment Setup | Yes | Hyperparameters reported for Bipedal Walker / Metaworld (see the configuration sketch below):
  Number of Views (N): 5 / 3
  Policy: PPO / PPO
  PPO batch size: 4,000 / 6,400
  Rollout buffer size: 4,000 / 100,000
  # Epochs per update: 80 / 8
  Gamma: 0.99 / 0.99
  GAE lambda: 0.95
  Clip range (ε): 0.2 / 0.2
  Entropy coefficient: 0.005 / 0.0
  Value function coefficient: 0.5
  Gradient clip: 0.5
  Target KL: 0.12
  Learning rate (actor): 3e-4 / 3e-4
  Learning rate (critic): 1e-3 / 3e-4
  Learning rate (representation): 3e-4 / 3e-4
  Observation buffer size: 100,000
  # Unsupervised learning steps: 400
  Subsequence length (H): 8 / 4
  Size of the latent state (representation): 24 / 128
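The Zenodo records listed under Open Datasets can be fetched programmatically. Below is a minimal sketch, assuming Zenodo's public REST API record layout ("files" entries carrying "key" and "links.self"); the record IDs come from the paper, while the JSON field names and the choice to save files into the current directory are assumptions, not something the paper specifies.

```python
# Hypothetical download helper for the Bipedal Walker / SUMO datasets on Zenodo.
# Record IDs are from the paper; the JSON field names ("files", "key",
# "links"/"self") are assumed to follow Zenodo's public REST API.
import requests

RECORD_IDS = ["6583263", "6583291", "7568625"]  # Bipedal Walker (x2), SUMO

for record_id in RECORD_IDS:
    record = requests.get(f"https://zenodo.org/api/records/{record_id}").json()
    for file_info in record.get("files", []):
        name = file_info["key"]              # file name as stored on Zenodo
        url = file_info["links"]["self"]     # direct download link
        print(f"downloading {name} from {url}")
        with requests.get(url, stream=True) as resp:
            resp.raise_for_status()
            with open(name, "wb") as f:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
```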
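The reported 0.8/0.2 train-validation split can be expressed as a small helper. This is a minimal sketch, assuming the split is done over whole trajectories with a fixed random seed; the actual procedure in the released code may differ.

```python
# Illustrative 0.8 / 0.2 train-validation split over collected trajectories.
import numpy as np

def split_trajectories(trajectories, train_frac=0.8, seed=0):
    """Shuffle trajectory indices and split them into train/validation sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(trajectories))
    n_train = int(train_frac * len(trajectories))
    train = [trajectories[i] for i in indices[:n_train]]
    valid = [trajectories[i] for i in indices[n_train:]]
    return train, valid
```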
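For reference, the hyperparameters from the Experiment Setup row collected into plain Python dictionaries. The key names are illustrative and need not match the authors' repository or Barhate's PPO implementation; values the extracted table reports only once, without a clear per-environment column, are kept in a separate dictionary rather than guessed.

```python
# Reported hyperparameters, grouped per environment (values from the table above).
PPO_CONFIG = {
    "bipedal_walker": {
        "num_views": 5,
        "batch_size": 4_000,
        "rollout_buffer_size": 4_000,
        "epochs_per_update": 80,
        "gamma": 0.99,
        "clip_range": 0.2,
        "entropy_coef": 0.005,
        "lr_actor": 3e-4,
        "lr_critic": 1e-3,
        "lr_representation": 3e-4,
        "subsequence_length": 8,
        "latent_state_size": 24,
    },
    "metaworld": {
        "num_views": 3,
        "batch_size": 6_400,
        "rollout_buffer_size": 100_000,
        "epochs_per_update": 8,
        "gamma": 0.99,
        "clip_range": 0.2,
        "entropy_coef": 0.0,
        "lr_actor": 3e-4,
        "lr_critic": 3e-4,
        "lr_representation": 3e-4,
        "subsequence_length": 4,
        "latent_state_size": 128,
    },
}

# Values the table lists only once, with no clear per-environment assignment;
# kept separate here rather than attributed to a specific environment.
SINGLE_VALUED = {
    "gae_lambda": 0.95,
    "value_coef": 0.5,
    "grad_clip": 0.5,
    "target_kl": 0.12,
    "observation_buffer_size": 100_000,
    "unsupervised_learning_steps": 400,
}
```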