Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Information-Theoretic State Space Model for Multi-View Reinforcement Learning
Authors: Hyeongjoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an extensive set of experiments in various control tasks showing that our method is highly effective in aggregating task-relevant information across many views, that scales linearly with the number of views while retaining robustness to arbitrary missing view scenarios. |
| Researcher Affiliation | Collaboration | 1Kim Jaechul Graduate School of AI, KAIST, Daejeon, South Korea 2LG AI Research, Seoul, South Korea 3School of Computing, KAIST, Daejeon, South Korea. |
| Pseudocode | Yes | Algorithm 1 summarizes the overall optimization process. |
| Open Source Code | Yes | The code is available at: https://github.com/gr8joo/F2C |
| Open Datasets | Yes | Bipedal Walker: https://zenodo.org/record/6583263, https://zenodo.org/record/6583291; SUMO: https://zenodo.org/record/7568625#.Y9EtznZByHs [...] we employed 8 complex robotic arm manipulation tasks in Metaworld (Yu et al., 2020) |
| Dataset Splits | Yes | The dataset collected in Step 1 was split into train (0.8) and validation (0.2) sets |
| Hardware Specification | Yes | For Bipedal Walker, we used 40 CPU instances (n1-highcpu-32) from Google Cloud Platform (GCP). For SUMO and Metaworld, we used 10 systems equipped with the following devices: CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz; Memory: 32 GB; GPU: Nvidia TITAN V (Driver version: 440.44 & CUDA version: 10.2) |
| Software Dependencies | Yes | GPU: Nvidia TITAN V (Driver version: 440.44 & CUDA version: 10.2). [...] PPO implementation in Bipedal Walker from Barhate (2021) [PyTorch implicitly mentioned]. |
| Experiment Setup | Yes | Hyperparameters (Bipedal Walker / Metaworld): Number of views (N): 5 / 3; Policy: PPO / PPO; PPO batch size: 4,000 / 6,400; Rollout buffer size: 4,000 / 100,000; # Epochs per update: 80 / 8; Gamma: 0.99 / 0.99; GAE lambda: 0.95; Clip range (ϵ): 0.2 / 0.2; Entropy coefficient: 0.005 / 0.0; Value function coefficient: 0.5; Gradient clip: 0.5; Target KL: 0.12; Learning rate (actor): 3e-4 / 3e-4; Learning rate (critic): 1e-3 / 3e-4; Learning rate (representation): 3e-4 / 3e-4; Observation buffer size: 100,000; # Unsupervised learning steps: 400; Subsequence length (H): 8 / 4; Size of the latent state (representation): 24 / 128 |
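
The Dataset Splits row reports that the data collected in Step 1 was split 0.8 / 0.2 into train and validation sets. The report does not say whether the data was shuffled first or how any randomness was seeded, so the following is only a minimal sketch under those assumptions: `split_train_val` is a hypothetical helper, and the uniform shuffle and fixed seed are illustrative choices, not details from the paper or its code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val(
    samples: Sequence[T], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle indices of `samples` and split them into train/validation lists.

    Hypothetical helper: train_frac=0.8 mirrors the reported 0.8/0.2 split;
    shuffling and the fixed seed are assumptions, not taken from the paper.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be in (0, 1)")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    n_train = int(round(train_frac * len(samples)))
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:]]
    return train, val


if __name__ == "__main__":
    train, val = split_train_val(list(range(1000)))
    print(len(train), len(val))  # 800 200
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps the split reproducible without affecting other random draws in a training script.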
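
The Experiment Setup row lists a discount factor (gamma) of 0.99 for both environments and a single GAE lambda of 0.95 for PPO. To show what these two parameters control, here is a generic sketch of generalized advantage estimation (GAE) over one rollout. It is a textbook version, not code from the F2C repository or from Barhate (2021); the function name `compute_gae` and the way episode ends are handled (no bootstrapping past a terminal step) are assumptions.

```python
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over a single rollout.

    values[t] is V(s_t); last_value is V(s_T) for the state after the last step.
    dones[t] is True if the episode ended at step t (no bootstrapping past it).
    Returns (advantages, returns) with returns[t] = advantages[t] + values[t].
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * lam) sum of residuals, reset at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `lam=0` reduces each advantage to the one-step TD residual, while `lam=1` gives the full discounted return minus the value baseline; 0.95 sits between the two, trading a little bias for lower variance.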
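
The same row reports a clip range (ϵ) of 0.2, a single value-function coefficient of 0.5, and entropy coefficients of 0.005 (Bipedal Walker) and 0.0 (Metaworld). These correspond to the three terms of the standard PPO clipped-surrogate loss. The sketch below shows that loss for one minibatch in plain Python; it is a generic illustration, not the authors' implementation, and `ppo_loss` and its argument layout are hypothetical. Gradient clipping and the target-KL early stop from the same row are not modeled here.

```python
import math
from typing import List


def ppo_loss(
    new_log_probs: List[float],
    old_log_probs: List[float],
    advantages: List[float],
    values: List[float],
    returns: List[float],
    entropies: List[float],
    clip_eps: float = 0.2,   # reported clip range (ϵ)
    vf_coef: float = 0.5,    # reported value function coefficient
    ent_coef: float = 0.005, # Bipedal Walker value; Metaworld lists 0.0
) -> float:
    """Mean PPO loss over a minibatch (lower is better for the optimizer)."""
    n = len(advantages)
    if n == 0:
        raise ValueError("empty minibatch")
    total = 0.0
    for lp_new, lp_old, adv, v, ret, ent in zip(
        new_log_probs, old_log_probs, advantages, values, returns, entropies
    ):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped objectives
        policy_obj = min(ratio * adv, clipped * adv)
        value_loss = (v - ret) ** 2  # squared error of the critic
        # Maximize policy objective and entropy, minimize value error
        total += -policy_obj + vf_coef * value_loss - ent_coef * ent
    return total / n
```

For a positive advantage, a probability ratio of 1.5 is capped at 1 + ϵ = 1.2, so the update gains nothing from moving the policy further than that in one step.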