Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
Authors: Jinxin Liu, Donglin Wang, Qiangxing Tian, Zhengyu Chen
Pages: 7558-7566
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: Extensive experiments are conducted to evaluate our proposed GPIM method... We show the results in Figure 6 by plotting the normalized distance to goals as a function of the number of actor's steps... |
| Researcher Affiliation | Academia | Jinxin Liu (1,2,3), Donglin Wang (2,3), Qiangxing Tian (1,2,3), Zhengyu Chen (1,2,3). (1) Zhejiang University. (2) Westlake University. (3) Westlake Institute for Advanced Study. EMAIL |
| Pseudocode | Yes | Algorithm 1: Learning process of our proposed GPIM |
| Open Source Code | No | The paper provides a link to a website (https://sites.google.com/view/gpim) that contains videos, but it does not explicitly state that the source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We use three MuJoCo tasks (swimmer, half cheetah, and fetch) taken from OpenAI Gym (Brockman et al. 2016) |
| Dataset Splits | No | We use the normalized distance to goals as the evaluation metric, where we generate 50 goals (tasks) as validation. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using SAC, the MuJoCo physics engine, and MediaPipe, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper states it uses SAC for optimization, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training settings. |