Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
Authors: Jinxin Liu, Donglin Wang, Qiangxing Tian, Zhengyu Chen
AAAI 2022, pp. 7558-7566
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to evaluate our proposed GPIM method... We show the results in Figure 6 by plotting the normalized distance to goals as a function of the number of actor's steps... |
| Researcher Affiliation | Academia | Jinxin Liu (1,2,3), Donglin Wang (2,3), Qiangxing Tian (1,2,3), Zhengyu Chen (1,2,3); 1. Zhejiang University, 2. Westlake University, 3. Westlake Institute for Advanced Study. {liujinxin, wangdonglin, tianqiangxing, chenzhengyu}@westlake.edu.cn |
| Pseudocode | Yes | Algorithm 1: Learning process of our proposed GPIM (a hedged training-loop sketch follows this table) |
| Open Source Code | No | The paper provides a link to a website (https://sites.google.com/view/gpim) that contains videos, but it does not explicitly state that the source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | We use three MuJoCo tasks (swimmer, half-cheetah, and fetch) taken from OpenAI Gym (Brockman et al. 2016) |
| Dataset Splits | No | The paper reports no explicit train/validation/test splits; the closest statement is: "We use the normalized distance to goals as the evaluation metric, where we generate 50 goals (tasks) as validation." (See the evaluation sketch after this table.) |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using SAC, the MuJoCo physics engine, and MediaPipe, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper states it uses SAC for optimization, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training settings. |
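
The Pseudocode row cites the paper's Algorithm 1, which is not reproduced in this report. For orientation, here is a minimal, hypothetical sketch of a generic goal-conditioned training loop with a discriminator-style intrinsic reward; `env`, `policy`, `discriminator`, and `sac_update` are assumed placeholders, and the log-likelihood reward is one common convention, not the paper's exact objective.

```python
import numpy as np

# Minimal sketch of a goal-conditioned training loop with a
# discriminator-style intrinsic reward. This is NOT the paper's
# Algorithm 1: `env`, `policy`, `discriminator`, and `sac_update`
# are hypothetical placeholders for components the table above
# only names at a high level.

def train_goal_conditioned(env, policy, discriminator, sac_update,
                           num_episodes=1000, horizon=200):
    for _ in range(num_episodes):
        goal = env.sample_goal()            # assumed goal sampler
        obs = env.reset()
        for _ in range(horizon):
            action = policy(obs, goal)      # goal-conditioned policy
            next_obs, _, done, _ = env.step(action)
            # Intrinsic reward: log-likelihood that the reached state
            # matches the commanded goal (one common convention).
            reward = np.log(discriminator(next_obs, goal) + 1e-8)
            sac_update(obs, action, reward, next_obs, goal, done)
            obs = next_obs
            if done:
                break
```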
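
The Dataset Splits row quotes the paper's evaluation protocol: normalized distance to 50 validation goals. The sketch below illustrates one plausible reading of that metric; `achieved` is an assumed helper mapping an observation to goal-space coordinates, and normalizing by the initial distance is an assumption, since the paper does not spell out the formula.

```python
import numpy as np

# Hypothetical sketch of the evaluation metric described above:
# normalized distance to each of a fixed set of validation goals.

def achieved(obs):
    # Assumption: the first coordinates of the observation live in
    # goal space (environment-specific in practice).
    return np.asarray(obs)[:2]

def normalized_distance(env, policy, goals, horizon=200):
    scores = []
    for goal in goals:
        obs = env.reset()
        d_init = np.linalg.norm(achieved(obs) - goal)
        for _ in range(horizon):
            obs, _, done, _ = env.step(policy(obs, goal))
            if done:
                break
        d_final = np.linalg.norm(achieved(obs) - goal)
        # Distance to the goal at episode end, scaled by the
        # starting distance (assumed normalization).
        scores.append(d_final / max(d_init, 1e-8))
    return float(np.mean(scores))

# Example usage with 50 sampled goals, as in the paper's protocol:
# goals = [env.sample_goal() for _ in range(50)]
# print(normalized_distance(env, policy, goals))
```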