Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
Authors: Kibeom Kim, Min Whoo Lee, Yoonsung Kim, Je-Hwan Ryu, Minsu Lee, Byoung-Tak Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed methods on visual navigation and robot arm manipulation tasks with multi-target environments and show that GDAN outperforms the state-of-the-art methods in terms of task success ratio, sample efficiency, and generalization. Additionally, qualitative analyses demonstrate that our proposed method can help the agent become aware of and focus on the given instruction clearly, promoting goal-directed behavior. |
| Researcher Affiliation | Collaboration | Kibeom Kim1,2, Min Whoo Lee1, Yoonsung Kim1, Je-Hwan Ryu1, Minsu Lee1,3, Byoung-Tak Zhang1,3; 1Seoul National University, 2Surromind, 3AIIS. {kbkim, mwlee, yskim, jhryu, mslee, btzhang}@bi.snu.ac.kr |
| Pseudocode | No | The paper refers to 'Appendix D for algorithm details' but does not include pseudocode or a clearly labeled algorithm block within the main text provided. |
| Open Source Code | Yes | Code available at https://github.com/kibeomKim/GACE-GDAN |
| Open Datasets | No | The paper mentions developing and making publicly available 'visual navigation and robot arm manipulation tasks as benchmarks' along with their implementation, which defines the environment for data generation. However, it does not provide a concrete link, DOI, or citation for a pre-collected, static 'dataset'. |
| Dataset Splits | No | The paper mentions training, seen, and unseen environments for generalization evaluation, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for any dataset. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like A3C, Pixel-SAC, MuJoCo, ViZDoom, and LSTM, but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The rewards are set as r_success = 10, r_nongoal = 1, r_timeout = 0.1, r_step = 0.01. Full details of the environment are provided in Appendix C. ... We complete the training procedure by optimizing the overall loss L_total as the weighted sum of the two losses in Eq. 9. We focus on improving the policy for the main task and assign weight η to L_GACE, which performs goal-aware representation learning for the feature extractor σ(·). ... All experiments are repeated five times. |
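
The loss weighting described in the Experiment Setup row can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name, the value of `eta`, and the placeholder loss values are assumptions, and only the structure (L_total = main-task loss + η · L_GACE) comes from the paper.

```python
def total_loss(task_loss: float, gace_loss: float, eta: float) -> float:
    """Weighted sum of the main RL loss and the goal-aware
    cross-entropy (GACE) auxiliary loss, with weight eta on GACE.
    Names and eta value are illustrative assumptions."""
    return task_loss + eta * gace_loss

# Illustrative values only (not reported in the paper):
loss = total_loss(task_loss=1.5, gace_loss=0.8, eta=0.5)
```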