Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning

Authors: Kibeom Kim, Min Whoo Lee, Yoonsung Kim, Je-Hwan Ryu, Minsu Lee, Byoung-Tak Zhang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed methods on visual navigation and robot arm manipulation tasks with multi-target environments and show that GDAN outperforms the state-of-the-art methods in terms of task success ratio, sample efficiency, and generalization. Additionally, qualitative analyses demonstrate that our proposed method can help the agent become aware of and focus on the given instruction clearly, promoting goal-directed behavior.
Researcher Affiliation | Collaboration | Kibeom Kim1,2, Min Whoo Lee1, Yoonsung Kim1, Je-Hwan Ryu1, Minsu Lee1,3, Byoung-Tak Zhang1,3; 1Seoul National University, 2Surromind, 3AIIS; {kbkim, mwlee, yskim, jhryu, mslee, btzhang}@bi.snu.ac.kr
Pseudocode | No | The paper refers to 'Appendix D for algorithm details' but does not include pseudocode or a clearly labeled algorithm block within the main text provided.
Open Source Code | Yes | Code available at https://github.com/kibeomKim/GACE-GDAN
Open Datasets | No | The paper mentions developing and making publicly available 'visual navigation and robot arm manipulation tasks as benchmarks' along with their implementation, which defines the environments used to generate data. However, it does not provide a concrete link, DOI, or citation for a pre-collected, static 'dataset'.
Dataset Splits | No | The paper mentions training, seen, and unseen environments for generalization evaluation, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for any dataset.
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions software components like A3C, Pixel-SAC, MuJoCo, ViZDoom, and LSTM, but it does not provide specific version numbers for any of them.
Experiment Setup | Yes | The rewards are set as r_success = 10, r_nongoal = 1, r_timeout = 0.1, r_step = 0.01. Full details of the environment are provided in Appendix C. ... We complete the training procedure by optimizing the overall loss L_total as the weighted sum of the two losses in Eq. 9. We focus on improving the policy for performing the main task and assign weight η to L_GACE for performing goal-aware representation learning for the feature extractor σ(·). ... All experiments are repeated five times.
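
The quoted setup combines the main task (policy) loss with the GACE auxiliary loss as a weighted sum, with weight η applied to L_GACE. Below is a minimal sketch of that combination, assuming a PyTorch implementation; the function name, the default value of eta, and the tensor shapes in the usage snippet are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch (not the authors' implementation): L_total = L_task + eta * L_GACE,
# where L_GACE is a cross-entropy over goal/instruction classes predicted from the
# feature extractor's output. The default eta below is an arbitrary placeholder.
import torch
import torch.nn.functional as F


def total_loss(task_loss: torch.Tensor,
               goal_logits: torch.Tensor,
               goal_labels: torch.Tensor,
               eta: float = 0.1) -> torch.Tensor:
    # Goal-aware cross-entropy (GACE) auxiliary term.
    l_gace = F.cross_entropy(goal_logits, goal_labels)
    # Weighted sum of the two losses, following the description of Eq. 9.
    return task_loss + eta * l_gace


# Hypothetical usage: a batch of 8 feature vectors scored over 5 goal classes.
logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = total_loss(task_loss=torch.tensor(0.3), goal_logits=logits, goal_labels=labels)
```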