Flow-based Intrinsic Curiosity Module

Authors: Hsuan-Kung Yang, Po-Han Chiang, Min-Fong Hong, Chun-Yi Lee

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method and compare it with a number of existing methods on multiple benchmark environments, including Atari games, Super Mario Bros., and ViZDoom. We demonstrate that FICM is favorable to tasks or environments featuring moving objects, which allow FICM to utilize the motion features between consecutive observations. We further ablatively analyze the encoding efficiency of FICM, and discuss its applicable domains comprehensively.
Researcher Affiliation | Academia | Hsuan-Kung Yang, Po-Han Chiang, Min-Fong Hong and Chun-Yi Lee; Elsa Lab, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan; {hellochick, ymmoy999, romulus, cylee}@gapp.nthu.edu.tw
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | See here for our codes and demo videos.
Open Datasets | Yes | We validate the above properties of FICM in a variety of benchmark environments, including Atari 2600 [Bellemare et al., 2013], Super Mario Bros., and ViZDoom [Wydmuch et al., 2018].
Dataset Splits | No | The paper uses benchmark environments and plots evaluation curves, but does not state train/validation/test dataset splits (e.g., percentages or sample counts) for its experiments.
Hardware Specification | No | The paper mentions 'the donation of the GPUs from NVIDIA Corporation and NVIDIA AI Technology Center' but does not specify the models or specifications of the GPUs or other hardware used for the experiments.
Software Dependencies | No | The paper states that 'all of the experiments are conducted with proximal policy optimization (PPO) [Schulman et al., 2017]' and that DRL agents are 'based on Asynchronous Advantage Actor-Critic (A3C) [Mnih et al., 2016]', but it does not provide version numbers for these or any other software libraries or dependencies.
Experiment Setup | No | The paper mentions using PPO and A3C and discusses input types (RGB vs. gray-scale, stacked vs. non-stacked frames), but it does not provide hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed system-level training configurations for its experiments.