LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

Authors: Daejin Jo, Sungwoong Kim, Daniel Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally show that, in contrast to the previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments."
Researcher Affiliation | Industry | "Kakao Brain, Seongnam, South Korea. {daejin.jo, swkim, dwtnam, taehwan.kwon, seungeun.rho, jmkim, dhlee}@kakaobrain.com"
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "Official codes to run the algorithm and the experiments will be available" (footnoted link: https://github.com/kakaobrain/leco).
Open Datasets | Yes | MiniGrid [7] and DMLab [3] are cited as the environments used for experiments; both are standard, publicly available reinforcement learning environments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, as the experiments use reinforcement learning environments rather than static datasets.
Hardware Specification | Yes | "In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours."
Software Dependencies | No | The paper mentions using an "IMPALA-based agent" and promises code availability, but it does not explicitly list specific software dependencies with version numbers in the provided text.
Experiment Setup | Yes | "In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. The unroll length was T = 96 for all tasks and the same LSTM-based policy network architecture was used for LECO and all other baselines. Details on hyperparameters, model architectures, and training settings are provided in Appendix B."
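
For quick reference, the reported experiment setup can be collected into a small configuration structure. The following is a minimal sketch assembled from the quotes above; the dictionary keys and layout are illustrative assumptions, not the authors' actual configuration schema (the full hyperparameters are given in Appendix B of the paper).

```python
# Sketch of the training setups reported in the table above.
# Values come from the paper's stated setup; key names are illustrative
# assumptions, not the schema used in the official code.
experiment_setup = {
    "minigrid": {
        "gpus": "2x A100",
        "batch_size": 768,
        "training_time_hours": 18,
        "unroll_length": 96,  # T = 96 for all tasks
        "policy_network": "LSTM-based (shared by LECO and all baselines)",
        "agent": "IMPALA-based",
    },
    "dmlab": {
        "gpus": "8x V100",
        "batch_size": 576,
        "training_time_hours": 8,
        "unroll_length": 96,
        "policy_network": "LSTM-based (shared by LECO and all baselines)",
        "agent": "IMPALA-based",
    },
}

if __name__ == "__main__":
    for env_name, cfg in experiment_setup.items():
        print(env_name, cfg)
```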