LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

Authors: Daejin Jo, Sungwoong Kim, Daniel Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally show that, in contrast to the previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments."
Researcher Affiliation | Industry | "Kakao Brain, Seongnam, South Korea. {daejin.jo, swkim, dwtnam, taehwan.kwon, seungeun.rho, jmkim, dhlee}@kakaobrain.com"
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "Official codes to run the algorithm and the experiments will be available" (footnoted link: https://github.com/kakaobrain/leco).
Open Datasets | Yes | MiniGrid [7] and DMLab [3] are cited as the environments used for experiments; both are standard, publicly available reinforcement learning environments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, as the experiments use reinforcement learning environments rather than static datasets.
Hardware Specification | Yes | "In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours."
Software Dependencies | No | The paper mentions using an "IMPALA-based agent" and promises code availability, but it does not explicitly list specific software dependencies with version numbers in the provided text.
Experiment Setup | Yes | "In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. The unroll length was T = 96 for all tasks and the same LSTM-based policy network architecture was used for LECO and all other baselines. Details on hyperparameters, model architectures, and training settings are provided in Appendix B."
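
For quick reference, the reported experiment setup can be collected into a small configuration structure. The following is a minimal sketch assembled from the quotes above; the dictionary keys and layout are illustrative assumptions, not the authors' actual configuration schema (the full hyperparameters are given in Appendix B of the paper).

```python
# Sketch of the training setups reported in the table above.
# Values come from the paper's stated setup; key names are illustrative
# assumptions, not the schema used in the official code.
experiment_setup = {
    "minigrid": {
        "gpus": "2x A100",
        "batch_size": 768,
        "training_time_hours": 18,
        "unroll_length": 96,  # T = 96 for all tasks
        "policy_network": "LSTM-based (shared by LECO and all baselines)",
        "agent": "IMPALA-based",
    },
    "dmlab": {
        "gpus": "8x V100",
        "batch_size": 576,
        "training_time_hours": 8,
        "unroll_length": 96,
        "policy_network": "LSTM-based (shared by LECO and all baselines)",
        "agent": "IMPALA-based",
    },
}

if __name__ == "__main__":
    for env_name, cfg in experiment_setup.items():
        print(env_name, cfg)
```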