LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Authors: Daejin Jo, Sungwoong Kim, Daniel Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that, in contrast to previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces, through the most difficult tasks in MiniGrid and DMLab environments. |
| Researcher Affiliation | Industry | Kakao Brain, Seongnam, South Korea. {daejin.jo, swkim, dwtnam, taehwan.kwon, seungeun.rho, jmkim, dhlee}@kakaobrain.com |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Official code to run the algorithm and the experiments will be available at https://github.com/kakaobrain/leco. |
| Open Datasets | Yes | MiniGrid [7] and DMLab [3] are cited as the environments used for experiments; both are standard, publicly available reinforcement learning environments. |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits as it involves reinforcement learning environments rather than static datasets. |
| Hardware Specification | Yes | In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. |
| Software Dependencies | No | The paper mentions using an 'IMPALA-based agent' and promises code availability, but it does not explicitly list specific software dependencies with their version numbers in the provided text. |
| Experiment Setup | Yes | In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. The unroll length was T = 96 for all tasks, and the same LSTM-based policy network architecture was used for LECO and all other baselines. Details on hyperparameters, model architectures, and training settings are provided in Appendix B. A configuration sketch of these reported settings follows the table. |
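
To make the reported setup concrete, the following is a minimal, hypothetical sketch of the training configuration in Python. The class and field names (`LECOTrainConfig`, `unroll_length`, and so on) are illustrative assumptions, not the paper's actual API; the authoritative hyperparameters and definitions are in the paper's Appendix B and the released code at https://github.com/kakaobrain/leco.

```python
from dataclasses import dataclass

# Hypothetical container for the settings reported in the table above.
# Field names are illustrative assumptions; the paper's Appendix B and
# the official repository define the real configuration.
@dataclass(frozen=True)
class LECOTrainConfig:
    env_suite: str           # environment suite, e.g. "MiniGrid" or "DMLab"
    num_gpus: int            # number of GPUs used for training
    gpu_type: str            # accelerator model reported in the paper
    batch_size: int          # learner batch size
    train_hours: int         # reported wall-clock training time
    unroll_length: int = 96  # T = 96 for all tasks, per the paper

# Values as reported in the Hardware Specification and Experiment Setup rows.
MINIGRID_CONFIG = LECOTrainConfig("MiniGrid", num_gpus=2, gpu_type="A100",
                                  batch_size=768, train_hours=18)
DMLAB_CONFIG = LECOTrainConfig("DMLab", num_gpus=8, gpu_type="V100",
                               batch_size=576, train_hours=8)

if __name__ == "__main__":
    for cfg in (MINIGRID_CONFIG, DMLAB_CONFIG):
        print(cfg)
```

A frozen dataclass is used here only to keep the reported values immutable and easy to compare side by side; the actual IMPALA-based training code may organize these settings differently.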