Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Authors: Daejin Jo, Sungwoong Kim, Daniel Nam, Taehwan Kwon, Seungeun Rho, Jongmin Kim, Donghoon Lee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that, in contrast to the previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments. |
| Researcher Affiliation | Industry | Kakao Brain Seongnam, South Korea {daejin.jo, swkim, dwtnam, taehwan.kwon, seungeun.rho, jmkim, dhlee} @kakaobrain.com |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Official codes to run the algorithm and the experiments will be available (footnote: https://github.com/kakaobrain/leco). |
| Open Datasets | Yes | MiniGrid [7] and DMLab [3] are cited as the experimental environments; both are standard, publicly available reinforcement learning environments. |
| Dataset Splits | No | The paper does not provide training/validation/test dataset splits; the work uses reinforcement learning environments rather than static datasets. |
| Hardware Specification | Yes | In Mini Grid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. |
| Software Dependencies | No | The paper mentions using an 'IMPALA-based agent' and promises code availability, but the provided text does not list specific software dependencies or version numbers. |
| Experiment Setup | Yes | In MiniGrid, LECO was trained using two A100 GPUs with a batch size of 768 for 18 hours. In DMLab, we used eight V100 GPUs with a batch size of 576 for 8 hours. The unroll length was T = 96 for all tasks, and the same LSTM-based policy network architecture was used for LECO and all other baselines. Details on hyperparameters, model architectures, and training settings are provided in Appendix B. |