Latent World Models For Intrinsically Motivated Exploration
Authors: Aleksandr Ermolov, Nicu Sebe
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the method on image-based hard exploration environments from the Atari benchmark and report significant improvement with respect to prior work. |
| Researcher Affiliation | Academia | Aleksandr Ermolov, Nicu Sebe, Department of Information Engineering and Computer Science (DISI), University of Trento, Italy; {aleksandr.ermolov,niculae.sebe}@unitn.it |
| Pseudocode | No | The paper states 'The algorithm and the configuration are available in the Supplementary.' but does not contain pseudocode or an algorithm block in the main text. |
| Open Source Code | Yes | The source code of the method and all the experiments is available at https://github.com/htdt/lwm. |
| Open Datasets | Yes | We train the LWM method on 6 hard exploration Atari [3] environments: Freeway, Frostbite, Venture, Gravitar, Solaris and Montezuma's Revenge. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. It describes evaluation procedures (e.g., 'average the cumulative reward over 128 different layouts', 'training budget is 50M environment frames') but gives no explicit splits such as an '80/10/10 split' or 'X samples for validation'. |
| Hardware Specification | Yes | One experiment requires 7.5h of a virtual machine with one Nvidia T4 GPU. |
| Software Dependencies | No | The paper mentions model components such as a GRU, RNN, CNN, and DQN, but does not specify the versions of any programming languages, frameworks, or libraries used. |
| Experiment Setup | Yes | We use 1 frame as a state instead of 4; we do not decouple actors and learner... we employ GRU as RNN... the model performs 40 burn-in steps... We use 0.999 momentum to update the running average. We clip the normalized value to range [-10, 10]... we multiply the resulting value with the coefficient β... intrinsic reward scaling β = 0.01 for Freeway and β = 1 for others. The training budget is 50M environment frames, the final scores are averaged over 128 episodes of an ϵ-greedy agent with ϵ = 0.001, and each experiment is performed with 5 different random seeds. (A minimal sketch of the intrinsic reward scaling appears after the table.) |
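
For reference, the reward post-processing described in the 'Experiment Setup' row (a running average updated with 0.999 momentum, clipping of the normalized value to [-10, 10], and scaling by β) can be summarized as a minimal sketch. This is not the authors' implementation: the class name `IntrinsicRewardScaler`, the use of the running average as a divisor, and the `1e-8` epsilon are illustrative assumptions; the authoritative code is in the linked repository (https://github.com/htdt/lwm).

```python
import numpy as np


class IntrinsicRewardScaler:
    """Sketch of the intrinsic reward post-processing quoted above:
    maintain a running average with momentum 0.999, clip the normalized
    reward to [-10, 10], then scale by beta. Names and the exact
    normalization are assumptions, not the paper's implementation."""

    def __init__(self, beta: float = 1.0, momentum: float = 0.999, clip: float = 10.0):
        self.beta = beta          # beta = 0.01 for Freeway, 1.0 for the other games
        self.momentum = momentum  # momentum of the running average
        self.clip = clip          # clipping range for the normalized reward
        self.running_avg = None   # running average of the raw intrinsic reward

    def __call__(self, raw_reward: float) -> float:
        # Update the running average of the raw intrinsic reward.
        if self.running_avg is None:
            self.running_avg = raw_reward
        else:
            self.running_avg = (self.momentum * self.running_avg
                                + (1.0 - self.momentum) * raw_reward)
        # Normalize by the running average (assumption), clip, and scale by beta.
        normalized = raw_reward / (abs(self.running_avg) + 1e-8)
        clipped = float(np.clip(normalized, -self.clip, self.clip))
        return self.beta * clipped
```

As a usage example under these assumptions, `scaler = IntrinsicRewardScaler(beta=0.01)` (the Freeway setting) would be applied to each raw intrinsic reward before it is combined with the extrinsic reward.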