The Laplacian in RL: Learning Representations with Efficient Approximations

Authors: Yifan Wu, George Tucker, Ofir Nachum

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. (A hedged sketch of the graph-drawing objective behind this claim appears after the table.)
Researcher Affiliation | Collaboration | Yifan Wu (Carnegie Mellon University, yw4@cs.cmu.edu); George Tucker (Google Brain, gjt@google.com); Ofir Nachum (Google Brain, ofirnachum@google.com)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit code-release statement or a link to open-source code for the described methodology.
Open Datasets | No | The paper describes generating data through interaction with environments (e.g., 'We generate a dataset of experience by randomly sampling n transitions using a uniformly random policy with random initial state' in the Four Room gridworld, and similarly for the Mujoco environments). It does not provide access information (link, DOI, or citation) for a pre-existing, publicly available dataset. (A data-collection sketch of this kind of procedure appears after the table.)
Dataset Splits | No | The paper does not explicitly mention or specify any validation dataset splits. It discusses training and testing performance within simulated environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or cloud computing specifications).
Software Dependencies | No | The paper mentions software and algorithm implementations such as 'DQN', 'DDPG', 'TensorFlow', and 'Mujoco', but does not specify version numbers, which are necessary for reproducibility.
Experiment Setup | Yes | We use β = d/20, batch size 32, the Adam optimizer with learning rate 0.001, and total training steps 100,000. For representation learning we use d = 20. In the definition of D we use the discounted multi-step transitions (9) with λ = 0.9. For the approximate graph drawing objective (6) we use β = 5.0 and δ_jk = 0.05 (instead of 1) if j = k, otherwise 0, to control the scale of L2 distances. We pretrain the representations for 30,000 steps...by Adam with batch size 128 and learning rate 0.001. (These values are collected into a configuration sketch after the table.)
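
To make the Research Type row more concrete: the quoted method approximates Laplacian eigenvectors by minimizing a graph-drawing-style objective over learned state embeddings. Below is a minimal sketch of such an objective, assuming an attractive term over embedded transition pairs plus a β-weighted orthonormality penalty estimated from two independently sampled state batches. The function name, array shapes, and exact Monte-Carlo form are illustrative assumptions, not the authors' code; the paper's objective (6) defines the precise form.

import numpy as np

def graph_drawing_loss(f_u, f_v, f_x, f_y, beta=5.0, delta_diag=0.05):
    # f_u, f_v: (batch, d) embeddings of states joined by a sampled transition;
    #           the attractive term pulls these embeddings together.
    # f_x, f_y: (batch, d) embeddings of two independently sampled state batches;
    #           the repulsive term softly enforces orthonormality of the embedding.
    # beta, delta_diag: penalty weight and scaled Kronecker delta, matching the
    #           values quoted in the Experiment Setup row (5.0 and 0.05).
    d = f_u.shape[1]
    attractive = 0.5 * np.mean(np.sum((f_u - f_v) ** 2, axis=1))
    delta = delta_diag * np.eye(d)
    # Per-example outer products f(x) f(x)^T minus the scaled identity target.
    gram_x = f_x[:, :, None] * f_x[:, None, :] - delta
    gram_y = f_y[:, :, None] * f_y[:, None, :] - delta
    repulsive = np.mean(np.sum(gram_x * gram_y, axis=(1, 2)))
    return attractive + beta * repulsive

Using two independent batches for the penalty is the standard way to obtain an unbiased estimate of a product of expectations, which is why the sketch samples f_x and f_y separately.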
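
The Open Datasets row notes that the experience is generated by interaction rather than loaded from a public corpus. The following is a minimal data-collection sketch under that description, assuming a gym-style environment with a discrete action space and the classic 4-tuple step() return; the wrapper and function name are hypothetical, not taken from the paper.

import numpy as np

def collect_random_transitions(env, n, seed=0):
    # Roll out a uniformly random policy, resetting to a (random) initial state
    # whenever an episode ends, until n transitions have been gathered.
    rng = np.random.default_rng(seed)
    transitions = []
    state = env.reset()
    while len(transitions) < n:
        action = int(rng.integers(env.action_space.n))
        next_state, reward, done, info = env.step(action)
        transitions.append((state, action, next_state))
        state = env.reset() if done else next_state
    return transitions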
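
Finally, the hyperparameters quoted in the Experiment Setup row are gathered here in one place for reference. The key names and the split into two settings are editorial; the numeric values are the ones quoted above.

# First quoted setting (the excerpt does not name the experiment it belongs to).
APPROXIMATION_TRAINING = {
    "beta": "d / 20",            # quoted as beta = d/20
    "batch_size": 32,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "total_training_steps": 100_000,
}

# Second quoted setting (the excerpt's representation-learning pretraining).
REPRESENTATION_PRETRAINING = {
    "d": 20,                     # embedding dimension
    "lambda": 0.9,               # discount for multi-step transitions (9)
    "beta": 5.0,                 # penalty weight in graph drawing objective (6)
    "delta_diag": 0.05,          # diagonal of the scaled Kronecker delta
    "pretrain_steps": 30_000,
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
}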