Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Laplacian in RL: Learning Representations with Efficient Approximations

Authors: Yifan Wu, George Tucker, Ofir Nachum

ICLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting."
Researcher Affiliation | Collaboration | Yifan Wu (Carnegie Mellon University), George Tucker (Google Brain), Ofir Nachum (Google Brain)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | No | The paper describes generating data through interaction with environments (e.g., "We generate a dataset of experience by randomly sampling n transitions using a uniformly random policy with random initial state" in the Four Room gridworld, and the Mujoco environments). It does not provide access information (link, DOI, or citation) for a pre-existing, publicly available dataset.
Dataset Splits | No | The paper does not explicitly mention or specify any validation dataset splits; it discusses training and testing performance within simulated environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU or GPU models, or cloud computing specifications).
Software Dependencies | No | The paper mentions software such as DQN, DDPG, TensorFlow, and Mujoco but does not specify their version numbers, which are necessary for reproducibility.
Experiment Setup | Yes | "We use β = d/20, batch size 32, Adam optimizer with learning rate 0.001 and total training steps 100,000. For representation learning we use d = 20. In the definition of D we use the discounted multi-step transitions (9) with λ = 0.9. For the approximate graph drawing objective (6) we use β = 5.0 and δjk = 0.05 (instead of 1) if j = k, otherwise 0, to control the scale of L2 distances. We pretrain the representations for 30,000 steps ... by Adam with batch size 128 and learning rate 0.001."
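The Experiment Setup quote packs two distinct hyperparameter sets into one sentence. A minimal sketch collecting them into configuration dictionaries may make the reported values easier to scan; the dictionary names and keys are our own illustrative labels, not identifiers from the paper or its (unreleased) code.

```python
# Hedged sketch: hyperparameters as reported in the paper's Experiment Setup,
# grouped into illustrative config dicts. Names/keys are assumptions.

# Representation pretraining with the approximate graph drawing objective (6),
# using discounted multi-step transitions (9).
repr_pretrain_config = {
    "d": 20,                  # representation dimension
    "beta": 5.0,              # penalty weight in objective (6)
    "delta_diag": 0.05,       # δ_jk when j == k (0 otherwise), sets L2 scale
    "lambda": 0.9,            # discount for multi-step transitions (9)
    "pretrain_steps": 30_000,
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
}

# Separate setting reported with β = d/20 and 100,000 total training steps.
eigen_approx_config = {
    "beta": repr_pretrain_config["d"] / 20,  # = 1.0 for d = 20
    "batch_size": 32,
    "total_steps": 100_000,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
}

print(eigen_approx_config["beta"])  # -> 1.0
```

Note that β is defined two ways in the quote (β = d/20 in one experiment, β = 5.0 in another); keeping them in separate configs avoids conflating the two settings.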