The Laplacian in RL: Learning Representations with Efficient Approximations

Authors: Yifan Wu, George Tucker, Ofir Nachum

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. (A hedged sketch of the graph-drawing objective behind this claim appears after the table.)
Researcher Affiliation | Collaboration | Yifan Wu (Carnegie Mellon University, yw4@cs.cmu.edu); George Tucker (Google Brain, gjt@google.com); Ofir Nachum (Google Brain, ofirnachum@google.com)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit code-release statement or a link to open-source code for the described methodology.
Open Datasets | No | The paper describes generating data through interaction with environments (e.g., 'We generate a dataset of experience by randomly sampling n transitions using a uniformly random policy with random initial state' in the Four Room gridworld, and similarly for the Mujoco environments). It does not provide access information (link, DOI, or citation) for a pre-existing, publicly available dataset. (A data-collection sketch of this kind of procedure appears after the table.)
Dataset Splits | No | The paper does not explicitly mention or specify any validation dataset splits. It discusses training and testing performance within simulated environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or cloud computing specifications).
Software Dependencies | No | The paper mentions software and algorithm implementations such as 'DQN', 'DDPG', 'TensorFlow', and 'Mujoco', but does not specify version numbers, which are necessary for reproducibility.
Experiment Setup | Yes | We use β = d/20, batch size 32, the Adam optimizer with learning rate 0.001, and total training steps 100,000. For representation learning we use d = 20. In the definition of D we use the discounted multi-step transitions (9) with λ = 0.9. For the approximate graph drawing objective (6) we use β = 5.0 and δ_jk = 0.05 (instead of 1) if j = k, otherwise 0, to control the scale of L2 distances. We pretrain the representations for 30,000 steps...by Adam with batch size 128 and learning rate 0.001. (These values are collected into a configuration sketch after the table.)
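
To make the Research Type row more concrete: the quoted method approximates Laplacian eigenvectors by minimizing a graph-drawing-style objective over learned state embeddings. Below is a minimal sketch of such an objective, assuming an attractive term over embedded transition pairs plus a β-weighted orthonormality penalty estimated from two independently sampled state batches. The function name, array shapes, and exact Monte-Carlo form are illustrative assumptions, not the authors' code; the paper's objective (6) defines the precise form.

import numpy as np

def graph_drawing_loss(f_u, f_v, f_x, f_y, beta=5.0, delta_diag=0.05):
    # f_u, f_v: (batch, d) embeddings of states joined by a sampled transition;
    #           the attractive term pulls these embeddings together.
    # f_x, f_y: (batch, d) embeddings of two independently sampled state batches;
    #           the repulsive term softly enforces orthonormality of the embedding.
    # beta, delta_diag: penalty weight and scaled Kronecker delta, matching the
    #           values quoted in the Experiment Setup row (5.0 and 0.05).
    d = f_u.shape[1]
    attractive = 0.5 * np.mean(np.sum((f_u - f_v) ** 2, axis=1))
    delta = delta_diag * np.eye(d)
    # Per-example outer products f(x) f(x)^T minus the scaled identity target.
    gram_x = f_x[:, :, None] * f_x[:, None, :] - delta
    gram_y = f_y[:, :, None] * f_y[:, None, :] - delta
    repulsive = np.mean(np.sum(gram_x * gram_y, axis=(1, 2)))
    return attractive + beta * repulsive

Using two independent batches for the penalty is the standard way to obtain an unbiased estimate of a product of expectations, which is why the sketch samples f_x and f_y separately.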
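
The Open Datasets row notes that the experience is generated by interaction rather than loaded from a public corpus. The following is a minimal data-collection sketch under that description, assuming a gym-style environment with a discrete action space and the classic 4-tuple step() return; the wrapper and function name are hypothetical, not taken from the paper.

import numpy as np

def collect_random_transitions(env, n, seed=0):
    # Roll out a uniformly random policy, resetting to a (random) initial state
    # whenever an episode ends, until n transitions have been gathered.
    rng = np.random.default_rng(seed)
    transitions = []
    state = env.reset()
    while len(transitions) < n:
        action = int(rng.integers(env.action_space.n))
        next_state, reward, done, info = env.step(action)
        transitions.append((state, action, next_state))
        state = env.reset() if done else next_state
    return transitions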
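
Finally, the hyperparameters quoted in the Experiment Setup row are gathered here in one place for reference. The key names and the split into two settings are editorial; the numeric values are the ones quoted above.

# First quoted setting (the excerpt does not name the experiment it belongs to).
APPROXIMATION_TRAINING = {
    "beta": "d / 20",            # quoted as beta = d/20
    "batch_size": 32,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "total_training_steps": 100_000,
}

# Second quoted setting (the excerpt's representation-learning pretraining).
REPRESENTATION_PRETRAINING = {
    "d": 20,                     # embedding dimension
    "lambda": 0.9,               # discount for multi-step transitions (9)
    "beta": 5.0,                 # penalty weight in graph drawing objective (6)
    "delta_diag": 0.05,          # diagonal of the scaled Kronecker delta
    "pretrain_steps": 30_000,
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
}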