The Laplacian in RL: Learning Representations with Efficient Approximations
Authors: Yifan Wu, George Tucker, Ofir Nachum
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. |
| Researcher Affiliation | Collaboration | Yifan Wu (Carnegie Mellon University, yw4@cs.cmu.edu); George Tucker (Google Brain, gjt@google.com); Ofir Nachum (Google Brain, ofirnachum@google.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating data through interaction with environments (e.g., 'We generate a dataset of experience by randomly sampling n transitions using a uniformly random policy with random initial state' in the Four Room gridworld, and similarly for the MuJoCo environments). It does not provide access information (link, DOI, citation) for a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper does not explicitly mention or specify any validation dataset splits. It discusses training and testing performance within simulated environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or cloud computing specifications). |
| Software Dependencies | No | The paper mentions software like 'DQN', 'DDPG', 'TensorFlow', and 'Mujoco' but does not specify their version numbers, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use β = d/20, batch size 32, the Adam optimizer with learning rate 0.001, and 100,000 total training steps. For representation learning we use d = 20. In the definition of D we use the discounted multi-step transitions (9) with λ = 0.9. For the approximate graph drawing objective (6) we use β = 5.0 and δ_jk = 0.05 if j = k (instead of 1) and 0 otherwise, to control the scale of the L2 distances. We pretrain the representations for 30,000 steps...by Adam with batch size 128 and learning rate 0.001. (A hedged sketch of this configuration follows the table.) |
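
As a companion to the Experiment Setup row, below is a minimal sketch of how the quoted hyperparameters (d = 20, β = 5.0, diagonal δ_jk = 0.05, batch size 128) might plug into our reading of the paper's approximate graph drawing objective (its equation 6): an attractive term over transition pairs plus an orthogonality penalty estimated with two independent state samples. The function name `graph_drawing_loss`, the NumPy implementation, and the way the batches are produced are illustrative assumptions, not code released by the authors.

```python
import numpy as np

# Hyperparameter values quoted in the Experiment Setup row above.
d = 20              # representation dimension
beta = 5.0          # weight on the orthogonality (repulsive) penalty in objective (6)
delta_diag = 0.05   # delta_jk on the diagonal (0.05 instead of 1); off-diagonal entries are 0
batch = 128         # batch size used for representation pretraining

def graph_drawing_loss(f_u, f_v, f_u2, f_v2, beta=beta, delta_diag=delta_diag):
    """Monte-Carlo estimate of the approximate graph drawing objective (our reading).

    f_u, f_v   : (batch, d) embeddings of state pairs (u, v) drawn from the
                 transition distribution D (attractive term).
    f_u2, f_v2 : (batch, d) embeddings of two states drawn independently from
                 the state distribution rho (repulsive / orthogonality term).
    """
    # Attractive term: 1/2 * E_{(u,v)~D} ||f(u) - f(v)||^2
    attract = 0.5 * np.mean(np.sum((f_u - f_v) ** 2, axis=1))

    # Repulsive term: unbiased estimate of sum_{jk} (E_rho[f_j f_k] - delta_jk)^2,
    # obtained from two independent samples u2, v2 ~ rho.
    delta = delta_diag * np.eye(f_u2.shape[1])
    outer_u = np.einsum('bi,bj->bij', f_u2, f_u2) - delta  # (batch, d, d)
    outer_v = np.einsum('bi,bj->bij', f_v2, f_v2) - delta
    repulse = np.mean(np.sum(outer_u * outer_v, axis=(1, 2)))

    return attract + beta * repulse

# Usage example with random placeholder embeddings (a trained encoder would supply these).
rng = np.random.default_rng(0)
f_u, f_v, f_u2, f_v2 = (rng.normal(size=(batch, d)) for _ in range(4))
print(graph_drawing_loss(f_u, f_v, f_u2, f_v2))
```

The two-sample form of the repulsive term is what makes the squared-expectation constraint penalty estimable from minibatches; in practice the loss would be minimized with Adam at learning rate 0.001, per the quoted setup.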