Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
The Laplacian in RL: Learning Representations with Efficient Approximations
Authors: Yifan Wu, George Tucker, Ofir Nachum
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. We systematically evaluate our approach and empirically show that it generalizes beyond the tabular, finite-state setting. |
| Researcher Affiliation | Collaboration | Yifan Wu Carnegie Mellon University EMAIL George Tucker Google Brain EMAIL Ofir Nachum Google Brain EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating data through interaction with environments (e.g., 'We generate a dataset of experience by randomly sampling n transitions using a uniformly random policy with random initial state' in the Four Room gridworld and using Mujoco environments). It does not provide access information (link, DOI, citation) to a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper does not explicitly mention or specify any validation dataset splits. It discusses training and testing performance within simulated environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or cloud computing specifications). |
| Software Dependencies | No | The paper mentions software like 'DQN', 'DDPG', 'TensorFlow', and 'Mujoco' but does not specify their version numbers, which are necessary for reproducibility. |
| Experiment Setup | Yes | We use β = d/20, batch size 32, Adam optimizer with learning rate 0.001 and total training steps 100,000. For representation learning we use d = 20. In the definition of D we use the discounted multi-step transitions (9) with λ = 0.9. For the approximate graph drawing objective (6) we use β = 5.0 and δjk = 0.05 (instead of 1) if j = k otherwise 0 to control the scale of L2 distances. We pretrain the representations for 30000 steps...by Adam with batch size 128 and learning rate 0.001. |
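
The Experiment Setup row packs several hyperparameters into one quoted passage. As a reading aid, the sketch below records those values in Python. It is not the authors' code: the names `QuotedSetup`, `FIRST_SETUP`, `PRETRAIN_SETUP`, `beta_from_dim`, and `delta` are hypothetical, and grouping the values by the sentence they appear in is an assumption, since the excerpt does not say which experiment each setting belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotedSetup:
    """Optimizer settings copied from the quoted text; field names are hypothetical."""
    batch_size: int
    learning_rate: float  # Adam learning rate
    steps: int


# First quoted sentence: batch size 32, Adam with learning rate 0.001, 100,000 steps.
FIRST_SETUP = QuotedSetup(batch_size=32, learning_rate=0.001, steps=100_000)

# Last quoted sentence: pretraining for 30000 steps, batch size 128, learning rate 0.001.
PRETRAIN_SETUP = QuotedSetup(batch_size=128, learning_rate=0.001, steps=30_000)

# Scalar values quoted in between (constant names are illustrative).
REPRESENTATION_DIM = 20    # d, "for representation learning"
LAMBDA = 0.9               # discount for the multi-step transitions
BETA_GRAPH_DRAWING = 5.0   # beta for the approximate graph drawing objective
DELTA_DIAGONAL = 0.05      # value of delta_jk when j == k (instead of 1)


def beta_from_dim(d: int) -> float:
    """beta = d / 20, as stated in the first quoted sentence."""
    return d / 20


def delta(j: int, k: int) -> float:
    """delta_jk as quoted: 0.05 on the diagonal, 0 elsewhere."""
    return DELTA_DIAGONAL if j == k else 0.0
```

The quoted passage gives two different β settings (β = d/20 and β = 5.0), so the sketch keeps them separate rather than assuming they describe the same experiment.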