Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Action Representations for Reinforcement Learning
Authors: Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Empirical Analysis): To understand the internal working of our proposed algorithm, we present visualizations of the learned action representations on the maze domain. |
| Researcher Affiliation | Collaboration | (1) University of Massachusetts, Amherst, USA. (2) Adobe Research, San Jose, USA. |
| Pseudocode | Yes | Algorithm 1: Policy Gradient with Representations for Action (PG-RA) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code release. |
| Open Datasets | No | For both of these applications, an existing log of user's click stream data was used to create an n-gram based MDP model for user behavior (Shani et al., 2005). In the tutorial recommendation task, user activity for a three month period was observed. Sequences of user interaction were aggregated to obtain over 29 million clicks. Similarly, for a month long duration, sequential usage patterns of the tools in the multi-media editing software were collected to obtain a total of over 1.75 billion user clicks. |
| Dataset Splits | No | The paper describes the data sources but does not provide specific details on training, validation, or test dataset splits, percentages, or methodology for splitting. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For detailed discussion on parameterization of the function approximators and hyper-parameter search, see Appendix D. |