Learning Action Representations for Reinforcement Learning
Authors: Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Empirical Analysis): "To understand the internal working of our proposed algorithm, we present visualizations of the learned action representations on the maze domain." |
| Researcher Affiliation | Collaboration | (1) University of Massachusetts, Amherst, USA; (2) Adobe Research, San Jose, USA. |
| Pseudocode | Yes | Algorithm 1: Policy Gradient with Representations for Action (PG-RA) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code release. |
| Open Datasets | No | For both of these applications, an existing log of user's click-stream data was used to create an n-gram based MDP model for user behavior (Shani et al., 2005). In the tutorial recommendation task, user activity for a three-month period was observed. Sequences of user interaction were aggregated to obtain over 29 million clicks. Similarly, for a month-long duration, sequential usage patterns of the tools in the multi-media editing software were collected to obtain a total of over 1.75 billion user clicks. |
| Dataset Splits | No | The paper describes the data sources but does not provide specific details on training, validation, or test dataset splits, percentages, or methodology for splitting. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For detailed discussion on parameterization of the function approximators and hyper-parameter search, see Appendix D. |
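
For context on the Pseudocode row above, the following is a minimal, illustrative sketch of the PG-RA idea (a policy gradient applied to an internal policy acting in a learned action-representation space, with discrete actions recovered by nearest learned embedding). It is not the authors' implementation; the toy environment, network sizes, and all names are assumptions made for illustration only.

```python
# Hedged sketch of a PG-RA-style update loop (illustrative, not the authors' code).
import torch
import torch.nn as nn

STATE_DIM, EMBED_DIM, N_ACTIONS = 4, 2, 16  # toy sizes, assumed for illustration

class InternalPolicy(nn.Module):
    """Gaussian policy over the continuous action-representation space."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                                nn.Linear(32, EMBED_DIM))
        self.log_std = nn.Parameter(torch.zeros(EMBED_DIM))

    def dist(self, s):
        return torch.distributions.Normal(self.mu(s), self.log_std.exp())

# One learned embedding per discrete action, plus an encoder that maps a transition
# (s, s_next) into the representation space so the embeddings can be trained with a
# supervised action-prediction loss.
embeddings = nn.Parameter(0.1 * torch.randn(N_ACTIONS, EMBED_DIM))
transition_encoder = nn.Linear(2 * STATE_DIM, EMBED_DIM)
policy = InternalPolicy()
opt = torch.optim.Adam(list(policy.parameters()) +
                       list(transition_encoder.parameters()) + [embeddings], lr=1e-2)

def to_action(e):
    """Map a sampled representation to the discrete action with the nearest embedding."""
    return torch.cdist(e.unsqueeze(0), embeddings).argmin().item()

def embedding_loss(s, a, s_next):
    """Cross-entropy loss for predicting the executed action from the transition,
    scored by (negative squared) distance to each action's embedding."""
    z = transition_encoder(torch.cat([s, s_next]))
    logits = -((embeddings - z) ** 2).sum(dim=1)  # closer embedding -> higher logit
    return nn.functional.cross_entropy(logits.unsqueeze(0), torch.tensor([a]))

for episode in range(200):
    # Toy episode: random states, reward 1 whenever action 0 is chosen (illustrative only).
    log_probs, rewards, sup_losses = [], [], []
    for t in range(10):
        s = torch.randn(STATE_DIM)
        d = policy.dist(s)
        e = d.sample()                     # act in the representation space
        a = to_action(e)                   # recover a discrete action
        s_next = torch.randn(STATE_DIM)
        rewards.append(1.0 if a == 0 else 0.0)
        log_probs.append(d.log_prob(e).sum())
        sup_losses.append(embedding_loss(s, a, s_next))

    ret = torch.tensor(rewards).flip(0).cumsum(0).flip(0)  # undiscounted returns-to-go
    pg_loss = -(torch.stack(log_probs) * ret).mean()       # REINFORCE on the internal policy
    loss = pg_loss + torch.stack(sup_losses).mean()        # joint policy + representation update
    opt.zero_grad(); loss.backward(); opt.step()
```

The two loss terms mirror the structure suggested by Algorithm 1: the policy-gradient term updates the internal policy over representations, while the supervised term shapes the action embeddings from observed transitions. Details such as the choice of Gaussian policy, nearest-embedding decoding, and the transition encoder here are simplifying assumptions, not claims about the paper's exact parameterization (which the paper defers to its Appendix D).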