Representation-Driven Reinforcement Learning
Authors: Ofir Nabati, Guy Tennenholtz, Shie Mannor
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach through its application to both evolutionary and policy gradient-based approaches, demonstrating significantly improved performance compared to traditional methods. Empirical experiments on the MuJoCo (Todorov et al., 2012) and MinAtar (Young & Tian, 2019) benchmarks show the benefits of our approach, particularly in sparse reward settings. |
| Researcher Affiliation | Collaboration | (1) Department of Electrical Engineering, Technion Institute of Technology, Israel; (2) Technion (currently at Google Research); (3) Nvidia Research. |
| Pseudocode | Yes | Pseudocode for RepRL is presented in Algorithm 1. ... Algorithm 2 Representation Driven Evolution Strategy ... Algorithm 3 Representation Driven Policy Gradient ... Algorithm 4 Random Search / Evolution Strategy ... Algorithm 5 Representation Driven Evolution Strategy ... Algorithm 6 Representation Driven Policy Gradient |
| Open Source Code | No | The paper does not provide a direct link to its own open-source code or explicitly state that its code is being released. It refers to code from another paper: 'For more details, we refer the reader to the code provided in Navon et al. (2023), which was used by us.' |
| Open Datasets | Yes | Empirical experiments on the MuJoCo (Todorov et al., 2012) and MinAtar (Young & Tian, 2019) benchmarks show the benefits of our approach, particularly in sparse reward settings. |
| Dataset Splits | No | The paper evaluates performance on environments like MuJoCo and MinAtar but does not provide specific dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any particular hardware components like CPU/GPU models, memory, or specific computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper mentions using an 'Adam optimizer' but does not specify version numbers for any software dependencies, libraries, or programming languages. |
| Experiment Setup | Yes | The detailed network architecture and hyperparameters utilized in the experiments are provided in Appendix F. ... For example, in Appendix F.1: 'The policy employed in this study is a fully-connected network with 3 layers, featuring the use of the tanh non-linearity operator. The hidden layer dimensions across the network are fixed at 32, followed by a Softmax operation.' Also, '300 rounds were executed, with 100 trajectories sampled at each round utilizing noisy sampling of the current policy, with a zero-mean Gaussian noise and a standard deviation of 0.1. The ES algorithm utilized a step size of 0.1, while the RepRL algorithm employed a decision set of size 2048 without a discount factor (γ = 1) and λ = 0.1.' Appendix F.2 and F.3 provide similar detailed setup information for other experiments. A minimal sketch of this configuration appears below the table. |
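The setup reported in Appendix F.1 (3-layer fully-connected policy, tanh activations, hidden width 32, softmax output, zero-mean Gaussian parameter noise with std 0.1) can be sketched as follows. This is a minimal illustration assuming a PyTorch implementation; the names `PolicyNet` and `perturb_parameters`, as well as the dummy observation/action sizes, are our assumptions and not the authors' code.

```python
import copy
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Fully-connected policy per Appendix F.1: 3 layers, tanh hidden
    activations, hidden width 32, softmax over discrete actions."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def perturb_parameters(policy: nn.Module, noise_std: float = 0.1) -> nn.Module:
    """Return a copy of the policy with zero-mean Gaussian noise
    (std 0.1, as reported) added to every parameter, used for the
    noisy sampling of trajectories."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(noise_std * torch.randn_like(p))
    return noisy


# Reported hyperparameters (Appendix F.1): 300 rounds, 100 noisy policies
# per round, ES step size 0.1, decision set of size 2048, gamma = 1, lambda = 0.1.
if __name__ == "__main__":
    policy = PolicyNet(obs_dim=8, n_actions=4)            # dummy dimensions
    candidates = [perturb_parameters(policy) for _ in range(100)]
    probs = candidates[0](torch.zeros(1, 8))               # action distribution
    print(probs.shape)                                      # torch.Size([1, 4])
```

The sketch only reproduces the stated architecture and noise scheme; the representation-learning and decision-set selection steps of RepRL itself are described in the paper's Algorithms 1-6 and are not shown here.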