Representation-Driven Reinforcement Learning

Authors: Ofir Nabati, Guy Tennenholtz, Shie Mannor

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach through its application to both evolutionary and policy gradient-based approaches, demonstrating significantly improved performance compared to traditional methods. Empirical experiments on the MuJoCo (Todorov et al., 2012) and MinAtar (Young & Tian, 2019) benchmarks show the benefits of our approach, particularly in sparse reward settings.
Researcher Affiliation | Collaboration | 1 Department of Electrical Engineering, Technion - Israel Institute of Technology, Israel; 2 Technion (currently at Google Research); 3 Nvidia Research.
Pseudocode | Yes | Pseudocode for RepRL is presented in Algorithm 1. ... Algorithm 2: Representation Driven Evolution Strategy ... Algorithm 3: Representation Driven Policy Gradient ... Algorithm 4: Random Search / Evolution Strategy ... Algorithm 5: Representation Driven Evolution Strategy ... Algorithm 6: Representation Driven Policy Gradient. (An illustrative sketch of this kind of representation-driven selection loop appears after the table.)
Open Source Code | No | The paper does not provide a direct link to its own open-source code or explicitly state that its code is being released. It refers to code from another paper: 'For more details, we refer the reader to the code provided in Navon et al. (2023), which was used by us.'
Open Datasets | Yes | Empirical experiments on the MuJoCo (Todorov et al., 2012) and MinAtar (Young & Tian, 2019) benchmarks show the benefits of our approach, particularly in sparse reward settings.
Dataset Splits | No | The paper evaluates performance on environments like MuJoCo and MinAtar but does not provide specific dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | The paper does not specify particular hardware components, such as CPU/GPU models or memory, or the computing infrastructure used for the experiments.
Software Dependencies | No | The paper mentions using an 'Adam optimizer' but does not specify version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | Yes | The detailed network architecture and hyperparameters utilized in the experiments are provided in Appendix F. ... For example, in Appendix F.1: 'The policy employed in this study is a fully-connected network with 3 layers, featuring the use of the tanh non-linearity operator. The hidden layers dimensions across the network are fixed at 32, followed by a Softmax operation.' Also, '300 rounds were executed, with 100 trajectories sampled at each round utilizing noisy sampling of the current policy, with a zero-mean Gaussian noise and a standard deviation of 0.1. The ES algorithm utilized a step size of 0.1, while the RepRL algorithm employed a decision set of size 2048 without a discount factor (γ = 1) and λ = 0.1.' Appendix F.2 and F.3 provide similar detailed setup information for other experiments. (A sketch of a policy network and one ES round matching this description follows below.)
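
The pseudocode itself is only given in the paper, but the algorithm names and the reported decision set of candidate policies suggest a bandit-style selection over policy embeddings. The sketch below is a minimal illustration of that idea under assumptions of ours: candidate policies are represented by fixed-length embeddings, expected return is modeled as linear in the embedding, and the next policy is chosen by Thompson sampling. The function names and the update rule are illustrative, not the paper's Algorithm 1.

```python
import numpy as np

def select_policy(candidate_embeddings, A, b, sigma2=1.0, rng=None):
    """Thompson-sampling step over a linear reward model on policy embeddings.

    candidate_embeddings: (K, d) array, one row per candidate policy (the "decision set")
    A: (d, d) regularized design matrix; b: (d,) reward-weighted sum of embeddings
    """
    rng = np.random.default_rng() if rng is None else rng
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                                       # ridge estimate of reward weights
    theta = rng.multivariate_normal(theta_hat, sigma2 * A_inv)  # posterior sample
    return int(np.argmax(candidate_embeddings @ theta))         # most promising candidate

def update_model(A, b, embedding, observed_return):
    """Rank-one update after rolling out the selected policy."""
    A += np.outer(embedding, embedding)
    b += observed_return * embedding
    return A, b
```

An outer loop would alternate between embedding a fresh decision set of candidate policies, selecting one with select_policy, rolling it out in the environment, and feeding the observed return back through update_model.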
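To make the quoted Appendix F.1 configuration concrete, below is a minimal PyTorch sketch, under our own assumptions, of a policy matching that description (three fully-connected layers, tanh activations, hidden width 32, Softmax head) together with one vanilla Evolution Strategies round using the quoted hyperparameters (100 noisy samples, noise standard deviation 0.1, step size 0.1). evaluate_return, which would load the perturbed parameter vector into a policy and roll it out, is a hypothetical helper; none of this is the authors' released code.

```python
import torch
import torch.nn as nn

class SoftmaxPolicy(nn.Module):
    """3-layer fully-connected policy with tanh activations and a Softmax head."""
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, obs):
        return self.net(obs)  # action probabilities

def es_round(policy, evaluate_return, n_samples=100, sigma=0.1, step_size=0.1):
    """One vanilla ES round: perturb the flattened parameters with zero-mean
    Gaussian noise and step along the return-weighted average noise direction."""
    params = nn.utils.parameters_to_vector(policy.parameters()).detach()
    noise = torch.randn(n_samples, params.numel()) * sigma
    returns = torch.tensor([evaluate_return(params + eps) for eps in noise])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # standardize returns
    grad_estimate = (returns[:, None] * noise).mean(dim=0) / sigma
    nn.utils.vector_to_parameters(params + step_size * grad_estimate, policy.parameters())
```

The representation-driven variants described in the paper replace the plain parameter-space update with selection over embedded candidates, as in the previous sketch; the 300-round schedule quoted from Appendix F.1 would simply call es_round (or its RepRL counterpart) in a loop.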