Prioritizing Samples in Reinforcement Learning with Reducible Loss

Authors: Shivakanth Sujit, Somjit Nath, Pedro Braga, Samira Ebrahimi Kahou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that across multiple domains our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay. The code to reproduce our experiments can be found here. We demonstrate the performance of our approach empirically on the DeepMind Control Suite [Tassa et al., 2018], OpenAI Gym, MinAtar [Young and Tian, 2019] and Arcade Learning Environment [Bellemare et al., 2013] benchmarks."
Researcher Affiliation | Academia | "Shivakanth Sujit, Mila, Quebec AI Institute, ÉTS Montréal, shivakanth.sujit@gmail.com; Somjit Nath, Mila, Quebec AI Institute, ÉTS Montréal, somjitnath@gmail.com; Pedro H.M. Braga, Mila, Quebec AI Institute, ÉTS Montréal, Universidade Federal de Pernambuco, pedromagalhaes.hb@gmail.com; Samira Ebrahimi Kahou, Mila, Quebec AI Institute, ÉTS Montréal, CIFAR AI Chair, samira.ebrahimi.kahou@gmail.com"
Pseudocode | Yes | "Algorithm 1: Computing ReLo for prioritization" (a sketch of this computation follows the table)
Open Source Code | Yes | "The code to reproduce our experiments can be found here."
Open Datasets | Yes | "We demonstrate the performance of our approach empirically on the DeepMind Control Suite [Tassa et al., 2018], OpenAI Gym, MinAtar [Young and Tian, 2019] and Arcade Learning Environment [Bellemare et al., 2013] benchmarks."
Dataset Splits | Yes | "We also compared the validation TD errors of each method after training in Tables 2, 4, and 5. This was done by collecting 10^4 frames from the environment and computing the mean TD errors of these transitions." (a sketch of this measurement follows the table)
Hardware Specification | Yes | "Table 6: Hyper-Parameters of all experiments ... Hardware: CPU: 6 Intel Gold 6148 Skylake; GPU: 1 NVIDIA V100; RAM: 32 GB"
Software Dependencies | Yes | "Table 6: Hyper-Parameters of all experiments ... Software: PyTorch: 1.10.0; Python: 3.8"
Experiment Setup | Yes | "Table 6: Hyper-Parameters of all experiments ... Frames = 2 × 10^6 ... Remaining hyper-parameters same as Hessel et al. [2017]. We used the default hyperparameters that were given by Schaul et al. [2016], i.e. α = 0.5, β = 0.4 for all benchmarks for PER and ReLo." (a PER sampling sketch with these defaults follows the table)
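The Research Type and Pseudocode rows reference ReLo, which prioritizes replay samples by their reducible loss: the TD loss under the learning (online) network minus the TD loss under a model that is not being updated on that sample, with the target network playing that role. Below is a minimal PyTorch sketch of the priority computation; the function name `relo_priorities`, the DQN-style batch layout, and the clamp at zero are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def relo_priorities(online_net, target_net, batch, gamma=0.99):
    """Sketch of ReLo: priority = online TD loss - target-network TD loss.

    Transitions the target network already fits well carry little
    reducible loss and get low priority; hard-but-learnable ones rank high.
    """
    s, a, r, s_next, done = batch  # tensors drawn from the replay buffer

    with torch.no_grad():
        # Shared bootstrap target, as in standard DQN.
        next_q = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * next_q

        q_online = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        q_target = target_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

        loss_online = F.mse_loss(q_online, td_target, reduction="none")
        loss_target = F.mse_loss(q_target, td_target, reduction="none")

        # Reducible loss; clamping at zero is an assumption of this sketch.
        return (loss_online - loss_target).clamp(min=0.0)
```

This matches the quoted claim against plain TD prioritization: PER ranks samples by loss magnitude alone, so it keeps re-sampling noisy or unlearnable transitions, whereas the reducible-loss difference down-weights them.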
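The Dataset Splits row describes an evaluation protocol rather than a conventional train/validation split: after training, 10^4 fresh transitions are collected and scored by their mean TD error. A rough sketch of that measurement, assuming a Gymnasium-style `env.step` API and reusing the hypothetical networks above:

```python
import torch

def mean_validation_td_error(online_net, target_net, env, policy,
                             n_frames=10_000, gamma=0.99):
    """Collect n_frames transitions and return their mean absolute TD error."""
    errors = []
    obs, _ = env.reset()
    for _ in range(n_frames):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        with torch.no_grad():
            q = online_net(torch.as_tensor(obs, dtype=torch.float32)
                           .unsqueeze(0))[0, action]
            next_q = target_net(torch.as_tensor(next_obs, dtype=torch.float32)
                                .unsqueeze(0)).max()
            td_target = reward + gamma * (0.0 if terminated else 1.0) * next_q
            errors.append(abs(td_target - q).item())
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return sum(errors) / len(errors)
```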
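The Experiment Setup row reuses the PER defaults of Schaul et al. [2016]: α = 0.5 shapes the sampling distribution P(i) ∝ p_i^α, and β = 0.4 tempers the importance-sampling correction w_i = (N · P(i))^(−β), with weights normalized by their maximum as in that paper. A self-contained NumPy sketch of the sampling step (the function name `per_sample` is an assumption for illustration):

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.5, beta=0.4, seed=None):
    """Sample replay indices per PER (Schaul et al., 2016) with IS weights."""
    rng = np.random.default_rng(seed)
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()                 # P(i) = p_i^alpha / sum_k p_k^alpha
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()            # normalize by max, per the paper
    return idx, weights
```

For example, `per_sample([0.1, 2.0, 0.5, 1.2], batch_size=2)` samples the high-priority transitions more often while handing them smaller weights, keeping the update unbiased in expectation.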