Prioritizing Samples in Reinforcement Learning with Reducible Loss

Authors: Shivakanth Sujit, Somjit Nath, Pedro Braga, Samira Ebrahimi Kahou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that across multiple domains our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay. The code to reproduce our experiments can be found here. We demonstrate the performance of our approach empirically on the DeepMind Control Suite [Tassa et al., 2018], OpenAI Gym, MinAtar [Young and Tian, 2019] and Arcade Learning Environment [Bellemare et al., 2013] benchmarks."
Researcher Affiliation | Academia | "Shivakanth Sujit, Mila, Quebec AI Institute, ÉTS Montréal, shivakanth.sujit@gmail.com; Somjit Nath, Mila, Quebec AI Institute, ÉTS Montréal, somjitnath@gmail.com; Pedro H.M. Braga, Mila, Quebec AI Institute, ÉTS Montréal, Universidade Federal de Pernambuco, pedromagalhaes.hb@gmail.com; Samira Ebrahimi Kahou, Mila, Quebec AI Institute, ÉTS Montréal, CIFAR AI Chair, samira.ebrahimi.kahou@gmail.com"
Pseudocode | Yes | "Algorithm 1: Computing ReLo for prioritization" (a sketch of this computation follows the table)
Open Source Code | Yes | "The code to reproduce our experiments can be found here."
Open Datasets | Yes | "We demonstrate the performance of our approach empirically on the DeepMind Control Suite [Tassa et al., 2018], OpenAI Gym, MinAtar [Young and Tian, 2019] and Arcade Learning Environment [Bellemare et al., 2013] benchmarks."
Dataset Splits | Yes | "We also compared the validation TD errors of each method after training in Tables 2, 4, and 5. This was done by collecting 10^4 frames from the environment and computing the mean TD errors of these transitions." (a sketch of this measurement follows the table)
Hardware Specification | Yes | "Table 6: Hyper-Parameters of all experiments ... Hardware: CPU: 6 Intel Gold 6148 Skylake; GPU: 1 NVIDIA V100; RAM: 32 GB"
Software Dependencies | Yes | "Table 6: Hyper-Parameters of all experiments ... Software: PyTorch: 1.10.0; Python: 3.8"
Experiment Setup | Yes | "Table 6: Hyper-Parameters of all experiments ... Frames = 2 × 10^6 ... Remaining hyper-parameters same as Hessel et al. [2017]. We used the default hyperparameters that were given by Schaul et al. [2016], i.e. α = 0.5, β = 0.4 for all benchmarks for PER and ReLo." (a PER sampling sketch with these defaults follows the table)
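The Research Type and Pseudocode rows reference ReLo, which prioritizes replay samples by their reducible loss: the TD loss under the learning (online) network minus the TD loss under a model that is not being updated on that sample, with the target network playing that role. Below is a minimal PyTorch sketch of the priority computation; the function name `relo_priorities`, the DQN-style batch layout, and the clamp at zero are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def relo_priorities(online_net, target_net, batch, gamma=0.99):
    """Sketch of ReLo: priority = online TD loss - target-network TD loss.

    Transitions the target network already fits well carry little
    reducible loss and get low priority; hard-but-learnable ones rank high.
    """
    s, a, r, s_next, done = batch  # tensors drawn from the replay buffer

    with torch.no_grad():
        # Shared bootstrap target, as in standard DQN.
        next_q = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * next_q

        q_online = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        q_target = target_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

        loss_online = F.mse_loss(q_online, td_target, reduction="none")
        loss_target = F.mse_loss(q_target, td_target, reduction="none")

        # Reducible loss; clamping at zero is an assumption of this sketch.
        return (loss_online - loss_target).clamp(min=0.0)
```

This matches the quoted claim against plain TD prioritization: PER ranks samples by loss magnitude alone, so it keeps re-sampling noisy or unlearnable transitions, whereas the reducible-loss difference down-weights them.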
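The Dataset Splits row describes an evaluation protocol rather than a conventional train/validation split: after training, 10^4 fresh transitions are collected and scored by their mean TD error. A rough sketch of that measurement, assuming a Gymnasium-style `env.step` API and reusing the hypothetical networks above:

```python
import torch

def mean_validation_td_error(online_net, target_net, env, policy,
                             n_frames=10_000, gamma=0.99):
    """Collect n_frames transitions and return their mean absolute TD error."""
    errors = []
    obs, _ = env.reset()
    for _ in range(n_frames):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        with torch.no_grad():
            q = online_net(torch.as_tensor(obs, dtype=torch.float32)
                           .unsqueeze(0))[0, action]
            next_q = target_net(torch.as_tensor(next_obs, dtype=torch.float32)
                                .unsqueeze(0)).max()
            td_target = reward + gamma * (0.0 if terminated else 1.0) * next_q
            errors.append(abs(td_target - q).item())
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return sum(errors) / len(errors)
```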
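The Experiment Setup row reuses the PER defaults of Schaul et al. [2016]: α = 0.5 shapes the sampling distribution P(i) ∝ p_i^α, and β = 0.4 tempers the importance-sampling correction w_i = (N · P(i))^(−β), with weights normalized by their maximum as in that paper. A self-contained NumPy sketch of the sampling step (the function name `per_sample` is an assumption for illustration):

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.5, beta=0.4, seed=None):
    """Sample replay indices per PER (Schaul et al., 2016) with IS weights."""
    rng = np.random.default_rng(seed)
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()                 # P(i) = p_i^alpha / sum_k p_k^alpha
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()            # normalize by max, per the paper
    return idx, weights
```

For example, `per_sample([0.1, 2.0, 0.5, 1.2], batch_size=2)` samples the high-priority transitions more often while handing them smaller weights, keeping the update unbiased in expectation.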