Importance Resampling for Off-policy Prediction

Authors: Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian, Martha White

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate in several microworlds that IR has improved sample efficiency and lower variance updates, as compared to IS and several variance-reduced IS strategies, including variants of WIS and V-trace, which clips IS ratios. We also provide a demonstration showing IR improves over IS for learning a value function from images in a racing car simulator.
Researcher Affiliation | Collaboration | Matthew Schlegel (University of Alberta, mkschleg@ualberta.ca); Wesley Chung (University of Alberta, wchung@ualberta.ca); Daniel Graves (Huawei, daniel.graves@huawei.com); Jian Qian (University of Alberta, jq1@ualberta.ca); Martha White (University of Alberta, whitem@ualberta.ca)
Pseudocode | No | The paper describes the steps of the Importance Resampling algorithm in text but does not provide a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Experimental code for every domain except TORCS can be found at https://mkschleg.github.io/Resampling.jl
Open Datasets | No | The paper describes experiments in custom microworlds (Markov chain, Four Rooms) and a racing car simulator (TORCS). Data is generated by these environments; no public dataset is cited with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper does not provide specific training, validation, or test split percentages or sample counts. It reports metrics such as AVE and ARE computed on environment interactions, with no formal splits.
Hardware Specification | No | The paper acknowledges computing resources provided by Compute Canada (www.computecanada.ca) but does not specify CPU or GPU models, memory, or other hardware details used for the experiments.
Software Dependencies | No | The paper mentions software components such as RMSProp learning-rate selection and tile-coded features but does not provide version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | We consider two variants of IR: with and without bias correction. ... resample a mini-batch of size k on each step t from the buffer of size n, proportionally to the ratios ρ_i in the buffer. ... The representation is a tile-coded feature vector with 64 tilings and 8 tiles. ... a convolutional neural network is used for TORCS, with an architecture previously defined for self-driving cars [Bojarski et al., 2016].
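The resampling step described in the setup row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ir_minibatch` and its arguments are hypothetical names, and the bias-correction factor (the mean importance ratio over the buffer) follows the paper's description of the bias-corrected IR variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def ir_minibatch(buffer_rhos, k, bias_correction=False):
    """Sketch of Importance Resampling: draw a mini-batch of size k
    from a buffer of n transitions, proportionally to the importance
    sampling ratios rho_i stored in the buffer."""
    rhos = np.asarray(buffer_rhos, dtype=float)
    probs = rhos / rhos.sum()            # resampling distribution over the buffer
    idx = rng.choice(len(rhos), size=k, p=probs)
    # Plain IR applies an unweighted update to the sampled transitions;
    # the bias-corrected variant scales the update by the mean ratio
    # in the buffer (an assumption based on the paper's BC-IR description).
    scale = rhos.mean() if bias_correction else 1.0
    return idx, scale
```

The sampled indices would then feed an ordinary (unweighted) TD update, which is how IR avoids multiplying each update by a potentially high-variance ratio, in contrast to IS.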