POLITEX: Regret Bounds for Policy Iteration using Expert Prediction

Authors: Yasin Abbasi-Yadkori, Peter Bartlett, Kush Bhatia, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a queuing problem confirm that POLITEX is competitive with some of its alternatives, while preliminary results on Ms Pacman (one of the standard Atari benchmark problems) confirm the viability of POLITEX beyond linear function approximation.
Researcher Affiliation | Collaboration | Adobe Research, UC Berkeley, Google Brain, DeepMind. Correspondence to: Nevena Lazic <nevena@google.com>.
Pseudocode | Yes | Algorithm 1 POLITEX: POLicy ITeration using EXperts (a hedged sketch of the update follows the table).
Open Source Code | No | The paper does not provide any explicit statement about releasing the source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We first study the performance of POLITEX with linear function approximation on the 4-dimensional and 8-dimensional queueing network problems described in de Farias & Van Roy (2003) (Figures 6 and 7). ... We compare a version of POLITEX to DQN (Mnih et al., 2013) on a standard Atari environment running Ms Pacman.
Dataset Splits | No | The paper describes the experimental setup and duration (e.g., '2000 phases of length τ = E') but does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets) in the context of continuous reinforcement learning.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. It mentions a 'standard Atari environment', which implies a simulator, but gives no hardware specifics.
Software Dependencies | No | The paper mentions algorithms like LSPE, TD(0), SOLO FTRL, and DQN, but it does not specify software packages or libraries with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x, scikit-learn 0.x) used for the implementation.
Experiment Setup | Yes | For all policies, we bias the covariance of the value functions with β = 0.1. For LSPI and POLITEX, we experiment with η = k/T, for k ∈ {1, 5, 10, 20, 100, 500, 1000, 2000, 4000}; the value k = 1 was best. ... We initialize to empty queues and run policies for E = 2000 phases of length τ = E. (The learning-rate sweep is sketched after the table.)
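
For reference, Algorithm 1 (POLITEX) can be summarized in a few lines: in each phase the agent acts with a Boltzmann (softmax) policy over the sum of all previously estimated action-value functions, runs that policy for τ steps, and fits a new action-value estimate from the collected data. The following is a minimal sketch assuming linear action-value estimates Q_π(s, a) ≈ φ(s, a)ᵀw; the helper names (phi, fit_q_weights, run_phase) are hypothetical placeholders, not the authors' code.

```python
import numpy as np

# Minimal sketch of Algorithm 1 (POLITEX), assuming linear action-value
# estimates Q_pi(s, a) ~= phi(s, a) @ w. The helpers `phi`, `fit_q_weights`
# and `run_phase` are assumed placeholders, not taken from the paper.

def politex(num_phases, tau, eta, num_actions, dim, phi, fit_q_weights, run_phase):
    w_sum = np.zeros(dim)                 # running sum of fitted Q-weights

    def policy(state):
        # Boltzmann policy over the cumulative Q-estimate of all past phases;
        # with w_sum = 0 (first phase) this reduces to the uniform policy.
        q = np.array([phi(state, a) @ w_sum for a in range(num_actions)])
        z = eta * (q - q.max())           # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    for _ in range(num_phases):
        data = run_phase(policy, tau)     # run the current policy for tau steps
        w_sum += fit_q_weights(data)      # e.g. an LSPE / TD(0) fit of Q for this phase
    return policy
```

The distinguishing feature compared to standard (greedy) policy iteration is that the new policy is a softmax of the sum of all past value estimates rather than the greedy policy with respect to the latest one, which is what connects the analysis to expert prediction.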
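
The learning-rate grid quoted in the Experiment Setup row can be written compactly as below. This is a hedged sketch only: `T` (the run length used to scale η) and `run_experiment` are assumed names; the grid of k values and β = 0.1 are taken from the quoted setup, and k = 1 is reported as the best choice.

```python
# Hedged sketch of the reported hyperparameter sweep: eta = k / T for the
# listed values of k, with covariance bias beta = 0.1. `T` and
# `run_experiment` are assumed names, not from the paper.

def eta_grid(T):
    return [k / T for k in (1, 5, 10, 20, 100, 500, 1000, 2000, 4000)]

def sweep(T, run_experiment, beta=0.1):
    return {eta: run_experiment(eta=eta, beta=beta) for eta in eta_grid(T)}
```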