Direct Policy Iteration with Demonstrations
Authors: Jessica Chemali, Alessandro Lazaric
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we report an empirical evaluation of the algorithm and a comparison with the state-of-the-art algorithms. |
| Researcher Affiliation | Academia | Jessica Chemali, Machine Learning Department, Carnegie Mellon University; Alessandro Lazaric, SequeL team, INRIA Lille |
| Pseudocode | Yes | Algorithm 1 Direct Policy Iteration with Demonstrations (a hedged sketch of such an update loop appears after the table) |
| Open Source Code | Yes | The implementation of DPID is available at https://www.dropbox.com/s/jj4g9ndonol4aoy/dpid.zip?dl=0 |
| Open Datasets | No | The paper uses the Garnet framework to generate random finite MDPs and describes the Vehicle Brake Control domain; both are simulated environments, and the paper does not provide access to the specific datasets used in the experiments. |
| Dataset Splits | No | The paper does not explicitly provide details on how the data was split into training, validation, and test sets for the experiments. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions software like CVX and LIBSVM but does not provide specific version numbers for these or any other ancillary software dependencies used in the experiments. |
| Experiment Setup | Yes | We use Ns = 15, Na = 3, Nb(s) ∈ [3, 6], and we estimate Err_alg over 100 independent runs, while error bars are computed as 95% Gaussian confidence intervals. We fix NE = 15 optimal expert demonstrations and increase NRL by 50 at each iteration, starting with NRL = 50 − NE (NE = 0 for DPI). In Fig. 1B and 1C, we replace the optimal expert with a suboptimal one by sampling random non-optimal actions 25% and 50% of the time, respectively. We then introduce an approximate representation such that for every state we construct a binary feature vector of length d = 6 < Ns. The number of ones in the representation is set to l = 3 and their locations are chosen randomly as in [Bhatnagar et al., 2009]. (A sketch of this Garnet/feature construction follows the table.) |
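
The experiment-setup row above fully determines the random MDP and the feature construction. Below is a minimal Python sketch of that construction, assuming a standard Garnet-style generator: the uniform random reward placement and the per-state (rather than per-state-action) branching factor are our assumptions, not details reported in the table.

```python
import numpy as np

def make_garnet(ns=15, na=3, branch_range=(3, 6), rng=None):
    """Random finite MDP in the Garnet style: for each state s, the
    next-state distribution of every action is supported on Nb(s) states,
    with Nb(s) drawn uniformly from [3, 6]. Reward placement below is an
    assumption; the paper's quoted setup does not specify it."""
    rng = rng or np.random.default_rng(0)
    P = np.zeros((ns, na, ns))            # transition kernel P[s, a, s']
    for s in range(ns):
        nb = rng.integers(branch_range[0], branch_range[1] + 1)
        for a in range(na):
            succ = rng.choice(ns, size=nb, replace=False)
            P[s, a, succ] = rng.dirichlet(np.ones(nb))
    R = rng.uniform(size=(ns, na))        # assumed uniform random rewards
    return P, R

def binary_features(ns=15, d=6, l=3, rng=None):
    """Per-state binary feature vector of length d = 6 < Ns with exactly
    l = 3 ones, their locations chosen at random, following the
    [Bhatnagar et al., 2009] recipe the paper cites."""
    rng = rng or np.random.default_rng(1)
    phi = np.zeros((ns, d))
    for s in range(ns):
        phi[s, rng.choice(d, size=l, replace=False)] = 1.0
    return phi
```

Under these assumptions, `P, R = make_garnet()` and `phi = binary_features()` reproduce the reported dimensions (Ns = 15, Na = 3, d = 6, l = 3); averaging any error metric over 100 such draws matches the quoted evaluation protocol.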
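
Algorithm 1 itself is not reproduced in this report. The following is a tabular caricature of a DPID-style update, assuming the algorithm trades a rollout-based regret loss on RL states against an expert-disagreement penalty on demonstrated states via a mixing weight `alpha`; the function names, the single-rollout Q estimate, and the exact loss mixing are illustrative assumptions, not the paper's classifier-based formulation.

```python
import numpy as np

def rollout_q(P, R, pi, s, a, horizon=20, gamma=0.95, rng=None):
    """Single-rollout Monte-Carlo estimate of Q^pi(s, a) in a tabular MDP
    (P, R as produced by make_garnet above); horizon and gamma are
    illustrative defaults."""
    rng = rng or np.random.default_rng()
    ret, disc = 0.0, 1.0
    for t in range(horizon):
        act = a if t == 0 else pi[s]
        ret += disc * R[s, act]
        disc *= gamma
        s = rng.choice(P.shape[0], p=P[s, act])
    return ret

def dpid_step(P, R, pi, rl_states, demos, alpha=0.5, rng=None):
    """One hypothetical DPID-style policy update: on rollout states pick
    the empirically greedy action; on demonstrated states, add a penalty
    for disagreeing with the expert, weighted by alpha."""
    na = P.shape[1]
    new_pi = pi.copy()
    for s in rl_states:
        q = np.array([rollout_q(P, R, pi, s, a, rng=rng) for a in range(na)])
        loss = q.max() - q                       # rollout-based regret loss
        if s in demos:                           # expert disagreement penalty
            loss += alpha * (np.arange(na) != demos[s])
        new_pi[s] = int(np.argmin(loss))
    return new_pi
```

A usage sketch under the same assumptions: start from `pi = np.zeros(15, dtype=int)` and iterate `pi = dpid_step(P, R, pi, rl_states=range(15), demos={0: 1, 3: 2})`. The demonstration dictionary here is arbitrary; in the paper the expert actions come from an optimal policy, or from a suboptimal one that acts randomly 25% or 50% of the time.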