Inverse Reinforcement Learning through Policy Gradient Minimization

Authors: Matteo Pirotta, Marcello Restelli

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an empirical evaluation of the proposed approach on a multidimensional version of the Linear-Quadratic Regulator (LQR), both in the case where the parameters of the expert's policy are known and in the (more realistic) case where the parameters of the expert's policy need to be inferred from the expert's demonstrations. Finally, the algorithm is compared against the state of the art on the mountain car domain, where the expert's policy is unknown.
Researcher Affiliation | Academia | Matteo Pirotta and Marcello Restelli, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy, {matteo.pirotta, marcello.restelli}@polimi.it
Pseudocode | No | The paper describes its algorithms mathematically and in prose but does not include a distinct pseudocode block or a section explicitly labeled 'Algorithm'. (A hedged sketch of the gradient-minimization idea is given after the table.)
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide any links to a code repository.
Open Datasets | Yes | This section is devoted to the empirical analysis of the proposed algorithms. The first domain, a linear quadratic regulator, is used to illustrate the main characteristics of the proposed approach, while the mountain car domain is used to compare it against the most related approaches. The environments are cited to (Sutton et al. 1999) and (Pirotta, Parisi, and Restelli 2015).
Dataset Splits | No | The paper mentions generating "5 different datasets" and varying the number of samples (e.g., "10, 100, and 1,000 trajectories"), but it does not provide specific train/validation/test splits (e.g., percentages, per-split sample counts, or named predefined splits) that would allow the data partitioning to be reproduced.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU or CPU models, processor types, or memory specifications, used for running its experiments.
Software Dependencies | No | The paper mentions the NLopt library (http://ab-initio.mit.edu/nlopt) and algorithms such as REINFORCE, GPOMDP, and LSPI, but it does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | We consider three different mean parametrizations: linear in the state (i.e., the optimal one), with radial basis functions, and polynomial of degree 3. and We have imposed a maximum number of function evaluations of 500 for the convex optimization algorithm. and She selects a random action with probability 0.1. and We define the expert's policy as a Gibbs policy with linear approximation of the Q-function; a first-degree polynomial over the state space is replicated for each action. and evenly-spaced, hand-tuned 7×7 RBFs. (See the policy and feature sketch after the table.)
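
Since the paper gives no pseudocode, the following is a minimal sketch of the core idea named in the title: inverse reinforcement learning by policy-gradient minimization. The reward is assumed linear in features, the expert's REINFORCE-style policy gradient is estimated from demonstrations, and the reward weights are chosen to make that gradient vanish. The function names, the simplex constraint on the weights, and the use of scipy.optimize are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: IRL by minimizing the norm of the expert's policy gradient,
# assuming a reward linear in features, r(s, a) = w . phi(s, a).
import numpy as np
from scipy.optimize import minimize


def reinforce_feature_gradient(trajectories, score_fn, reward_features, gamma=0.99):
    """Estimate the policy gradient once per reward feature.

    trajectories: list of expert episodes, each a list of (state, action) pairs.
    score_fn(s, a): gradient of log pi_theta(a|s) w.r.t. the policy parameters.
    reward_features(s, a): feature vector phi(s, a) of the linear reward.
    Returns a matrix G (n_policy_params x n_reward_features) such that the
    policy gradient under reward weights w is G @ w.
    """
    grads = []
    for episode in trajectories:
        score_sum = sum(score_fn(s, a) for s, a in episode)            # sum of log-policy gradients
        feat_return = sum((gamma ** t) * reward_features(s, a)
                          for t, (s, a) in enumerate(episode))          # discounted feature "return"
        grads.append(np.outer(score_sum, feat_return))
    return np.mean(grads, axis=0)


def recover_reward_weights(trajectories, score_fn, reward_features, n_features):
    """Find reward weights w minimizing ||G w||^2 (the expert should be a stationary point)."""
    G = reinforce_feature_gradient(trajectories, score_fn, reward_features)
    objective = lambda w: float(np.sum((G @ w) ** 2))
    # Constrain the weights to sum to 1 to rule out the trivial solution w = 0;
    # this normalization is an assumption, the paper's exact constraint may differ.
    cons = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    res = minimize(objective, np.full(n_features, 1.0 / n_features), constraints=[cons])
    return res.x
```

The gradient estimator above is the plain (baseline-free) REINFORCE form; the paper also mentions GPOMDP-style estimation, which would change only how G is computed, not the outer minimization.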
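
The experiment-setup row mentions a Gibbs policy whose Q-function is linear in first-degree polynomial state features replicated per action, and an evenly-spaced 7×7 grid of RBFs over the mountain car state space. The sketch below shows one plausible reading of those two feature constructions; the exact centers, bandwidth, temperature, and how each feature map is used in the paper are assumptions.

```python
# Sketch only: the two feature maps mentioned in the setup, under assumed
# hyperparameters (mountain car state = (position, velocity)).
import numpy as np


def poly_features(state):
    """First-degree polynomial of the 2-D state: [1, position, velocity]."""
    return np.array([1.0, state[0], state[1]])


def rbf_features(state, n=7, bandwidth=0.1,
                 lows=(-1.2, -0.07), highs=(0.6, 0.07)):
    """Evenly-spaced n x n Gaussian RBFs over the normalized 2-D state space."""
    s = (np.asarray(state) - np.asarray(lows)) / (np.asarray(highs) - np.asarray(lows))
    centers = np.stack(np.meshgrid(np.linspace(0, 1, n),
                                   np.linspace(0, 1, n)), axis=-1).reshape(-1, 2)
    return np.exp(-np.sum((s - centers) ** 2, axis=1) / (2 * bandwidth ** 2))


def gibbs_policy(theta, state, n_actions=3, temperature=1.0):
    """Gibbs policy: pi(a|s) proportional to exp(theta_a . phi(s) / temperature),
    with the state features replicated (separate weights) for each action."""
    phi = poly_features(state)                     # length-3 feature vector
    prefs = theta.reshape(n_actions, -1) @ phi / temperature
    prefs -= prefs.max()                           # numerical stability before exp
    p = np.exp(prefs)
    return p / p.sum()
```

For example, with three mountain car actions and polynomial features, theta has 3 × 3 = 9 entries, and gibbs_policy returns the action probabilities from which an epsilon-greedy-like random action (probability 0.1, per the quote) could additionally be mixed in.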