Difference of Convex Functions Programming for Reinforcement Learning

Authors: Bilal Piot, Matthieu Geist, Olivier Pietquin

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally in Sec. 5, we conduct a generic experiment that compares a naive implementation of our approach to API and AVI methods, showing that it is competitive."
Researcher Affiliation | Academia | "Bilal Piot (1,2), Matthieu Geist (1), Olivier Pietquin (2,3); (1) MaLIS research group (SUPELEC), UMI 2958 (Georgia Tech-CNRS), France; (2) LIFL (UMR 8022 CNRS/Lille 1), SequeL team, Lille, France; (3) University Lille 1, IUF (Institut Universitaire de France), France; bilal.piot@lifl.fr, matthieu.geist@supelec.fr, olivier.pietquin@univ-lille1.fr"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | "In this experiment, we construct 50 Garnets {G_p}_{1≤p≤50} as explained before. For each Garnet G_p, we build 10 data sets {D_{p,q}}_{1≤q≤10} composed of N sampled transitions (s_i, a_i, s'_i)_{i=1}^N drawn uniformly and independently." The paper describes generating synthetic data from a Garnet construction given in a reference, but does not provide concrete access (link, DOI, repository) to these generated datasets or to a pre-existing public dataset; a sketch of the generation protocol is given after the table.
Dataset Splits | No | The paper describes building datasets composed of sampled transitions but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using a "sub-gradient descent" method and comparing against "LSPI" and "Fitted-Q", but does not list any specific software libraries or packages with version numbers used for the implementation.
Experiment Setup | Yes | "The initialisation of DCA is θ_0 = 0 and the intermediary optimization convex problems are solved by a sub-gradient descent [18]." A sketch of this optimization loop follows the table.
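
The quoted data-generation protocol (50 Garnet MDPs, 10 data sets per Garnet, N transitions drawn uniformly and independently) is concrete enough to sketch. Below is a minimal sketch assuming the standard Garnet construction G(N_S, N_A, N_B), in which transition mass for each state-action pair is spread over N_B random successor states; the state, action, and branching sizes and the uniform reward model are assumptions, not values reported in the paper.

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, rng):
    """Random Garnet MDP: for each (s, a), probability mass is spread over
    `branching` distinct successor states chosen at random."""
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            # Probabilities obtained by cutting [0, 1] at sorted uniform points.
            cuts = np.sort(rng.uniform(size=branching - 1))
            P[s, a, successors] = np.diff(np.concatenate(([0.0], cuts, [1.0])))
    rewards = rng.uniform(size=(n_states, n_actions))  # assumed reward model
    return P, rewards

def sample_transitions(P, n_samples, rng):
    """Draw (s_i, a_i, s'_i) triples uniformly and independently, as quoted."""
    n_states, n_actions, _ = P.shape
    s = rng.integers(n_states, size=n_samples)
    a = rng.integers(n_actions, size=n_samples)
    s_next = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])
    return s, a, s_next

rng = np.random.default_rng(0)
# 50 Garnets with 10 data sets each, mirroring the quoted setup;
# the sizes (100 states, 5 actions, branching 3, N = 1000) are illustrative.
garnets = [make_garnet(100, 5, 3, rng) for _ in range(50)]
datasets = [[sample_transitions(P, 1000, rng) for _ in range(10)] for P, _ in garnets]
```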
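
The experiment-setup row is the only algorithmic detail reported: DCA starts from θ_0 = 0 and each intermediate convex problem is solved by sub-gradient descent. A minimal sketch of such a loop, assuming a generic DC decomposition f = g - h with both g and h convex and a 1/sqrt(t) step size (the paper's actual objective and the step-size schedule of [18] are not given in this report, so both are assumptions):

```python
import numpy as np

def subgradient_descent(subgrad, theta0, steps=500):
    """Plain sub-gradient descent with a 1/sqrt(t) step size (a common default;
    the schedule actually used in the paper is not stated in this report)."""
    theta = theta0.copy()
    for t in range(1, steps + 1):
        theta = theta - (1.0 / np.sqrt(t)) * subgrad(theta)
    return theta

def dca(subgrad_g, grad_h, theta0, outer_iters=20):
    """Generic DCA loop for minimising f = g - h: linearise h at the current
    iterate theta_k, then approximately minimise the convex surrogate
    g(theta) - <grad_h(theta_k), theta> with sub-gradient descent."""
    theta = theta0
    for _ in range(outer_iters):
        v = grad_h(theta)  # gradient of the subtracted convex part at theta_k
        theta = subgradient_descent(lambda th: subgrad_g(th) - v, theta)
    return theta

# Toy usage on the DC decomposition f(x) = ||x||_1 - ||x||_2 (illustrative only;
# a nonzero start is used because sign(0) = 0 would stall this toy objective).
theta = dca(
    subgrad_g=np.sign,
    grad_h=lambda th: th / (np.linalg.norm(th) + 1e-12),
    theta0=np.ones(5),
)
```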