Difference of Convex Functions Programming for Reinforcement Learning

Authors: Bilal Piot, Matthieu Geist, Olivier Pietquin

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally in Sec. 5, we conduct a generic experiment that compares a naive implementation of our approach to API and AVI methods, showing that it is competitive."
Researcher Affiliation | Academia | "Bilal Piot (1,2), Matthieu Geist (1), Olivier Pietquin (2,3); (1) MaLIS research group (SUPELEC), UMI 2958 (Georgia Tech-CNRS), France; (2) LIFL (UMR 8022 CNRS/Lille 1), SequeL team, Lille, France; (3) University Lille 1, IUF (Institut Universitaire de France), France; bilal.piot@lifl.fr, matthieu.geist@supelec.fr, olivier.pietquin@univ-lille1.fr"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | "In this experiment, we construct 50 Garnets {G_p}_{1≤p≤50} as explained before. For each Garnet G_p, we build 10 data sets {D_{p,q}}_{1≤q≤10} composed of N sampled transitions (s_i, a_i, s'_i)_{i=1}^N drawn uniformly and independently." The paper describes generating synthetic data from a Garnet construction given in a reference, but does not provide concrete access (link, DOI, repository) to these generated datasets or to a pre-existing public dataset; a sketch of the generation protocol is given after the table.
Dataset Splits | No | The paper describes building datasets composed of sampled transitions but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using a "sub-gradient descent" method and comparing against "LSPI" and "Fitted-Q", but does not list any specific software libraries or packages with version numbers used for the implementation.
Experiment Setup | Yes | "The initialisation of DCA is θ_0 = 0 and the intermediary optimization convex problems are solved by a sub-gradient descent [18]." A sketch of this optimization loop follows the table.
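
The quoted data-generation protocol (50 Garnet MDPs, 10 data sets per Garnet, N transitions drawn uniformly and independently) is concrete enough to sketch. Below is a minimal sketch assuming the standard Garnet construction G(N_S, N_A, N_B), in which transition mass for each state-action pair is spread over N_B random successor states; the state, action, and branching sizes and the uniform reward model are assumptions, not values reported in the paper.

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, rng):
    """Random Garnet MDP: for each (s, a), probability mass is spread over
    `branching` distinct successor states chosen at random."""
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            # Probabilities obtained by cutting [0, 1] at sorted uniform points.
            cuts = np.sort(rng.uniform(size=branching - 1))
            P[s, a, successors] = np.diff(np.concatenate(([0.0], cuts, [1.0])))
    rewards = rng.uniform(size=(n_states, n_actions))  # assumed reward model
    return P, rewards

def sample_transitions(P, n_samples, rng):
    """Draw (s_i, a_i, s'_i) triples uniformly and independently, as quoted."""
    n_states, n_actions, _ = P.shape
    s = rng.integers(n_states, size=n_samples)
    a = rng.integers(n_actions, size=n_samples)
    s_next = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])
    return s, a, s_next

rng = np.random.default_rng(0)
# 50 Garnets with 10 data sets each, mirroring the quoted setup;
# the sizes (100 states, 5 actions, branching 3, N = 1000) are illustrative.
garnets = [make_garnet(100, 5, 3, rng) for _ in range(50)]
datasets = [[sample_transitions(P, 1000, rng) for _ in range(10)] for P, _ in garnets]
```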
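
The experiment-setup row is the only algorithmic detail reported: DCA starts from θ_0 = 0 and each intermediate convex problem is solved by sub-gradient descent. A minimal sketch of such a loop, assuming a generic DC decomposition f = g - h with both g and h convex and a 1/sqrt(t) step size (the paper's actual objective and the step-size schedule of [18] are not given in this report, so both are assumptions):

```python
import numpy as np

def subgradient_descent(subgrad, theta0, steps=500):
    """Plain sub-gradient descent with a 1/sqrt(t) step size (a common default;
    the schedule actually used in the paper is not stated in this report)."""
    theta = theta0.copy()
    for t in range(1, steps + 1):
        theta = theta - (1.0 / np.sqrt(t)) * subgrad(theta)
    return theta

def dca(subgrad_g, grad_h, theta0, outer_iters=20):
    """Generic DCA loop for minimising f = g - h: linearise h at the current
    iterate theta_k, then approximately minimise the convex surrogate
    g(theta) - <grad_h(theta_k), theta> with sub-gradient descent."""
    theta = theta0
    for _ in range(outer_iters):
        v = grad_h(theta)  # gradient of the subtracted convex part at theta_k
        theta = subgradient_descent(lambda th: subgrad_g(th) - v, theta)
    return theta

# Toy usage on the DC decomposition f(x) = ||x||_1 - ||x||_2 (illustrative only;
# a nonzero start is used because sign(0) = 0 would stall this toy objective).
theta = dca(
    subgrad_g=np.sign,
    grad_h=lambda th: th / (np.linalg.norm(th) + 1e-12),
    theta0=np.ones(5),
)
```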