Difference of Convex Functions Programming for Reinforcement Learning
Authors: Bilal Piot, Matthieu Geist, Olivier Pietquin
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally in Sec. 5, we conduct a generic experiment that compares a naive implementation of our approach to API and AVI methods, showing that it is competitive. |
| Researcher Affiliation | Academia | Bilal Piot (1,2), Matthieu Geist (1), Olivier Pietquin (2,3); (1) MaLIS research group (SUPELEC), UMI 2958 (GeorgiaTech-CNRS), France; (2) LIFL (UMR 8022, CNRS/Lille 1), SequeL team, Lille, France; (3) University Lille 1, IUF (Institut Universitaire de France), France; bilal.piot@lifl.fr, matthieu.geist@supelec.fr, olivier.pietquin@univ-lille1.fr |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | In this experiment, we construct 50 Garnets {G_p}_{1≤p≤50} as explained before. For each Garnet G_p, we build 10 data sets {D_{p,q}}_{1≤q≤10} composed of N sampled transitions (s_i, a_i, s'_i), i = 1, …, N, drawn uniformly and independently. The paper describes generating custom datasets following a referenced construction, but provides no concrete access (link, DOI, or repository) to the generated datasets or to any pre-existing public dataset; a sketch of this generation procedure appears after the table. |
| Dataset Splits | No | The paper describes building datasets composed of sampled transitions but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a "sub-gradient descent" method and comparing against "LSPI" and "Fitted-Q" but does not list any specific software libraries or packages with version numbers used for implementation. |
| Experiment Setup | Yes | The initialisation of DCA is θ_0 = 0 and the intermediary optimization convex problems are solved by a sub-gradient descent [18]. A sketch of this loop appears after the table. |
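
As noted in the Open Datasets row, the experiments sample transitions from randomly generated Garnet MDPs, but neither the generator nor the data is released. Below is a minimal sketch of that generation procedure, assuming the standard Garnet construction (each state-action pair gets a small random set of successor states with random probabilities). The sizes used here (30 states, 4 actions, branching factor 2, N = 500 transitions) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, rng):
    """Random Garnet MDP: each (s, a) gets `branching` successor states
    chosen at random, with random transition probabilities."""
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            P[s, a, successors] = rng.dirichlet(np.ones(branching))
    return P

def sample_transitions(P, n_samples, rng):
    """Draw (s_i, a_i, s'_i) with (s_i, a_i) uniform and s'_i ~ P(.|s_i, a_i)."""
    n_states, n_actions, _ = P.shape
    s = rng.integers(n_states, size=n_samples)
    a = rng.integers(n_actions, size=n_samples)
    s_next = np.array([rng.choice(n_states, p=P[si, ai]) for si, ai in zip(s, a)])
    return s, a, s_next

# 50 Garnets G_p, with 10 data sets D_{p,q} drawn from each, as in the paper
rng = np.random.default_rng(0)
datasets = []
for p in range(50):
    P = make_garnet(n_states=30, n_actions=4, branching=2, rng=rng)
    for q in range(10):
        datasets.append(sample_transitions(P, n_samples=500, rng=rng))
```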
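
The Experiment Setup row records only that DCA starts from θ_0 = 0 and that each intermediary convex problem is solved by sub-gradient descent [18]. The sketch below shows that generic loop, assuming a textbook DC decomposition J(θ) = g(θ) - h(θ) with g and h convex; the names `grad_g` and `subgrad_h`, the iteration counts, and the 1/sqrt(t) step-size schedule are hypothetical stand-ins, since the paper's quoted text does not report them.

```python
import numpy as np

def dca(grad_g, subgrad_h, dim, n_outer=50, n_inner=200, lr=0.1):
    """Generic DC Algorithm: minimize J(theta) = g(theta) - h(theta) with
    g, h convex. Each outer step linearizes h at theta_k and solves the
    resulting convex problem min_theta g(theta) - <y_k, theta> by
    sub-gradient descent."""
    theta = np.zeros(dim)            # theta_0 = 0, as stated in the paper
    for _ in range(n_outer):
        y = subgrad_h(theta)         # y_k in the subdifferential of h at theta_k
        inner = theta.copy()
        for t in range(1, n_inner + 1):
            step = lr / np.sqrt(t)   # diminishing step size (assumed schedule)
            inner -= step * (grad_g(inner) - y)
        theta = inner
    return theta

# Toy usage on a DC objective: J(theta) = ||theta||^2 - ||theta - c||
c = np.array([1.0, -2.0])
grad_g = lambda th: 2.0 * th
subgrad_h = lambda th: (th - c) / (np.linalg.norm(th - c) + 1e-12)
theta_star = dca(grad_g, subgrad_h, dim=2)
```

In the paper, g and h come from the DC decomposition of its Bellman-residual objective; only the zero initialization and the inner sub-gradient solver are confirmed by the quoted text.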