Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning
Authors: Otmane Sakhi, Imad Aouali, Pierre Alquier, Nicolas Chopin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS. ... Extensive experiments in Section 5 highlight the favorable performance of LS |
| Researcher Affiliation | Collaboration | Otmane Sakhi (Criteo AI Lab, Paris, France; o.sakhi@criteo.com); Imad Aouali (CREST, ENSAE and Criteo AI Lab, Paris, France; i.aouali@criteo.com); Pierre Alquier (ESSEC Business School, Singapore; alquier@essec.edu); Nicolas Chopin (CREST, ENSAE; nicolas.chopin@ensae.fr) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found at https://github.com/otmhi/offpolicy_ls. |
| Open Datasets | Yes | 11 real multiclass classification datasets are chosen from the UCI ML Repository [8] ... [8] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. URL http://www.ics.uci.edu/~mlearn/MLRepository.html. |
| Dataset Splits | Yes | In our experiments, we split the training split D_train (of size N) of the four datasets considered into D_l (n_l = 0.05N) and D_c (n_c = 0.95N) and use their test split D_test. |
| Hardware Specification | No | All our experiments were conducted on a machine with 16 CPUs. (This is not specific enough to determine the model or type of CPU) |
| Software Dependencies | No | The paper mentions using 'Adam [30]' as an optimizer, but does not specify version numbers for programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | H.2.3 Detailed hyperparameters ... We use Adam [30] with a learning rate of 10^-1 for 10 epochs. ... The clipping parameter τ is fixed to 1/K ... ES: The exponential smoothing parameter α is fixed to 1 − 1/K. |
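
The split and the fixed hyperparameters quoted in the table can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `split_train` and `fixed_hyperparameters`, the random shuffle, and the seed are assumptions. Only the 5%/95% proportions, τ = 1/K, α = 1 − 1/K, the learning rate, and the epoch count come from the quoted text, and K is assumed here to be the number of actions (classes).

```python
import random


def split_train(n_train, frac_l=0.05, seed=0):
    """Split indices 0..n_train-1 into D_l (5%) and D_c (95%).

    Shuffling with a fixed seed is an assumption; the quoted text only
    gives the proportions n_l = 0.05N and n_c = 0.95N.
    """
    indices = list(range(n_train))
    random.Random(seed).shuffle(indices)
    n_l = int(round(frac_l * n_train))
    return indices[:n_l], indices[n_l:]


def fixed_hyperparameters(num_actions):
    """Collect the fixed settings quoted from Section H.2.3 (K = num_actions)."""
    return {
        "tau": 1.0 / num_actions,          # clipping parameter, 1/K
        "alpha": 1.0 - 1.0 / num_actions,  # exponential smoothing parameter, 1 - 1/K
        "learning_rate": 1e-1,             # Adam learning rate
        "epochs": 10,
    }


idx_l, idx_c = split_train(1000)
hp = fixed_hyperparameters(10)
```

With N = 1000 and K = 10, `idx_l` holds 50 indices, `idx_c` holds the remaining 950, and the dictionary carries τ = 0.1 and α = 0.9.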