Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

Authors: Otmane Sakhi, Imad Aouali, Pierre Alquier, Nicolas Chopin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS. ... Extensive experiments in Section 5 highlight the favorable performance of LS
Researcher Affiliation | Collaboration | Otmane Sakhi (Criteo AI Lab, Paris, France; o.sakhi@criteo.com); Imad Aouali (CREST, ENSAE; Criteo AI Lab, Paris, France; i.aouali@criteo.com); Pierre Alquier (ESSEC Business School, Singapore; alquier@essec.edu); Nicolas Chopin (CREST, ENSAE; nicolas.chopin@ensae.fr)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code can be found at https://github.com/otmhi/offpolicy_ls.
Open Datasets | Yes | 11 real multiclass classification datasets are chosen from the UCI ML Repository [8] ... [8] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. URL http://www.ics.uci.edu/~mlearn/{MLR}epository.html.
Dataset Splits | Yes | In our experiments, we split the training split D_train (of size N) of the four datasets considered into D_l (n_l = 0.05N) and D_c (n_c = 0.95N) and use their test split D_test.
Hardware Specification | No | All our experiments were conducted on a machine with 16 CPUs. (This is not specific enough to determine the model or type of CPU.)
Software Dependencies | No | The paper mentions using 'Adam [30]' as an optimizer, but does not specify version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | H.2.3 Detailed hyperparameters ... We use Adam [30] with a learning rate of 10^-1 for 10 epochs. ... The clipping parameter τ is fixed to 1/K ... ES: The exponential smoothing parameter α is fixed to 1 − 1/K.
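
The split and hyperparameter values quoted above can be made concrete with a minimal Python sketch. It is not taken from the authors' repository. Several things here are assumptions: that K denotes the number of actions (classes), that clipping regularizes the importance weight as π / max(π0, τ), and that exponential smoothing uses π / π0^α. The quoted text gives only the parameter values, not these formulas. The function names, the random shuffle, the value K = 10, and the example propensities are hypothetical.

```python
import random

# Values quoted in the Experiment Setup row.
LEARNING_RATE = 1e-1  # Adam learning rate
EPOCHS = 10           # number of training epochs


def split_train(indices, frac_small=0.05, seed=0):
    """Split training indices into a 5% part and a 95% part.

    Shuffling before the split is an assumption; the quoted text only
    states the proportions (0.05N and 0.95N).
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_small = int(round(frac_small * len(shuffled)))
    return shuffled[:n_small], shuffled[n_small:]


def clipped_weight(pi, pi0, tau):
    """Assumed clipping form: bound the logging propensity below by tau."""
    return pi / max(pi0, tau)


def es_weight(pi, pi0, alpha):
    """Assumed exponential-smoothing form: raise the logging propensity to alpha."""
    return pi / (pi0 ** alpha)


K = 10                 # hypothetical number of actions (classes)
tau = 1.0 / K          # clipping parameter, fixed to 1/K
alpha = 1.0 - 1.0 / K  # exponential smoothing parameter, fixed to 1 - 1/K

part_small, part_large = split_train(range(1000))

# Hypothetical propensities: target policy 0.5, logging policy 0.01.
w_ips = 0.5 / 0.01
w_clip = clipped_weight(0.5, 0.01, tau)
w_es = es_weight(0.5, 0.01, alpha)
```

Under these assumed forms, both regularized weights are smaller than the plain importance weight `w_ips` whenever the logging propensity is small, which is the variance-reduction effect such parameters are meant to control.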