Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Authors: Mathias Niepert, Pasquale Minervini, Luca Franceschi

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The set of experiments can be divided into three parts. First, we analyze and compare the behavior of I-MLE with (i) the score function and (ii) the straight-through estimator using a toy problem. Second, we explore the latent variable setting where both h_v and f_u in Eq. (1) are neural networks and the optimal structure is not available during training. Finally, we address the problem of differentiating through black-box combinatorial optimization problems, where we use the target distribution derived in Section 4.
Researcher Affiliation | Collaboration | Mathias Niepert (NEC Laboratories Europe, mathias.niepert@neclab.eu); Pasquale Minervini (University College London, p.minervini@ucl.ac.uk); Luca Franceschi (Istituto Italiano di Tecnologia and University College London, ucablfr@ucl.ac.uk)
Pseudocode | Yes | Algorithm 1: Instance of I-MLE with perturbation-based implicit differentiation. (A minimal sketch of this procedure follows the table.)
Open Source Code | Yes | We provide implementations and Python notebooks at https://github.com/nec-research/tf-imle
Open Datasets | Yes | The BEERADVOCATE dataset [McAuley et al., 2012] consists of free-text reviews and ratings for 4 different aspects of beer: appearance, aroma, palate, and taste.
Dataset Splits | Yes | Since the original dataset [McAuley et al., 2012] did not provide separate validation and test sets, we compute 10 different evenly sized validation/test splits of the 10k held-out set and compute mean and standard deviation over 10 models, each trained on one split.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, memory amounts) for running its experiments.
Software Dependencies | No | The paper mentions 'modern deep learning pipelines' and 'Adam settings' but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries.
Experiment Setup | Yes | Hyperparameters are optimized against L for all methods independently. Statistics are over 100 runs. We used the standard hyperparameter settings of Chen et al. [2018] and choose the temperature parameter t ∈ {0.1, 0.5, 1.0, 2.0}. For I-MLE we choose λ ∈ {10¹, 10², 10³}, while for both I-MLE and STE we choose τ ∈ {k, 2k, 3k} based on the validation MSE. We used the standard Adam settings.
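
The pseudocode row above refers to Algorithm 1, an instance of I-MLE with perturbation-based implicit differentiation. Below is a minimal NumPy sketch of that forward/backward scheme on a toy top-k (k-subset) selection problem, assuming standard Gumbel perturbations, a hypothetical map_topk MAP solver, an illustrative λ, and a toy squared-error loss; it is a sketch of the idea, not the authors' implementation, which is available at https://github.com/nec-research/tf-imle.

import numpy as np

rng = np.random.default_rng(0)

def map_topk(theta, k):
    # MAP solver for the k-subset distribution: indicator vector of the k largest scores.
    z = np.zeros_like(theta)
    z[np.argsort(-theta)[:k]] = 1.0
    return z

def imle_step(theta, grad_fn, k, lam=10.0):
    # Forward pass: perturb-and-MAP sample of a discrete structure.
    eps = rng.gumbel(size=theta.shape)          # Gumbel noise; the paper also proposes Sum-of-Gamma noise
    z = map_topk(theta + eps, k)
    # Backward pass: perturbation-based implicit differentiation via a target distribution.
    dl_dz = grad_fn(z)                          # gradient of the downstream loss w.r.t. the sample
    theta_target = theta - lam * dl_dz          # parameters of the target distribution
    z_target = map_topk(theta_target + eps, k)  # MAP under the target parameters, same noise
    grad_theta = (z - z_target) / lam           # I-MLE gradient estimate w.r.t. theta
    return z, grad_theta

# Toy usage: learn scores whose top-5 subset matches a fixed target mask.
theta = rng.normal(size=20)
target = map_topk(rng.normal(size=20), 5)
for _ in range(200):
    z, g = imle_step(theta, lambda z: 2.0 * (z - target), k=5)
    theta -= 0.1 * g
print("overlap with target subset:", int(map_topk(theta, 5) @ target))

In the backward pass, the 1/λ scaling of (z − z_target) can equivalently be folded into the learning rate.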
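
The experiment-setup row quotes a small grid search over t, λ, and τ, selected on validation MSE. The following is a hypothetical Python sketch of that grid; only the candidate values come from the quote, while the subset size k, the selection loop, and the placeholder objective are illustrative assumptions.

from itertools import product
import random

k = 10                                    # subset size k; the actual value is experiment-specific
grid = {
    "t": [0.1, 0.5, 1.0, 2.0],            # temperature for the relaxation-based baselines
    "lam": [10.0, 100.0, 1000.0],         # I-MLE target-distribution strength (10^1 .. 10^3)
    "tau": [k, 2 * k, 3 * k],             # perturbation scale for I-MLE and STE
}

def validation_mse(config):
    # Placeholder objective: a real run would train with standard Adam settings
    # and return the MSE on the held-out validation split.
    return random.random()

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
best = min(configs, key=validation_mse)
print("selected configuration:", best)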