Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Authors: Mathias Niepert, Pasquale Minervini, Luca Franceschi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The set of experiments can be divided into three parts. First, we analyze and compare the behavior of I-MLE with (i) the score function estimator and (ii) the straight-through estimator using a toy problem. Second, we explore the latent variable setting where both h_v and f_u in Eq. (1) are neural networks and the optimal structure is not available during training. Finally, we address the problem of differentiating through black-box combinatorial optimization problems, where we use the target distribution derived in Section 4. |
| Researcher Affiliation | Collaboration | Mathias Niepert (NEC Laboratories Europe, mathias.niepert@neclab.eu); Pasquale Minervini (University College London, p.minervini@ucl.ac.uk); Luca Franceschi (Istituto Italiano di Tecnologia and University College London, ucablfr@ucl.ac.uk) |
| Pseudocode | Yes | Algorithm 1: Instance of I-MLE with perturbation-based implicit differentiation (a hedged sketch of this forward/backward scheme appears below the table). |
| Open Source Code | Yes | We provide implementations and Python notebooks at https://github.com/nec-research/tf-imle |
| Open Datasets | Yes | The BEERADVOCATE dataset [McAuley et al., 2012] consists of free-text reviews and ratings for 4 different aspects of beer: appearance, aroma, palate, and taste. |
| Dataset Splits | Yes | Since the original dataset [McAuley et al., 2012] did not provide separate validation and test sets, we compute 10 different evenly sized validation/test splits of the 10k held-out set and report mean and standard deviation over 10 models, each trained on one split (see the splitting sketch below the table). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, memory amounts) for running its experiments. |
| Software Dependencies | No | The paper mentions 'modern deep learning pipelines' and 'Adam settings' but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Hyperparameters are optimized against the loss L for all methods independently. Statistics are over 100 runs. We used the standard hyperparameter settings of Chen et al. [2018] and choose the temperature parameter t ∈ {0.1, 0.5, 1.0, 2.0}. For I-MLE we choose λ ∈ {10¹, 10², 10³}, while for both I-MLE and STE we choose τ ∈ {k, 2k, 3k} based on the validation MSE. We used the standard Adam settings (the per-method grids are written out below the table). |
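
The Algorithm 1 referenced in the Pseudocode row pairs a perturb-and-MAP forward pass with a backward pass that re-solves the MAP problem under a perturbation-based target distribution. Below is a minimal NumPy sketch of that scheme under stated assumptions: a top-k MAP solver stands in for the combinatorial solver, standard Gumbel noise stands in for the paper's sum-of-gamma perturbations, and the function names (`map_topk`, `imle_forward`, `imle_backward`) are illustrative rather than taken from the authors' repository.

```python
# Minimal sketch of I-MLE with perturbation-based implicit differentiation,
# assuming a top-k MAP solver and Gumbel perturbations (illustrative choices).
import numpy as np

def map_topk(theta, k):
    """MAP state of the k-subset distribution: indicator vector of the top-k entries."""
    z = np.zeros_like(theta)
    z[np.argsort(theta)[-k:]] = 1.0
    return z

def imle_forward(theta, k, rng):
    """Forward pass: perturb-and-MAP sample from the perturbed discrete distribution."""
    eps = rng.gumbel(size=theta.shape)
    z = map_topk(theta + eps, k)
    return z, eps

def imle_backward(theta, eps, z, grad_z, k, lam=10.0):
    """Backward pass: re-solve the MAP problem under the perturbation-based
    target distribution and return the difference of the two MAP states
    (some implementations scale this estimate by 1/lam)."""
    theta_prime = theta - lam * grad_z
    z_prime = map_topk(theta_prime + eps, k)
    return z - z_prime

# Toy usage: fit theta so that sampled 3-subsets match a fixed target mask b.
rng = np.random.default_rng(0)
theta = rng.normal(size=10)
b = map_topk(rng.normal(size=10), 3)
for _ in range(200):
    z, eps = imle_forward(theta, 3, rng)
    grad_z = 2.0 * (z - b)              # gradient of ||z - b||^2 w.r.t. z
    theta -= 0.1 * imle_backward(theta, eps, z, grad_z, 3)
```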
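
For the Dataset Splits row, the quoted protocol (10 evenly sized validation/test splits of a 10k held-out set, one model per split) could be reproduced along the lines below; the exact shuffling used by the authors is not specified, so the seed and per-split permutation strategy here are assumptions.

```python
# Hedged sketch of the 10-fold validation/test splitting described above,
# assuming the 10k held-out examples are addressable by integer index.
import numpy as np

def make_splits(n_heldout=10_000, n_splits=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        idx = rng.permutation(n_heldout)
        half = n_heldout // 2
        yield idx[:half], idx[half:]    # evenly sized validation / test halves

# One model is trained per split; mean and std are reported over the 10 runs.
for split_id, (val_idx, test_idx) in enumerate(make_splits()):
    pass  # train and evaluate model `split_id` on these index sets
```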
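
For the Experiment Setup row, the per-method hyperparameter grids can be written out directly. In the sketch below, the subset size `k` is a placeholder value, and assigning the temperature t to the relaxation-based baseline (Chen et al. [2018]) is an assumption drawn from the quoted description; model selection uses validation MSE as stated above.

```python
# Per-method hyperparameter grids from the experiment-setup row, spelled out.
from itertools import product

k = 10  # assumed subset size, for illustration only
grids = {
    "relaxation baseline": {"t": [0.1, 0.5, 1.0, 2.0]},   # temperature t
    "I-MLE": {"lam": [10.0, 100.0, 1000.0],               # λ ∈ {10¹, 10², 10³}
              "tau": [k, 2 * k, 3 * k]},                  # τ ∈ {k, 2k, 3k}
    "STE":   {"tau": [k, 2 * k, 3 * k]},
}

# Enumerate all configurations for one method, e.g. I-MLE:
imle_configs = [dict(zip(grids["I-MLE"], v))
                for v in product(*grids["I-MLE"].values())]
```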