Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization

Authors: Hedda Cohen Indelman, Tamir Hazan

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically the effectiveness of learning this balance in structured discrete spaces.
Researcher Affiliation | Academia | 1Technion. Correspondence to: Hedda Cohen Indelman <cohen.hedda@campus.technion.ac.il>.
Pseudocode | No | The paper describes its methods using text and mathematical equations, and includes architectural diagrams, but it does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code may be found in https://github.com/HeddaCohenIndelman/PerturbedStructuredPredictorsDirect.
Open Datasets | Yes | We report the classification accuracies on the standard test sets in Table 3. For MNIST and Fashion-MNIST, our method matched or outperformed NeuralSort (Grover et al., 2019) and RelaxSubSample (Xie and Ermon, 2019)... For CIFAR-10, our method outperformed NeuralSort and RelaxSubSample...
Dataset Splits | No | The paper mentions a 'training set' and 'test set' for the bipartite matching experiment ('the training set consists of 10 random sequences of length d and a test set that consists of a single sequence of the same length d'). For the k-NN experiments on MNIST, Fashion-MNIST, and CIFAR-10, it refers to 'standard test sets' but does not explicitly provide train/validation/test splits or percentages.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific solver versions) used in the experiments.
Experiment Setup | Yes | In all direct loss based experiments we set a negative ϵ. The network µ has a first fully connected layer that links the sets of samples to an intermediate representation (with 32 neurons), and a second (fully connected) layer that turns those representations into batches of latent permutation matrices of dimension d by d each. ... The network σ has a single layer connecting input sample sequences to a single output, which is then activated by a softplus activation. ... We perform 20 Sinkhorn iterations and 10 different reconstructions for each batch sample.
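
The Experiment Setup row describes the µ and σ networks and the Sinkhorn normalization only in prose. Below is a minimal sketch of that architecture, assuming a PyTorch implementation; the module names (MuNet, SigmaNet, sinkhorn), the input handling, and the hidden-layer details beyond the stated 32 neurons are illustrative assumptions, not the authors' code (see the repository linked above for the actual implementation).

```python
# Hedged sketch of the architecture described in the Experiment Setup row.
# Assumes PyTorch; names and input handling are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sinkhorn(log_alpha, n_iters=20):
    """Sinkhorn normalization: alternately normalize rows and columns in
    log-space to push a d x d score matrix toward a doubly stochastic
    (soft permutation) matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # columns
    return log_alpha.exp()


class MuNet(nn.Module):
    """Network mu: a first fully connected layer maps each set of samples to a
    32-dimensional intermediate representation, and a second fully connected
    layer maps that representation to a latent d x d permutation score matrix."""
    def __init__(self, input_dim, d, hidden=32):
        super().__init__()
        self.d = d
        self.fc1 = nn.Linear(input_dim, hidden)
        self.fc2 = nn.Linear(hidden, d * d)

    def forward(self, x):                       # x: (batch, input_dim)
        h = F.relu(self.fc1(x))
        scores = self.fc2(h).view(-1, self.d, self.d)
        return sinkhorn(scores, n_iters=20)     # 20 Sinkhorn iterations, as stated above


class SigmaNet(nn.Module):
    """Network sigma: a single layer mapping the input sample sequence to a
    single output, passed through a softplus activation so it stays positive."""
    def __init__(self, input_dim):
        super().__init__()
        self.fc = nn.Linear(input_dim, 1)

    def forward(self, x):                       # x: (batch, input_dim)
        return F.softplus(self.fc(x))
```

As a usage illustration, MuNet(input_dim=d, d=d) would map a batch of length-d input sequences to batches of d x d doubly stochastic matrices, while SigmaNet would produce the per-sequence scale of the random perturbation that the paper learns alongside µ.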