Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time

Authors: Ferran Alet, Maria Bauza, Kenji Kawaguchi, Nurullah Giray Kuru, Tomás Lozano-Pérez, Leslie Kaelbling

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "5 Experiments"; "5.1 Tailoring to impose symmetries and constraints at prediction time"; "Table 1: Test MSE loss for different methods; the second column shows the relative improvement over basic inductive supervised learning." |
| Researcher Affiliation | Academia | "Ferran Alet, Maria Bauza, Kenji Kawaguchi, Nurullah Giray Kuru, Tomás Lozano-Pérez, Leslie Pack Kaelbling, MIT, {alet,bauza,kawaguch,ngkuru,tlp,lpk}@mit.edu" |
| Pseudocode | Yes | "Algorithm 1 MAMmoTh: Model-Agnostic Meta-Tailoring, Subroutine Training(f, Lsup, λsup, Ltailor, λtailor, Dtrain, b)"; "Algorithm 2 CNGRAD for meta-tailoring, Subroutine Training(f, Lsup, λsup, Ltailor, λtailor, steps, Dtrain, b)". A sketch of this meta-tailoring loop appears after the table. |
| Open Source Code | No | The paper does not explicitly provide a link to its source code or state that the code is publicly available. |
| Open Datasets | Yes | "We provide experiments on the CIFAR-10 dataset [31] by building on SimCLR [13]."; "We apply meta-tailoring to robustly classifying CIFAR-10 [31] and ImageNet [15] images" |
| Dataset Splits | No | The paper mentions "training data" and "test samples" but gives no percentages or counts for the training, validation, and test splits, nor does it specify a cross-validation setup. |
| Hardware Specification | No | The acknowledgements mention leveraging "the MIT supercloud platform [42]", but the paper does not specify the GPU models, CPU models, or other hardware configurations used in the experiments. |
| Software Dependencies | No | The paper mentions software frameworks such as PyTorch [38], TensorFlow [1], and JAX [10], but does not give exact version numbers for these or any other libraries required for replication. |
| Experiment Setup | Yes | "The first-order gave slightly better results, possibly because it was trained with a higher tailor learning rate (10⁻³) with which the second-order version was unstable (we thus used 10⁻⁴)."; "We use ν = 0.1 for all experiments."; "Finally, we use σ′ = √(σ² − ν²) ≈ 0.23, 0.49, 0.995 so that the points used in our tailoring loss come from N(x, σ²)." A numerical check of this formula appears after the table. |
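
The MAMmoTh pseudocode quoted in the Pseudocode row amounts to an inner unsupervised adaptation step nested inside an ordinary supervised training loop. Below is a minimal first-order PyTorch sketch of that loop; the function names, the SGD inner optimizer, and the gradient-copy shortcut are illustrative assumptions, not the authors' released code (which, per the table, is not public).

```python
# Minimal sketch of a first-order meta-tailoring training step,
# assuming `outer_opt` was built over model.parameters() and that
# `tailor_loss_fn(model, x)` returns a scalar unsupervised loss
# (e.g., the violation of a known symmetry or conservation law).
import copy
import torch

def mammoth_first_order_step(model, outer_opt, x, y,
                             sup_loss_fn, tailor_loss_fn,
                             lambda_sup=1.0, lambda_tailor=1.0,
                             tailor_lr=1e-3, tailor_steps=1):
    """One first-order meta-tailoring update on a batch (x, y).

    Inner loop: adapt a temporary copy of the weights by descending
    the unsupervised tailoring loss evaluated at the query inputs x.
    Outer loop: evaluate the supervised loss under the tailored
    weights and apply its gradient to the ORIGINAL weights
    (first-order approximation: the inner adaptation itself is not
    differentiated through).
    """
    tailored = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(tailored.parameters(), lr=tailor_lr)
    for _ in range(tailor_steps):
        inner_opt.zero_grad()
        (lambda_tailor * tailor_loss_fn(tailored, x)).backward()
        inner_opt.step()

    # Supervised loss under the tailored weights.
    inner_opt.zero_grad()  # discard the tailoring gradients
    loss = lambda_sup * sup_loss_fn(tailored(x), y)
    loss.backward()

    # First-order shortcut: move the gradients onto the original
    # parameters and take the outer optimizer step.
    outer_opt.zero_grad()
    for p, p_tailored in zip(model.parameters(), tailored.parameters()):
        if p_tailored.grad is not None:
            p.grad = p_tailored.grad.clone()
    outer_opt.step()
    return loss.item()
```

At prediction time only the inner tailoring steps are run on each test input before predicting. The first-order variant matters here because, per the quoted setup, it tolerated the larger tailoring learning rate of 10⁻³ that made the second-order version unstable.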
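
The reconstructed formula σ′ = √(σ² − ν²) from the Experiment Setup row can be sanity-checked numerically. The smoothing levels σ ∈ {0.25, 0.5, 1.0} below are an assumption, inferred from standard randomized-smoothing settings and from the quoted outputs 0.23, 0.49, 0.995:

```python
# Numerical check of sigma' = sqrt(sigma^2 - nu^2). The sigma values
# are assumed, not quoted from the paper.
import math

nu = 0.1  # tailoring noise, as quoted in the table
for sigma in (0.25, 0.5, 1.0):
    sigma_prime = math.sqrt(sigma**2 - nu**2)
    print(f"sigma = {sigma:.2f} -> sigma' = {sigma_prime:.3f}")
# Prints 0.229, 0.490, 0.995, matching the quoted 0.23, 0.49, 0.995:
# sampling x' from N(x, sigma'^2) and then adding N(0, nu^2) noise in
# the tailoring loss yields points distributed as N(x, sigma^2).
```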