Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization

Authors: Omar Montasser, Han Shao, Emmanuel Abbe

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present results for a basic experiment on learning Boolean functions on the hypercube $\{-1,1\}^d$.
Researcher Affiliation | Collaboration | Omar Montasser, Yale University, omar.montasser@yale.edu; Han Shao, Harvard University, han@ttic.edu; Emmanuel Abbe, EPFL and Apple, emmanuel.abbe@epfl.ch
Pseudocode | Yes | Algorithm 1: Reduction to Minimize Worst-Case Risk (a sketch of the worst-case-risk objective this reduction targets appears after this table).
Open Source Code | No | The paper states it 'used Python and PyTorch to implement code' but does not provide a link or an explicit statement about the availability of the code.
Open Datasets | No | We consider a uniform distribution $D$ over $\{-1,1\}^d$ and two target functions: (1) $f_1(x) = \prod_{i=1}^{d} x_i$, the parity function, and (2) $f_2(x) = \mathrm{sign}\big(\sum_{j=0}^{2} \prod_{i=1}^{d/3} x_{j(d/3)+i}\big)$, a majority-of-subparities function. We consider transformations $\mathcal{T}_1, \mathcal{T}_2$ under which $f_1, f_2$ are invariant, respectively (see Section 2). Since $D$ is uniform, note that for any $\hat{h}$: $\sup_{T \in \mathcal{T}} \mathrm{err}(\hat{h}, T(D_f)) = \mathrm{err}(\hat{h}, D_f)$. (A data-generation sketch appears after this table.)
Dataset Splits | No | The paper mentions 'train set size' and 'test set size' but does not specify a separate validation set or describe a specific data-splitting methodology for reproduction.
Hardware Specification | Yes | We ran experiments on freely available Google Colab T4 GPUs, and used Python and PyTorch to implement code.
Software Dependencies | No | The paper states it 'used Python and PyTorch to implement code' but does not specify version numbers for either Python or PyTorch.
Experiment Setup | Yes | We use a two-layer feed-forward neural network architecture with 512 hidden units as our hypothesis class $\mathcal{H}$. We use the squared loss and consider two training algorithms. First, the baseline is running standard mini-batch SGD on training examples. Second, as a heuristic to implement Equation (2), we run mini-batch SGD on training examples and permutations of them. Specifically, in each step we replace correctly classified training examples in a mini-batch with random permutations of them (drawn from $\mathcal{T}$), and then perform an SGD update on this modified mini-batch. We set the mini-batch size to 1 and the learning rate to 0.01. (A training-loop sketch appears after this table.)
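
A minimal PyTorch sketch of the data generation quoted in the Open Datasets row: uniform inputs on the hypercube $\{-1,1\}^d$ with the parity target $f_1$ and the majority-of-subparities target $f_2$. The dimension, sample count, and function names below are illustrative assumptions, not taken from the paper or its code.

```python
import torch

def sample_hypercube(n, d, generator=None):
    """Draw n points uniformly at random from the hypercube {-1, 1}^d."""
    return torch.randint(0, 2, (n, d), generator=generator).float() * 2 - 1

def parity(x):
    """f1(x) = prod_{i=1}^{d} x_i: the parity over all d coordinates."""
    return x.prod(dim=1)

def majority_of_subparities(x):
    """f2(x) = sign(sum_{j=0}^{2} prod_{i=1}^{d/3} x_{j(d/3)+i}):
    split the coordinates into three contiguous blocks of size d/3 and
    take the majority vote of the three block parities."""
    n, d = x.shape
    subparities = x.view(n, 3, d // 3).prod(dim=2)  # shape (n, 3), entries in {-1, 1}
    return subparities.sum(dim=1).sign()            # sum of three +/-1 values is never zero

# Illustrative usage (d must be divisible by 3 for f2):
X = sample_hypercube(1000, 12)
y1, y2 = parity(X), majority_of_subparities(X)
```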
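
The Pseudocode row only names the paper's Algorithm 1 without reproducing it. For orientation, here is a hedged sketch of the quantity such a reduction targets: the worst-case 0-1 error $\sup_{T \in \mathcal{T}} \mathrm{err}(\hat{h}, T(D_f))$, approximated over a finite list of transformations. This is not the paper's Algorithm 1; the transformation classes $\mathcal{T}_1, \mathcal{T}_2$ are defined in the paper's Section 2 and are only stood in for here by arbitrary callables.

```python
import torch

def worst_case_error(model, X, y, transformations):
    """Largest 0-1 error of `model` over a finite list of transformations.

    `transformations` is a list of callables mapping a batch of inputs in
    {-1, 1}^d to transformed inputs (e.g., coordinate permutations); this only
    approximates the supremum over the full transformation class T.
    """
    worst = 0.0
    with torch.no_grad():
        for T in transformations:
            preds = torch.sign(model(T(X)).squeeze(1))
            worst = max(worst, (preds != y).float().mean().item())
    return worst
```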
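
Finally, a minimal sketch of the training heuristic quoted in the Experiment Setup row: a two-layer network with 512 hidden units trained with squared loss and mini-batch SGD (batch size 1, learning rate 0.01), where a correctly classified example is replaced by a random transformation of itself before the update. The epoch count and the `sample_transform` callable are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

def train_invariant(X, y, sample_transform, epochs=5, lr=0.01, hidden=512):
    """SGD with the replace-correct-examples-by-transformations heuristic.

    `sample_transform(x)` should return a randomly transformed copy of the
    batch `x` (e.g., x[:, torch.randperm(x.shape[1])] for coordinate
    permutations); its exact form depends on the invariance class used.
    """
    d = X.shape[1]
    model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for i in torch.randperm(X.shape[0]).tolist():  # mini-batch size 1
            xi, yi = X[i:i + 1], y[i:i + 1]
            with torch.no_grad():
                correct = torch.sign(model(xi).squeeze(1)) == yi
            if correct.item():
                # Replace a correctly classified example with a random
                # transformation of it before taking the SGD step.
                xi = sample_transform(xi)
            opt.zero_grad()
            loss_fn(model(xi).squeeze(1), yi).backward()
            opt.step()
    return model
```

The baseline described in the quoted setup corresponds to the same loop with the replacement step removed, i.e., standard mini-batch SGD on the original training examples.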