Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization

Authors: Omar Montasser, Han Shao, Emmanuel Abbe

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present results for a basic experiment on learning Boolean functions on the hypercube $\{-1,1\}^d$.
Researcher Affiliation | Collaboration | Omar Montasser, Yale University, omar.montasser@yale.edu; Han Shao, Harvard University, han@ttic.edu; Emmanuel Abbe, EPFL and Apple, emmanuel.abbe@epfl.ch
Pseudocode | Yes | Algorithm 1: Reduction to Minimize Worst-Case Risk (a sketch of the worst-case-risk objective this reduction targets appears after this table).
Open Source Code | No | The paper states it 'used Python and PyTorch to implement code' but does not provide a link or an explicit statement about the availability of the code.
Open Datasets | No | We consider a uniform distribution $D$ over $\{-1,1\}^d$ and two target functions: (1) $f_1(x) = \prod_{i=1}^{d} x_i$, the parity function, and (2) $f_2(x) = \mathrm{sign}\big(\sum_{j=0}^{2} \prod_{i=1}^{d/3} x_{j(d/3)+i}\big)$, a majority-of-subparities function. We consider transformations $\mathcal{T}_1, \mathcal{T}_2$ under which $f_1, f_2$ are invariant, respectively (see Section 2). Since $D$ is uniform, note that for any $\hat{h}$: $\sup_{T \in \mathcal{T}} \mathrm{err}(\hat{h}, T(D_f)) = \mathrm{err}(\hat{h}, D_f)$. (A data-generation sketch appears after this table.)
Dataset Splits | No | The paper mentions 'train set size' and 'test set size' but does not specify a separate validation set or describe a specific data-splitting methodology for reproduction.
Hardware Specification | Yes | We ran experiments on freely available Google Colab T4 GPUs, and used Python and PyTorch to implement code.
Software Dependencies | No | The paper states it 'used Python and PyTorch to implement code' but does not specify version numbers for either Python or PyTorch.
Experiment Setup | Yes | We use a two-layer feed-forward neural network architecture with 512 hidden units as our hypothesis class $\mathcal{H}$. We use the squared loss and consider two training algorithms. First, the baseline is running standard mini-batch SGD on training examples. Second, as a heuristic to implement Equation (2), we run mini-batch SGD on training examples and permutations of them. Specifically, in each step we replace correctly classified training examples in a mini-batch with random permutations of them (drawn from $\mathcal{T}$), and then perform an SGD update on this modified mini-batch. We set the mini-batch size to 1 and the learning rate to 0.01. (A training-loop sketch appears after this table.)
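
A minimal PyTorch sketch of the data generation quoted in the Open Datasets row: uniform inputs on the hypercube $\{-1,1\}^d$ with the parity target $f_1$ and the majority-of-subparities target $f_2$. The dimension, sample count, and function names below are illustrative assumptions, not taken from the paper or its code.

```python
import torch

def sample_hypercube(n, d, generator=None):
    """Draw n points uniformly at random from the hypercube {-1, 1}^d."""
    return torch.randint(0, 2, (n, d), generator=generator).float() * 2 - 1

def parity(x):
    """f1(x) = prod_{i=1}^{d} x_i: the parity over all d coordinates."""
    return x.prod(dim=1)

def majority_of_subparities(x):
    """f2(x) = sign(sum_{j=0}^{2} prod_{i=1}^{d/3} x_{j(d/3)+i}):
    split the coordinates into three contiguous blocks of size d/3 and
    take the majority vote of the three block parities."""
    n, d = x.shape
    subparities = x.view(n, 3, d // 3).prod(dim=2)  # shape (n, 3), entries in {-1, 1}
    return subparities.sum(dim=1).sign()            # sum of three +/-1 values is never zero

# Illustrative usage (d must be divisible by 3 for f2):
X = sample_hypercube(1000, 12)
y1, y2 = parity(X), majority_of_subparities(X)
```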
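
The Pseudocode row only names the paper's Algorithm 1 without reproducing it. For orientation, here is a hedged sketch of the quantity such a reduction targets: the worst-case 0-1 error $\sup_{T \in \mathcal{T}} \mathrm{err}(\hat{h}, T(D_f))$, approximated over a finite list of transformations. This is not the paper's Algorithm 1; the transformation classes $\mathcal{T}_1, \mathcal{T}_2$ are defined in the paper's Section 2 and are only stood in for here by arbitrary callables.

```python
import torch

def worst_case_error(model, X, y, transformations):
    """Largest 0-1 error of `model` over a finite list of transformations.

    `transformations` is a list of callables mapping a batch of inputs in
    {-1, 1}^d to transformed inputs (e.g., coordinate permutations); this only
    approximates the supremum over the full transformation class T.
    """
    worst = 0.0
    with torch.no_grad():
        for T in transformations:
            preds = torch.sign(model(T(X)).squeeze(1))
            worst = max(worst, (preds != y).float().mean().item())
    return worst
```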
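
Finally, a minimal sketch of the training heuristic quoted in the Experiment Setup row: a two-layer network with 512 hidden units trained with squared loss and mini-batch SGD (batch size 1, learning rate 0.01), where a correctly classified example is replaced by a random transformation of itself before the update. The epoch count and the `sample_transform` callable are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

def train_invariant(X, y, sample_transform, epochs=5, lr=0.01, hidden=512):
    """SGD with the replace-correct-examples-by-transformations heuristic.

    `sample_transform(x)` should return a randomly transformed copy of the
    batch `x` (e.g., x[:, torch.randperm(x.shape[1])] for coordinate
    permutations); its exact form depends on the invariance class used.
    """
    d = X.shape[1]
    model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for i in torch.randperm(X.shape[0]).tolist():  # mini-batch size 1
            xi, yi = X[i:i + 1], y[i:i + 1]
            with torch.no_grad():
                correct = torch.sign(model(xi).squeeze(1)) == yi
            if correct.item():
                # Replace a correctly classified example with a random
                # transformation of it before taking the SGD step.
                xi = sample_transform(xi)
            opt.zero_grad()
            loss_fn(model(xi).squeeze(1), yi).backward()
            opt.step()
    return model
```

The baseline described in the quoted setup corresponds to the same loop with the replacement step removed, i.e., standard mini-batch SGD on the original training examples.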