Transformation-Invariant Learning and Theoretical Guarantees for OOD Generalization
Authors: Omar Montasser, Han Shao, Emmanuel Abbe
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results for a basic experiment on learning Boolean functions on the hypercube $\{-1,1\}^d$. |
| Researcher Affiliation | Collaboration | Omar Montasser (Yale University, omar.montasser@yale.edu); Han Shao (Harvard University, han@ttic.edu); Emmanuel Abbe (EPFL and Apple, emmanuel.abbe@epfl.ch) |
| Pseudocode | Yes | Algorithm 1: Reduction to Minimize Worst-Case Risk |
| Open Source Code | No | The paper states that the authors 'used Python and PyTorch to implement code' but does not provide a link or any explicit statement about code availability. |
| Open Datasets | No | We consider a uniform distribution $D$ over $\{-1,1\}^d$ and two target functions: (1) $f_1(x) = \prod_{i=1}^{d} x_i$, the parity function, and (2) $f_2(x) = \mathrm{sign}\big(\sum_{j=0}^{2} \prod_{i=1}^{d/3} x_{j(d/3)+i}\big)$, a majority-of-subparities function. We consider transformations $\mathcal{T}_1, \mathcal{T}_2$ under which $f_1, f_2$ are invariant, respectively (see Section 2). Since $D$ is uniform, note that for any $\hat{h}$: $\sup_{T \in \mathcal{T}} \mathrm{err}(\hat{h}, T(D_f)) = \mathrm{err}(\hat{h}, D_f)$. (A data-generation sketch for these targets follows the table.) |
| Dataset Splits | No | The paper mentions 'train set size' and 'test set size' but does not specify a separate validation set or describe a specific data splitting methodology for reproduction. |
| Hardware Specification | Yes | We ran experiments on freely available Google Colab T4 GPUs, and used Python and PyTorch to implement code. |
| Software Dependencies | No | The paper states that the authors 'used Python and PyTorch to implement code' but does not specify version numbers for either Python or PyTorch. |
| Experiment Setup | Yes | We use a two-layer feed-forward neural network architecture with 512 hidden units as our hypothesis class H. We use the squared loss and consider two training algorithms. First, the baseline is running standard mini-batch SGD on training examples. Second, as a heuristic to implement Equation (2), we run mini-batch SGD on training examples and permutations of them. Specifically, in each step we replace correctly classified training examples in a mini-batch with random permutations of them (drawn from $\mathcal{T}$), and then perform an SGD update on this modified mini-batch. We set the mini-batch size to 1 and the learning rate to 0.01. (A sketch of this heuristic follows the table.) |
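
To make the synthetic task quoted in the Open Datasets row concrete, here is a minimal data-generation sketch. This is not the authors' code: the helper names (`sample_uniform_hypercube`, `parity`, `majority_of_subparities`) and the value of `d` are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the synthetic Boolean task quoted above.
# All helper names are illustrative; d is assumed divisible by 3 for the second target.
import torch

d = 30  # input dimension (assumed value; the paper's d may differ)

def sample_uniform_hypercube(n, d):
    """Draw n points uniformly from the hypercube {-1, +1}^d."""
    return torch.randint(0, 2, (n, d)).float() * 2 - 1

def parity(x):
    """f1(x) = prod_{i=1}^d x_i, the full parity function."""
    return x.prod(dim=1)

def majority_of_subparities(x):
    """f2(x) = sign(sum_{j=0}^{2} prod_{i=1}^{d/3} x_{j(d/3)+i})."""
    n, d = x.shape
    blocks = x.view(n, 3, d // 3)              # three disjoint blocks of d/3 coordinates
    subparities = blocks.prod(dim=2)           # parity of each block
    return torch.sign(subparities.sum(dim=1))  # majority vote over the three sub-parities

# Example: a labeled training set for the parity target.
X = sample_uniform_hypercube(1000, d)
y = parity(X)
```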
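
The training heuristic in the Experiment Setup row can be sketched as follows, under stated assumptions: the transformation set $\mathcal{T}$ is taken here to be random coordinate permutations (under which the parity target is invariant), the ReLU activation and the training loop on fresh samples are illustrative choices, and only the stated hyperparameters (two-layer network with 512 hidden units, squared loss, mini-batch size 1, learning rate 0.01) come from the paper.

```python
# Minimal sketch (assumptions flagged below) of the training heuristic in the
# Experiment Setup row: standard mini-batch SGD, except that a correctly classified
# example is replaced by a random transformation drawn from T before the update.
# Assumption: T is modeled as random coordinate permutations, under which the
# parity target is invariant; the paper's transformation sets may differ.
import torch
import torch.nn as nn

d = 30  # must match the data dimension

# Two-layer feed-forward network with 512 hidden units (ReLU activation is an assumption).
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # mini-batch size 1, learning rate 0.01
loss_fn = nn.MSELoss()  # squared loss

def random_transformation(x):
    """One draw from T: a random permutation of the coordinates (an assumption here)."""
    return x[:, torch.randperm(x.shape[1])]

def train_step(x, y):
    """One SGD step on a mini-batch of size 1, with the replacement heuristic."""
    pred = model(x).squeeze(1)
    if torch.sign(pred).item() == y.item():  # correctly classified ->
        x = random_transformation(x)         # replace with a random transformation from T
        pred = model(x).squeeze(1)
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Illustrative loop on the parity target: fresh uniform samples each step,
# rather than the fixed training set used in the paper.
for step in range(10_000):
    x = torch.randint(0, 2, (1, d)).float() * 2 - 1  # uniform sample from {-1,+1}^d
    y = x.prod(dim=1)                                # parity label in {-1,+1}
    train_step(x, y)
```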