On the hardness of learning under symmetries

Authors: Bobak Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, Sec. 7 provides a few experiments verifying that the hard classes of invariant functions we propose are indeed difficult to learn. ... We train overparameterized GNNs and CNNs on the hard functions from Sec. 4 and Sec. 5, respectively. ... Fig. 1a and 1b plot the performance of the GNN and CNN respectively.
Researcher Affiliation | Academia | Bobak T. Kiani 1,2, Thien Le 1, Hannah Lawrence 1, Stefanie Jegelka 1,3, Melanie Weber 2; 1 MIT EECS, 2 Harvard SEAS, 3 TU Munich
Pseudocode | No | The paper contains mathematical formulations and proofs, but no explicitly labeled 'Algorithm' or 'Pseudocode' blocks.
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the authors' implementation code for the described methodology is publicly available.
Open Datasets | Yes | The GNN is unable to fit even the training data consisting of 225 n = 15 node graphs drawn uniformly from the Erdős–Rényi model (i.e., p = 0.5). (See the data-generation sketch after the table.)
Dataset Splits | No | The paper mentions 'training data' and a 'test set' in its experimental details, but does not describe a separate validation split or how one was used.
Hardware Specification | No | The paper states, 'All experiments were run using Pytorch on a single GPU (Paszke et al., 2019),' which lacks specific hardware details such as the GPU model, CPU, or memory.
Software Dependencies | No | The paper mentions 'Pytorch Geometric' and 'Pytorch' but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | The overparameterized GNN used during training consisted of 3 layers of graph convolution followed by a node aggregation average pooling layer and a two-layer ReLU MLP with width 64. The graph convolution layers used 32 channels. ... The network was given 10n = 500 training samples and was trained with the Adam optimizer with batch size 32. ... In our experiments, we used the Adam optimizer and tuned the learning rate in the range [0.0001, 0.003]. For CNN experiments, to increase stability of training in later stages, we added a scheduler that divided the learning rate by two every 200 epochs. (See the GNN/training sketch after the table.)
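
The Open Datasets row quotes a training set of 225 Erdős–Rényi graphs with n = 15 nodes and edge probability p = 0.5. Below is a minimal sketch of sampling such a set in PyTorch; it is not the authors' code, and `hard_invariant_label` is a hypothetical placeholder for the hard invariant functions of Sec. 4, which this report does not specify.

```python
import torch

def sample_erdos_renyi(num_graphs: int = 225, n: int = 15, p: float = 0.5,
                       seed: int = 0) -> torch.Tensor:
    """Sample symmetric 0/1 adjacency matrices of shape (num_graphs, n, n)."""
    gen = torch.Generator().manual_seed(seed)
    rand = torch.rand(num_graphs, n, n, generator=gen)
    upper = torch.triu((rand < p).float(), diagonal=1)  # upper triangle, no self-loops
    return upper + upper.transpose(1, 2)                # symmetrize

def hard_invariant_label(adj: torch.Tensor) -> torch.Tensor:
    """Hypothetical placeholder target: a permutation-invariant graph statistic.

    The paper's actual hard invariant functions (Sec. 4) are not reproduced here.
    """
    return adj.sum(dim=(1, 2)) / 2  # e.g. edge count

train_adj = sample_erdos_renyi()           # (225, 15, 15) adjacency matrices
train_y = hard_invariant_label(train_adj)  # (225,) labels
```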
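
The Experiment Setup row describes the GNN and optimizer in enough detail for a rough sketch. The PyTorch Geometric version below assumes GCNConv as the graph-convolution operator, constant node features, and a mean-squared-error loss, none of which are stated in this report; the final StepLR line mirrors the CNN experiments' halving of the learning rate every 200 epochs.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class HardFunctionGNN(nn.Module):
    """3 graph-convolution layers (32 channels), mean pooling, 2-layer ReLU MLP (width 64)."""

    def __init__(self, in_dim: int = 1, channels: int = 32, mlp_width: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, channels)    # GCNConv is an assumption; the report
        self.conv2 = GCNConv(channels, channels)  # only says "graph convolution"
        self.conv3 = GCNConv(channels, channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, mlp_width),
            nn.ReLU(),
            nn.Linear(mlp_width, 1),
        )

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        x = torch.relu(self.conv3(x, edge_index))
        x = global_mean_pool(x, batch)  # node-aggregation average pooling
        return self.mlp(x).squeeze(-1)

model = HardFunctionGNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # tuned in [0.0001, 0.003]
loss_fn = nn.MSELoss()  # loss function is an assumption; not stated in this report

# For the CNN experiments, the quoted setup halves the learning rate every 200 epochs:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
```

Training with the quoted batch size of 32 would iterate over a torch_geometric.loader.DataLoader of Data objects built from the sampled adjacency matrices.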