Anchor Data Augmentation

Authors: Nora Schneider, Shirin Goshtasbpour, Fernando Perez-Cruz

NeurIPS 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Section 4 reports empirical evidence that our approach can improve predictions, especially in over-parameterized settings."
Researcher Affiliation: Academia. "Computer Science Department, ETH Zurich, Zurich, Switzerland; Swiss Data Science Center, Zurich, Switzerland. nschneide@student.ethz.ch, shirin.goshtasbpour@inf.ethz.ch, fernando.perezcruz@sdsc.ethz.ch"
Pseudocode: Yes. Section 3.3 gives Algorithm 1 (ADA: minibatch generation):

1: Input: $L$ training data points $(X, Y)$; prior distribution for $\gamma$: $p(\gamma)$; $L \times q$ binary matrix $A$ with a one per row indicating the clustering assignment for each sample
2: Output: $(\tilde{X}_{\gamma,A}, \tilde{Y}_{\gamma,A})$
3: Sample $\gamma$ from $p(\gamma)$
4: Projection matrix: $\Pi_A \leftarrow A (A^\top A)^{-1} A^\top$
5: for each row $i$ of $X$ do
6:   $\tilde{X}^{(i)}_{\gamma,A} \leftarrow \frac{X^{(i)} + (\sqrt{\gamma}-1)\,(\Pi_A)^{(i)} X}{1 + (\sqrt{\gamma}-1)\sum_j (\Pi_A)^{(ij)}}$
7:   $\tilde{Y}^{(i)}_{\gamma,A} \leftarrow \frac{Y^{(i)} + (\sqrt{\gamma}-1)\,(\Pi_A)^{(i)} Y}{1 + (\sqrt{\gamma}-1)\sum_j (\Pi_A)^{(ij)}}$
8: end for
9: return $(\tilde{X}_{\gamma,A}, \tilde{Y}_{\gamma,A})$
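As a concrete illustration, here is a minimal NumPy sketch of Algorithm 1. The function name, the uniform sampling of $\gamma$ over a finite grid, and the integer cluster labels standing in for the one-hot matrix $A$ are our assumptions, not the authors' released implementation (see the repository linked below for that).

```python
# Minimal NumPy sketch of Algorithm 1; assumptions noted in the lead-in above.
import numpy as np

def ada_minibatch(X, Y, clusters, gamma_grid, rng=None):
    """Return one anchor-augmented copy (X_tilde, Y_tilde) of a minibatch.

    X: (L, d) features, Y: (L,) targets, clusters: (L,) integer cluster ids.
    """
    rng = np.random.default_rng(rng)
    clusters = np.asarray(clusters)
    L, q = X.shape[0], clusters.max() + 1

    # One-hot cluster-assignment matrix A (L x q, a single 1 per row).
    A = np.zeros((L, q))
    A[np.arange(L), clusters] = 1.0

    # Projection onto the column space of A: Pi_A = A (A^T A)^{-1} A^T.
    # For a one-hot A this amounts to averaging within each cluster.
    Pi = A @ np.linalg.pinv(A.T @ A) @ A.T

    gamma = rng.choice(np.asarray(gamma_grid))  # sample gamma from its prior
    c = np.sqrt(gamma) - 1.0

    # Row-wise normalizer 1 + (sqrt(gamma) - 1) * sum_j Pi[i, j] (lines 6-7).
    denom = 1.0 + c * Pi.sum(axis=1)
    X_tilde = (X + c * (Pi @ X)) / denom[:, None]
    Y_tilde = (Y + c * (Pi @ Y)) / denom
    return X_tilde, Y_tilde
```

Since each row of $\Pi_A$ sums to 1 here, the update scales deviations from the cluster mean by $1/\sqrt{\gamma}$: $\gamma > 1$ contracts samples toward their cluster means, $\gamma < 1$ expands them, and $\gamma = 1$ leaves the batch unchanged.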
Open Source Code: Yes. "Our Python implementation of ADA is available at: https://github.com/noraschneider/anchordataaugmentation/"
Open Datasets: Yes. "Data: We use the California housing dataset [19] and the Boston housing dataset [14]." "Data: We use four of the five in-distribution datasets used in [49]. The validation and test data are expected to follow the same distribution as the training data. Airfoil Self-Noise (Airfoil) and NO2 [24] are both tabular datasets, whereas Exchange-Rate and Electricity [27] are time series datasets."
Dataset Splits: Yes. "The validation set has 100,000 samples." "We divide the datasets into train, validation, and test data randomly, as the authors of C-Mixup did."
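As an illustration of such a split, the following sketch uses scikit-learn; the 60/20/20 fractions and the synthetic stand-in data are assumptions, since the paper defers the exact fractions to the C-Mixup setup.

```python
# Hypothetical random train/validation/test split in the spirit of the
# C-Mixup protocol; the 60/20/20 ratio is an assumed placeholder, not a
# fraction stated in the paper.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 8)), rng.normal(size=1000)  # stand-in data

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)
```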
Hardware Specification: No. The paper does not report the hardware used for its experiments, such as specific GPU/CPU models, processor types, or machine specifications.
Software Dependencies: No. The paper mentions "Our Python implementation of ADA" but does not specify version numbers for Python or for any libraries (e.g., PyTorch, TensorFlow, scikit-learn) required to replicate the experiments.
Experiment Setup: Yes. "For the ADA and Local-Mixup experiments, we use hyperparameter tuning and grid search to find the optimal training parameters (batch size, learning rate, and number of epochs), Local-Mixup parameters (distance threshold $\epsilon$), and ADA parameters (number of clusters, range of $\gamma$, and whether to use manifold augmentation). To be precise, we define $k \in \{2, 4, 6, 8, 10\}$ and specify $\beta_i = 1 + \frac{1}{k/2}\,i$ (with $i \in \{1, \ldots, k/2\}$) and $\gamma \in \{\frac{1}{\beta_{k/2}}, \frac{1}{\beta_{k/2-1}}, \ldots, \frac{1}{\beta_1}, 1, \beta_1, \ldots, \beta_{k/2-1}, \beta_{k/2}\}$. $A$ is constructed using k-means clustering with $q = 8$."
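For concreteness, the sketch below builds the $\gamma$ grid and the cluster labels used to form $A$; the helper names and the use of scikit-learn's KMeans are our assumptions rather than code from the authors' repository.

```python
# Illustrative construction of the ADA hyperparameters described above.
import numpy as np
from sklearn.cluster import KMeans

def make_gamma_grid(k):
    # beta_i = 1 + i / (k/2) for i = 1, ..., k/2, then a grid symmetric
    # around 1: {1/beta_{k/2}, ..., 1/beta_1, 1, beta_1, ..., beta_{k/2}}.
    betas = 1.0 + np.arange(1, k // 2 + 1) / (k / 2)
    return np.concatenate([1.0 / betas[::-1], [1.0], betas])

def cluster_labels(X, q=8, seed=0):
    # k-means with q = 8 clusters, as in the paper; returns integer labels
    # that index the rows of the one-hot assignment matrix A.
    return KMeans(n_clusters=q, random_state=seed, n_init=10).fit_predict(X)

# Example: k = 4 gives betas (1.5, 2.0) and the symmetric gamma grid
# [0.5, 0.667, 1.0, 1.5, 2.0].
print(make_gamma_grid(4))
```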