Putting An End to End-to-End: Gradient-Isolated Learning of Representations
Authors: Sindy Löwe, Peter O'Connor, Bastiaan Veeling
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test the applicability of the GIM approach to the visual and audio domains. As shown in Table 1, Greedy InfoMax (GIM) outperforms its end-to-end trained CPC counterpart, despite its unsupervised features being optimized greedily without any backpropagation between modules. (A minimal sketch of this gradient isolation follows the table.) |
| Researcher Affiliation | Academia | AMLab, University of Amsterdam |
| Pseudocode | No | The paper describes the approach using text and diagrams (Figure 1) but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/loeweX/Greedy_InfoMax. |
| Open Datasets | Yes | We focus on the STL-10 dataset [Coates et al., 2011] which provides an additional unlabeled training dataset. We follow the setup of Oord et al. [2018] unless specified otherwise and use a 100-hour subset of the publicly available LibriSpeech dataset [Panayotov et al., 2015]. |
| Dataset Splits | Yes | The training curves of the two models as shown in Figure 3 provide some insight into this decreased performance. The learning curves of the first module (Figure 3a) reflect that there is no difference in its training in the two models. Modules two and three (Figures 3b and 3c), however, reveal a crucial difference. The iteratively trained modules show a larger divergence between the training and validation loss, indicating stronger overfitting. |
| Hardware Specification | No | The paper mentions 'GPU memory consumption' and 'GPU memory' in Table 2, but does not specify any concrete GPU models, CPU types, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions general software like 'Kaldi toolkit' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For data augmentation, we take random 64×64 crops from the 96×96 images, flip horizontally with probability 0.5 and convert to grayscale. We divide each image of 64×64 pixels into a total of 7×7 local patches, each of size 16×16 with 8 pixels overlap. The patches are encoded by a ResNet-50 v2 model [He et al., 2016] without batch normalization [Ioffe and Szegedy, 2015]. We split the model into three gradient-isolated modules that we train in sync and with a constant learning rate. Remaining implementation details are presented in Appendix A.1. (See the augmentation-and-patching sketch below the table.) |
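
The gradient isolation quoted in the Research Type row is easy to misread, so here is a minimal PyTorch-style sketch of greedy, module-local training. The module architectures, the `local_loss` placeholder, and the learning rate are illustrative assumptions, not the paper's actual setup: the paper's modules are ResNet-50 v2 blocks trained with an InfoNCE-style objective.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's encoder modules (the paper uses
# ResNet-50 v2 blocks; tiny conv stacks keep this sketch self-contained).
modules = nn.ModuleList([
    nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()),
])
# One optimizer per module: the modules are trained in sync but independently,
# with a constant learning rate (the value here is an assumption).
optimizers = [torch.optim.Adam(m.parameters(), lr=1.5e-4) for m in modules]

def local_loss(z):
    # Placeholder for the per-module InfoNCE-style loss used in the paper;
    # a dummy scalar keeps the sketch runnable end to end.
    return z.pow(2).mean()

x = torch.randn(8, 1, 64, 64)  # a batch of grayscale 64x64 crops
h = x
for module, opt in zip(modules, optimizers):
    z = module(h)
    loss = local_loss(z)  # greedy, module-local objective
    opt.zero_grad()
    loss.backward()       # gradients stay inside this module
    opt.step()
    h = z.detach()        # gradient isolation: no backprop to earlier modules
```

The `detach()` on each module's output is the whole trick: each module still consumes the representation produced by its predecessor, but no gradient ever flows backward across the module boundary.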
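
The Experiment Setup row fully determines the patch geometry: a 64×64 crop with 16×16 patches and 8 pixels of overlap implies a stride of 8 and hence a 7×7 grid, since (64 − 16)/8 + 1 = 7. The sketch below is one way to realize that pipeline with standard torchvision transforms and `Tensor.unfold`; it illustrates the stated setup and is not the authors' code.

```python
import torch
from torchvision import transforms
from PIL import Image

# Augmentation from the setup row: random 64x64 crop of a 96x96 image,
# horizontal flip with probability 0.5, conversion to grayscale.
augment = transforms.Compose([
    transforms.RandomCrop(64),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Grayscale(),
    transforms.ToTensor(),
])

def to_patches(img: torch.Tensor) -> torch.Tensor:
    # img: (1, 64, 64). Patch size 16 with 8-pixel overlap -> stride 8,
    # giving (64 - 16) / 8 + 1 = 7 patches per side, i.e. a 7x7 grid.
    patches = img.unfold(1, 16, 8).unfold(2, 16, 8)   # (1, 7, 7, 16, 16)
    return patches.permute(1, 2, 0, 3, 4).reshape(49, 1, 16, 16)

img = augment(Image.new("RGB", (96, 96)))  # blank stand-in for an STL-10 image
patches = to_patches(img)                  # (49, 1, 16, 16)
```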