Synbols: Probing Learning Algorithms with Synthetic Datasets

Authors: Alexandre Lacoste, Pau Rodríguez López, Frederic Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Hadj Laradji, Alexandre Drouin, Matthew Craddock, Laurent Charlin, David Vázquez

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments probing the behavior of popular learning algorithms in various machine-learning settings including: the robustness of supervised learning and unsupervised representation-learning approaches w.r.t. changes in latent-data attributes (§3.1 and §3.4) and to particular out-of-distribution patterns (§3.2), the efficacy of different strategies and uncertainty calibration in active learning (§3.3), and the effect of training losses for object counting (§3.5).
Researcher Affiliation | Collaboration | ¹Element AI {allac, pau.rodriguez, frederic.branchaud-charron, parmida, massimo.caccia, issam.laradji, adrouin, matt.craddock, dvazquez}@elementai.com ²Mila, Université de Montréal {massimo.p.caccia, lcharlin}@gmail.com
Pseudocode | No | The paper includes Python code snippets for defining dataset attributes, but no formally labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We introduce Synbols², an easy to use dataset generator with a rich composition of latent features for lower-resolution images. ²https://github.com/ElementAI/synbols
Open Datasets | Yes | We introduce Synbols², an easy to use dataset generator with a rich composition of latent features for lower-resolution images. ²https://github.com/ElementAI/synbols
Dataset Splits | Yes | All results are obtained using a (train, valid, test) partition of size ratio (60%, 20%, 20%).
Hardware Specification | Yes | The total training time on datasets of size 100k is about 3 minutes for most models (including ResNet-12) on a Tesla V100 GPU.
Software Dependencies | No | The paper mentions 'Pycairo, a 2D vector graphics library' and 'Adam [22] is used to train all models,' but no specific version numbers for software dependencies are provided.
Experiment Setup | Yes | All results are obtained using a (train, valid, test) partition of size ratio (60%, 20%, 20%). Adam [22] is used to train all models, and the learning rate is selected using a validation set. Resnet12+ and WRN+ were trained with data augmentation consisting of random rotations, translation, shear, scaling, and color jitter.
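
The split and learning-rate selection described in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names `split_indices` and `select_learning_rate`, the seeded random shuffle, and the `validation_score` callback are all assumptions introduced here. The summary states only the (60%, 20%, 20%) ratio and that the learning rate is chosen on the validation set.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_indices(
    n: int,
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Shuffle range(n) and cut it into train/valid/test index lists.

    Assumption: a seeded random shuffle; the summary does not say how
    examples are assigned to each partition.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(round(ratios[0] * n))
    n_valid = int(round(ratios[1] * n))
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        # The test partition takes whatever remains, so no index is dropped.
        "test": indices[n_train + n_valid:],
    }


def select_learning_rate(
    candidates: Sequence[float],
    validation_score: Callable[[float], float],
) -> float:
    """Return the candidate learning rate with the highest validation score.

    `validation_score` is a hypothetical callback that trains a model with
    the given learning rate and returns its score on the validation split.
    """
    return max(candidates, key=validation_score)


if __name__ == "__main__":
    parts = split_indices(100_000)
    print({name: len(idx) for name, idx in parts.items()})
```

In practice `validation_score` would wrap a full training run with Adam. The candidate grid passed to `select_learning_rate` is left to the caller, since the summary does not list the values that were searched.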