Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Synbols: Probing Learning Algorithms with Synthetic Datasets
Authors: Alexandre Lacoste, Pau Rodríguez López, Frederic Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Hadj Laradji, Alexandre Drouin, Matthew Craddock, Laurent Charlin, David Vázquez
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments probing the behavior of popular learning algorithms in various machine-learning settings including: the robustness of supervised learning and unsupervised representation-learning approaches w.r.t. changes in latent-data attributes (§3.1 and §3.4) and to particular out-of-distribution patterns (§3.2), the efficacy of different strategies and uncertainty calibration in active learning (§3.3), and the effect of training losses for object counting (§3.5). |
| Researcher Affiliation | Collaboration | (1) Element AI; (2) Mila, Université de Montréal |
| Pseudocode | No | The paper includes Python code snippets for defining dataset attributes, but no formally labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We introduce Synbols (footnote 2), an easy to use dataset generator with a rich composition of latent features for lower-resolution images. Footnote 2: https://github.com/ElementAI/synbols |
| Open Datasets | Yes | We introduce Synbols (footnote 2), an easy to use dataset generator with a rich composition of latent features for lower-resolution images. Footnote 2: https://github.com/ElementAI/synbols |
| Dataset Splits | Yes | All results are obtained using a (train, valid, test) partition of size ratio (60%, 20%, 20%). |
| Hardware Specification | Yes | The total training time on datasets of size 100k is about 3 minutes for most models (including ResNet-12) on a Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'Pycairo, a 2D vector graphics library' and 'Adam [22] is used to train all models,' but no specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | All results are obtained using a (train, valid, test) partition of size ratio (60%, 20%, 20%). Adam [22] is used to train all models, and the learning rate is selected using a validation set. Resnet12+ and WRN+ were trained with data augmentation consisting of random rotations, translation, shear, scaling, and color jitter. |
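
As an illustration of the Dataset Splits entry above, the following is a minimal sketch of a (60%, 20%, 20%) train/valid/test partition over a dataset of 100k samples. Only the ratio and the dataset size come from the quoted text; the function name `split_indices`, the random shuffling, and the seed are assumptions made for illustration, and this is not the authors' code.

```python
import random


def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle range(n) and cut it into train/valid/test index lists.

    Hypothetical helper: the ratios match the quoted (60%, 20%, 20%)
    partition, but the shuffling scheme and seed are assumptions.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    indices = list(range(n))
    # A local Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)

    n_train = int(round(ratios[0] * n))
    n_valid = int(round(ratios[1] * n))

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    # The test set takes whatever remains, so no sample is dropped.
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_indices(100_000)
print(len(train), len(valid), len(test))
```

The three index lists are disjoint and together cover every sample. Under the setup quoted above, the validation portion would be the one used to select the learning rate.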