Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Understanding Deep Architecture with Reasoning Layer
Authors: Xinshi Chen, Yufei Zhang, Christoph Reisinger, Le Song
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments aim to validate our theoretical prediction with computational simulations, rather than obtaining state-of-the-art results. We conduct two sets of experiments, where the first set of experiments strictly follows the problem setting described in Sec 2 and the second is conducted on BSD500 dataset [56] to demonstrate the possibility of generalizing the theorem to more realistic applications. Implementations in Python are released. |
| Researcher Affiliation | Academia | Xinshi Chen (Georgia Institute of Technology); Yufei Zhang (University of Oxford); Christoph Reisinger (University of Oxford); Le Song (Georgia Institute of Technology) |
| Pseudocode | No | The paper presents algorithm update steps (e.g., for GD and NAG) but does not provide a formally structured pseudocode block or algorithm box. |
| Open Source Code | Yes | Implementations in Python are released (footnote 1: https://github.com/xinshi-chen/Deep-Architecture-With-Reasoning-Layer). |
| Open Datasets | Yes | We split BSD500 (400 images) into a training set (100 images) and a test set (300 images). |
| Dataset Splits | No | The paper states the training and test set sizes for the BSD500 dataset, but it does not explicitly mention a separate validation set or its split. For synthetic experiments, it states "During training, n samples are randomly drawn from these 10000 data points as the training set" without specifying a validation split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states "Implementations in Python are released" but does not specify the version of Python or any other software libraries used, nor their versions. |
| Experiment Setup | Yes | Each model is trained by ADAM and SGD with learning rate grid-searched from [1e-2,5e-3,1e-3,5e-4,1e-4], and only the best result is reported. |
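
The learning-rate grid search quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' released code: `train_and_evaluate` is a hypothetical stand-in that runs plain gradient descent on a toy one-parameter regression problem (the paper trains with ADAM and SGD), and selecting the best run by final training loss is an assumption, since the quoted text only says that the best result is reported.

```python
import random

# Learning-rate grid quoted in the Experiment Setup row.
LEARNING_RATES = [1e-2, 5e-3, 1e-3, 5e-4, 1e-4]


def train_and_evaluate(lr, steps=200, seed=0):
    """Hypothetical stand-in for one training run (not the paper's model).

    Fits a scalar w to minimise the mean of (w * x - 3 * x)^2 with plain
    gradient descent and returns the final training loss.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - 3.0 * x) * x for x in xs) / len(xs)
        w -= lr * grad
    return sum((w * x - 3.0 * x) ** 2 for x in xs) / len(xs)


def grid_search(learning_rates):
    """Run one training per learning rate and keep the lowest final loss."""
    results = {lr: train_and_evaluate(lr) for lr in learning_rates}
    best_lr = min(results, key=results.get)
    return best_lr, results


best_lr, results = grid_search(LEARNING_RATES)
print(f"best learning rate: {best_lr}, final loss: {results[best_lr]:.6f}")
```

In a real experiment the selection metric would more plausibly come from held-out data; the Dataset Splits row notes that the paper does not describe a validation split, so the sketch leaves that choice open.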