Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
VALUED - Vision and Logical Understanding Evaluation Dataset
Authors: Soumadeep Saha, Saptarshi Saha, Utpal Garain
DMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze several popular and state-of-the-art vision models on this task, and show that, although their performance on standard metrics is laudable, they produce a plethora of incoherent results, indicating that this dataset presents a significant challenge for future works. |
| Researcher Affiliation | Academia | Soumadeep Saha (Indian Statistical Institute, Kolkata, India); Saptarshi Saha (Indian Statistical Institute, Kolkata, India); Utpal Garain (Indian Statistical Institute, Kolkata, India) |
| Pseudocode | No | The paper defines a classifier function Fθ and lists a rule set (Equation 2) but does not present these or any other procedures in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | All associated code (database creation, rule checking, etc.), materials (dataset, 3D models, textures, images, etc.), rendering details (camera sensor, rendering settings, etc.) and relevant information has been made available through the GitHub repository. (Code repository: https://github.com/espressoVi/VALUE-Dataset) |
| Open Datasets | Yes | We present the VALUE Dataset, a collection of 200,000+ annotated images of chess games in progress... (Dataset: https://doi.org/10.5281/zenodo.10607059) |
| Dataset Splits | Yes | 200,000 such images were rendered at (512 × 512 × 3) to form the training/validation set and an additional 19,967 images form the test set. |
| Hardware Specification | Yes | We trained the models on a single NVIDIA A6000 48GB GPU. |
| Software Dependencies | No | The paper mentions 'pytorch' as the implementation framework but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | These pre-trained models were fine-tuned for 2 epochs after replacing the final fully connected layer with one of size (in features × 8 × 8 × class number) and adding dropout (10%) in between. The models were implemented in PyTorch, and trained with the AdamW optimizer (learning rate of 10⁻⁴) and cross-entropy loss. |
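For concreteness, the fine-tuning recipe quoted in the Experiment Setup row can be sketched in PyTorch. This is an illustrative sketch, not the authors' code: the backbone here is a toy stand-in for the pre-trained vision models they evaluate, and `NUM_CLASSES` is an assumed per-square class count.

```python
import torch
import torch.nn as nn

# Assumption for illustration: one label per board square, with
# NUM_CLASSES possible states per square (e.g. piece types + empty).
NUM_CLASSES = 13
SQUARES = 8 * 8

def adapt_head(in_features: int, num_classes: int = NUM_CLASSES) -> nn.Module:
    """Replacement classification head, as described in the paper:
    final fully connected layer of size (in_features -> 8 * 8 * classes)
    with 10% dropout added in between."""
    return nn.Sequential(
        nn.Dropout(p=0.10),
        nn.Linear(in_features, SQUARES * num_classes),
    )

# Toy stand-in for a pre-trained backbone (the paper fine-tunes
# standard vision models; this tiny conv net is illustrative only).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = nn.Sequential(backbone, adapt_head(in_features=16))

# Optimizer and loss as quoted: AdamW with lr 1e-4, cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data shaped like the
# rendered images (batch of 2 images at 512 x 512 x 3).
images = torch.randn(2, 3, 512, 512)
labels = torch.randint(0, NUM_CLASSES, (2, SQUARES))
logits = model(images).view(2, SQUARES, NUM_CLASSES)
loss = criterion(logits.reshape(-1, NUM_CLASSES), labels.reshape(-1))
loss.backward()
optimizer.step()
```

The reshape to `(batch, squares, classes)` reflects one natural reading of the "8 × 8 × class number" output size: a per-square classification over the chessboard; the actual decoding used by the authors is in their repository.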