Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
VALUED - Vision and Logical Understanding Evaluation Dataset
Authors: Soumadeep Saha, Saptarshi Saha, Utpal Garain
DMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze several popular and state-of-the-art vision models on this task, and show that, although their performance on standard metrics is laudable, they produce a plethora of incoherent results, indicating that this dataset presents a significant challenge for future works. |
| Researcher Affiliation | Academia | Soumadeep Saha (Indian Statistical Institute, Kolkata, India); Saptarshi Saha (Indian Statistical Institute, Kolkata, India); Utpal Garain (Indian Statistical Institute, Kolkata, India) |
| Pseudocode | No | The paper defines a classifier function Fθ and lists a rule set (Equation 2) but does not present these or any other procedures in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | All associated code (database creation, rule checking, etc.), materials (dataset, 3D models, textures, images, etc.), rendering details (camera sensor, rendering settings, etc.) and relevant information has been made available through the GitHub repository. (Code repository: https://github.com/espressoVi/VALUE-Dataset) |
| Open Datasets | Yes | We present the VALUE Dataset, a collection of 200,000+ annotated images of chess games in progress... (Dataset: https://doi.org/10.5281/zenodo.10607059) |
| Dataset Splits | Yes | 200,000 such images were rendered at (512 × 512 × 3) to form the training/validation set and an additional 19,967 images form the test set. |
| Hardware Specification | Yes | We trained the models on a single NVIDIA A6000 48GB GPU. |
| Software Dependencies | No | The paper mentions 'pytorch' as the implementation framework but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | These pre-trained models were fine-tuned for 2 epochs after replacing the final fully connected layer with one of size (in features × 8 × 8 × class number) and adding dropout (10%) in between. The models were implemented in PyTorch, and trained with the AdamW optimizer (learning rate of 10⁻⁴) and cross-entropy loss. |
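For concreteness, the fine-tuning recipe quoted in the Experiment Setup row can be sketched in PyTorch. This is an illustrative sketch, not the authors' code: the backbone here is a toy stand-in for the pre-trained vision models they evaluate, and `NUM_CLASSES` is an assumed per-square class count.

```python
import torch
import torch.nn as nn

# Assumption for illustration: one label per board square, with
# NUM_CLASSES possible states per square (e.g. piece types + empty).
NUM_CLASSES = 13
SQUARES = 8 * 8

def adapt_head(in_features: int, num_classes: int = NUM_CLASSES) -> nn.Module:
    """Replacement classification head, as described in the paper:
    final fully connected layer of size (in_features -> 8 * 8 * classes)
    with 10% dropout added in between."""
    return nn.Sequential(
        nn.Dropout(p=0.10),
        nn.Linear(in_features, SQUARES * num_classes),
    )

# Toy stand-in for a pre-trained backbone (the paper fine-tunes
# standard vision models; this tiny conv net is illustrative only).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = nn.Sequential(backbone, adapt_head(in_features=16))

# Optimizer and loss as quoted: AdamW with lr 1e-4, cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data shaped like the
# rendered images (batch of 2 images at 512 x 512 x 3).
images = torch.randn(2, 3, 512, 512)
labels = torch.randint(0, NUM_CLASSES, (2, SQUARES))
logits = model(images).view(2, SQUARES, NUM_CLASSES)
loss = criterion(logits.reshape(-1, NUM_CLASSES), labels.reshape(-1))
loss.backward()
optimizer.step()
```

The reshape to `(batch, squares, classes)` reflects one natural reading of the "8 × 8 × class number" output size: a per-square classification over the chessboard; the actual decoding used by the authors is in their repository.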