Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval
Authors: Maurits Bleeker, Andrew Yates, Maarten de Rijke
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores than a contrastive ICR baseline. |
| Researcher Affiliation | Academia | Maurits Bleeker EMAIL University of Amsterdam Andrew Yates EMAIL University of Amsterdam Maarten de Rijke EMAIL University of Amsterdam |
| Pseudocode | No | The paper describes the Latent Target Decoding (LTD) method with equations and descriptive text, but it does not include a distinct block labeled "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | Yes | To facilitate reproducibility and further research of our work, we include the code with our paper [2]. Footnotes: [1] https://huggingface.co/sentence-transformers/all-mpnet-base-v2 [2] https://github.com/MauritsBleeker/reducing-predictive-feature-suppression/ |
| Open Datasets | Yes | For training and evaluating our ICR method, we use the two common ICR benchmark datasets: Flickr30k (F30k) (Young et al., 2014) and MS-COCO captions (COCO) (Lin et al., 2014). [...] We also use the crisscrossed captions (CxC) dataset, which extends the COCO validation and test set with additional annotations of similar captions and images (Parekh et al., 2020)... |
| Dataset Splits | Yes | The F30k dataset contains 31,000 image-caption tuples. We use the train, validate and test split from (Karpathy & Fei-Fei, 2015), with 29,000 images for training, 1,000 for validation, and 1,000 for testing. COCO consists of 123,287 image-caption tuples. We use the train, validate and test split from (Karpathy & Fei-Fei, 2015); we do not use the 1k test setup. |
| Hardware Specification | No | The paper mentions that experiments are run "on a single GPU", but does not specify the exact model of the GPU (e.g., NVIDIA A100, RTX 2080 Ti), CPU, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions specific model implementations like "Hugging Face all-MiniLM-L6-v2 Sentence-BERT implementation" and "BERT (Devlin et al., 2018)", but does not provide specific programming language or machine learning framework versions (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiment. |
| Experiment Setup | Yes | Similar to (Chun et al., 2021), we use 30 warm-up and 30 fine-tune epochs, a batch size of 128, and a cosine annealing learning rate schedule with an initial learning rate of 2e-4. The Lagrange multiplier is initialized with a value of 1, bounded between 0 and 100, and is optimized by stochastic gradient ascent with a fixed learning rate of 5e-3 and a momentum (to prevent λ from fluctuating too much) and dampening value of α = 0.9. When we use L_dual, we set β to 1. For the InfoNCE loss, we use a temperature value τ of 0.05. [...] For the reconstruction constraint bound η, we try for all experiments several values, η ∈ {0.05, 0.1, 0.15, 0.2, 0.25, 0.3}. When we apply ITD we use η ∈ {0.5, 1, 2, 3, 4, 5, 6}. |
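
The optimization settings quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. It is not taken from the authors' repository. The names `cosine_annealing_lr` and `LagrangeMultiplier` are hypothetical, the schedule is assumed to decay to zero over a single 30-epoch phase, the constraint signal is assumed to be `reconstruction_loss - eta`, and the momentum and dampening update is assumed to follow the common SGD convention in which the first step uses the raw gradient. Only the numeric values (2e-4, 5e-3, 0.9, the initial value 1, and the bounds 0 and 100) come from the quoted text.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, base_lr=2e-4, min_lr=0.0):
    """Cosine annealing from base_lr (epoch 0) down to min_lr (epoch == total_epochs).

    Assumption: min_lr = 0 and one schedule per training phase.
    """
    cos_term = (1.0 + math.cos(math.pi * epoch / total_epochs)) / 2.0
    return min_lr + (base_lr - min_lr) * cos_term


class LagrangeMultiplier:
    """Multiplier updated by gradient ascent with momentum, clamped to [lower, upper]."""

    def __init__(self, init=1.0, lr=5e-3, momentum=0.9, dampening=0.9,
                 lower=0.0, upper=100.0):
        self.value = init
        self.lr = lr
        self.momentum = momentum
        self.dampening = dampening
        self.lower = lower
        self.upper = upper
        self._buf = None  # momentum buffer, created on the first step

    def step(self, constraint_violation):
        """Ascend on the multiplier.

        Assumption: constraint_violation = reconstruction_loss - eta, so the
        multiplier grows while the constraint is violated and shrinks otherwise.
        """
        if self._buf is None:
            self._buf = constraint_violation
        else:
            self._buf = (self.momentum * self._buf
                         + (1.0 - self.dampening) * constraint_violation)
        updated = self.value + self.lr * self._buf  # ascent: add the gradient
        self.value = min(max(updated, self.lower), self.upper)
        return self.value


if __name__ == "__main__":
    eta = 0.2  # one of the bounds listed in the table
    lam = LagrangeMultiplier()
    for epoch, recon_loss in enumerate([0.5, 0.4, 0.3, 0.2, 0.1]):  # made-up losses
        lr = cosine_annealing_lr(epoch, total_epochs=30)
        print(epoch, lr, lam.step(recon_loss - eta))
```

The clamp keeps the multiplier inside the stated range even when the constraint signal is large, and the momentum buffer smooths step-to-step changes, which matches the stated motivation of preventing λ from fluctuating too much.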