Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval

Authors: Shuai Lyu, Zijing Tian, Zhonghong Ou, Yifan Zhu, Xiao Zhang, Qiankun Ha, Haoran Luo, Meina Song

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three widely used datasets demonstrate that, even at increasing noise ratios, TSVC exhibits significant advantages in retrieval accuracy and maintains stable training performance. The paper includes sections such as "Experiments", "Datasets", "Evaluation Metrics", "Main Results", and "Ablation Study", along with numerous tables and figures presenting empirical performance metrics.
Researcher Affiliation | Academia | 1) School of Computer Science, Beijing University of Posts and Telecommunications, China; 2) School of Science, Beijing University of Posts and Telecommunications, China; 3) State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, China
Pseudocode | No | The paper describes the "Tripartite Cooperative Learning Mechanism" in a step-by-step manner within the text, but it does not present it as a formal pseudocode block or an algorithm section with a clear "Algorithm" label.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | Yes | We utilize the following three widely used multi-modal datasets to evaluate our method: Flickr30K: This dataset contains 31,000 images collected from Flickr... MSCOCO: This dataset consists of 123,287 images... Conceptual Captions: A large-scale dataset mainly collected from the Internet... Similar to previous studies (Huang et al. 2021), we use the CC152K subset from Conceptual Captions in our experiments...
Dataset Splits | Yes | Flickr30K...we use 29,000 images for training, 1,000 for validation, and 1,000 for testing to evaluate the model's performance. MSCOCO...113,287 images are used for training, 5,000 for validation, and 5,000 for testing... Conceptual Captions...contains 150k images for training, and 1,000 images each for validation and testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | The hyperparameters δ and m represent the threshold used for dividing noisy samples and the parameter controlling the soft margin in DASM, respectively... the value of Rsum increases continuously with an increase in δ, reaching its peak at δ = 0.5 (between 0.5 and 0.6 for MSCOCO)... The optimal value for m is 10, which significantly influences the performance of the model by affecting the size of the soft margin.
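To make the roles of δ and m above concrete, here is a minimal sketch of how such hyperparameters typically enter a noisy-correspondence retrieval objective. This is an illustration only, not the paper's DASM: the functions `split_by_threshold` and `soft_margin_triplet`, and the `log1p(exp(...))` soft-margin form, are all assumptions.

```python
import math

def split_by_threshold(similarities, delta=0.5):
    """Divide samples into 'clean' and 'noisy' by comparing each
    image-text similarity score against the threshold delta
    (delta = 0.5 is the peak value reported for most datasets)."""
    clean = [s for s in similarities if s >= delta]
    noisy = [s for s in similarities if s < delta]
    return clean, noisy

def soft_margin_triplet(pos_sim, neg_sim, m=10.0):
    """A generic soft-margin triplet term: log(1 + exp(m * (neg - pos))).
    Larger m sharpens the penalty when a negative pair scores close to a
    positive pair; m = 10 is the optimum the paper reports."""
    return math.log1p(math.exp(m * (neg_sim - pos_sim)))

clean, noisy = split_by_threshold([0.9, 0.3, 0.6, 0.1], delta=0.5)
loss = soft_margin_triplet(pos_sim=0.8, neg_sim=0.4, m=10.0)
```

With a well-separated pair (positive similarity 0.8 vs. negative 0.4) the soft-margin term is small; as the gap shrinks or m grows, the penalty rises smoothly rather than with a hard cutoff.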