Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method significantly enhances the aesthetic behaviors of the vision models under several metrics.
Researcher Affiliation | Collaboration | Southeast University, Tsinghua University, Fudan University, Microsoft
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | It is particularly costly to conduct a comprehensive code review. We plan to release the code in the future. The training data won't be released due to privacy reasons.
Open Datasets | Yes | They are trained on very large image-text pair datasets, e.g. LAION [43] and DataComp [8], rather than the traditional ImageNet [6].
Dataset Splits | No | The paper describes the construction of a training dataset (D_po) and introduces a test set (HPIR), but it does not explicitly provide standard training/validation/test splits for a single dataset.
Hardware Specification | Yes | The computational resources include 256 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions optimizers and model architectures but does not specify software dependencies (e.g., libraries or frameworks) with version numbers.
Experiment Setup | Yes | In the alignment fine-tuning loss, the L_pt component is configured identically to the pretraining phase described in Sec. 2.1 (batch size, temperature, and data), with a weight of w_pt = 1.0. For the remaining components, each batch comprises 128 queries. The overall learning rate is fixed to lr = 5 × 10^-5. The partially ordered set D_po, as discussed in Sec. 2.3, is derived using u = v = 5 and a stride of 10.
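
To make the reported hyperparameters concrete, the Python sketch below collects the values quoted in the Experiment Setup row into a single configuration object and shows how the pretraining loss would enter the total objective with weight w_pt. This is a minimal, hypothetical sketch, not the authors' released code: the names AlignmentConfig and total_loss, and the abstraction of the remaining alignment components into a single l_align term, are assumptions; only the numeric values (w_pt = 1.0, 128 queries per batch, lr = 5 × 10^-5, u = v = 5, stride 10) come from the paper.

    from dataclasses import dataclass

    @dataclass
    class AlignmentConfig:
        # Numeric values below are the ones quoted in the paper's
        # Experiment Setup; the field names themselves are hypothetical.
        w_pt: float = 1.0             # weight of the pretraining loss L_pt
        queries_per_batch: int = 128  # queries per batch for the alignment terms
        lr: float = 5e-5              # overall learning rate (5 x 10^-5)
        u: int = 5                    # parameters used to derive the partially
        v: int = 5                    # ordered set D_po (Sec. 2.3)
        stride: int = 10              # stride used when building D_po

    def total_loss(l_pt: float, l_align: float, cfg: AlignmentConfig) -> float:
        # L_pt enters the fine-tuning objective with weight w_pt = 1.0;
        # the remaining alignment components are abstracted into l_align.
        return cfg.w_pt * l_pt + l_align

    cfg = AlignmentConfig()
    print(total_loss(l_pt=0.7, l_align=0.3, cfg=cfg))  # 0.7 * 1.0 + 0.3 = 1.0

Packing the quoted values into one object like this mirrors how such setups are typically reproduced: every number the paper reports has a single named home, which makes it easy to audit a reimplementation against the text.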