Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

Authors: Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of our method, which ensures that the post-edit MLLM simultaneously maintains excellent reliability, generality, and locality.
Researcher Affiliation | Collaboration | Zhejiang University, National University of Singapore, Hefei University of Technology, Nanyang Technological University, Singapore Management University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | The code for UniKE is available at https://github.com/beepkh/UniKE.
Open Datasets | Yes | Our experiments are conducted on the MMEdit benchmark [4], which contains two subtasks: Editing VQA (E-VQA) and Editing Image Caption (E-IC). ... And the MMEdit benchmark is under MIT license.
Dataset Splits | No | The paper states that it adheres to the testing settings of the MMEdit dataset and uses the consistent random seed defined in MMEdit during testing, but it does not explicitly provide the training/validation/test dataset splits (percentages or sample counts).
Hardware Specification | Yes | completing a single one-step edit takes only a matter of seconds and we run all experiments with 6 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions using BLIP2-OPT and MiniGPT-4 as backbone models but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | In intrinsic knowledge editing, we add extra 10 key-value pairs in the FFN of the last four transformer layers; for external knowledge resorting, we retrieve top-40 hidden states of in-context knowledge with the highest similarity for each case and conduct feature shifting for in-context editing in the last four transformer layers. ... During contrastive learning, both encoders are optimized using the Adam optimizer with a learning rate of 1e-4.
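
The reported setup lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of the external-knowledge step: the names (EDIT_CONFIG, retrieve_in_context_states, feature_shift) and the shift strength alpha are our assumptions, not UniKE's actual API. It only mirrors the hyperparameters quoted above (10 extra key-value pairs, the last four layers, top-40 retrieval, Adam with learning rate 1e-4).

```python
import torch
import torch.nn.functional as F

# Hypothetical hyperparameters mirroring the reported setup (names are ours, not UniKE's).
EDIT_CONFIG = {
    "extra_kv_pairs": 10,               # key-value pairs added to the FFN of each edited layer
    "edited_layers": [-4, -3, -2, -1],  # last four transformer layers
    "top_k_in_context": 40,             # in-context hidden states retrieved per case
    "lr": 1e-4,                         # Adam learning rate for the contrastive encoders
}

def retrieve_in_context_states(query: torch.Tensor,
                               memory: torch.Tensor,
                               k: int = EDIT_CONFIG["top_k_in_context"]) -> torch.Tensor:
    """Return the top-k stored hidden states most similar to the query.

    query:  (d,)   hidden state of the current case
    memory: (n, d) bank of in-context knowledge hidden states
    """
    sims = F.cosine_similarity(memory, query.unsqueeze(0), dim=-1)  # (n,)
    topk = sims.topk(min(k, memory.size(0))).indices
    return memory[topk]                                             # (k, d)

def feature_shift(hidden: torch.Tensor,
                  retrieved: torch.Tensor,
                  alpha: float = 0.1) -> torch.Tensor:
    """Shift a layer's hidden state toward the mean of the retrieved knowledge states.

    alpha is an assumed interpolation strength; the paper does not report this value.
    """
    return hidden + alpha * (retrieved.mean(dim=0) - hidden)

# Example usage on random tensors (dimensions are illustrative only).
query = torch.randn(768)
memory_bank = torch.randn(1000, 768)
shifted = feature_shift(query, retrieve_in_context_states(query, memory_bank))
```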