Towards Test-Time Refusals via Concept Negation

Authors: Peiran Dong, Song Guo, Junxiao Wang, Bingjie Wang, Jiewei Zhang, Ziming Liu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation on multiple benchmarks shows that PROTORE outperforms state-of-the-art methods under various settings, in terms of the effectiveness of purification and the fidelity of generative images. ... Through comprehensive evaluations on multiple benchmarks, we demonstrate that PROTORE surpasses existing methods in terms of purification effectiveness and the fidelity of generated images across various settings. ... In this section, we empirically evaluate the effectiveness of our proposed PROTORE.
Researcher Affiliation | Academia | ¹Hong Kong Polytechnic University, ²Hong Kong University of Science and Technology, ³King Abdullah University of Science and Technology & SDAIA-KAUST AI. {peiran.dong,bingjie.wang,jiewei.zhang,ziming.liu}@connect.polyu.hk, songguo@cse.ust.hk, junxiao.wang@kaust.edu.sa
Pseudocode | No | Our proposed algorithm is formally presented in the Appendix, which consists primarily of two steps. Although the paper states that an algorithm appears in the appendix, the provided text does not include the appendix, so no pseudocode block could be verified.
Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository for the described methodology.
Open Datasets | Yes | ImageNet subset. We first investigate the performance of single-concept refusal through numerical results. Specifically, we choose one class from ImageNet as the negation target. ... Following the same setting in ESD [22], we select the Imagenette subset that consists of ten readily recognizable classes. ... The Inappropriate Image Prompts (I2P) benchmark dataset [24] contains 4703 toxic prompts assigned to at least one of the following categories: hate, harassment, violence, self-harm, sexual, shocking, illegal activity. ... To this end, we follow prior work [24, 22] on generative text-to-image models and evaluate the COCO FID-30k scores of SD and the three additional methods, as presented in Table 3.
Dataset Splits | No | The paper implicitly relies on standard splits through terms such as "ImageNet classifier" and "COCO 30k dataset," but the main text gives no explicit split percentages, sample counts, or splitting methodology for its experiments.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper refers to pre-trained models such as CLIP and a ResNet-50 ImageNet classifier, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | The refinement step size σ is set to 1.0 in our experiments unless specified otherwise. ... We employed inference guidance of 7.5 in our experiments.
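For context, the "inference guidance of 7.5" quoted above is the standard classifier-free guidance scale used when sampling from text-to-image diffusion models such as Stable Diffusion. A minimal sketch of how that scale enters the sampling step is below; the function and tensors are illustrative, not the authors' code:

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale=7.5):
    """Combine unconditional and conditional noise predictions.

    The guided prediction pushes the sample toward the text condition
    by `scale` times the conditional-unconditional difference.
    scale=7.5 matches the inference guidance reported in the paper's setup.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy example: random stand-ins for the two noise predictions.
rng = np.random.default_rng(0)
eps_u = rng.standard_normal(4)
eps_c = rng.standard_normal(4)
guided = classifier_free_guidance(eps_u, eps_c)
```

With scale = 1.0 the guided prediction reduces to the purely conditional one; values well above 1.0 (such as 7.5) trade sample diversity for stronger prompt adherence.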