Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clickbait Detection via Contrastive Variational Modelling of Text and Label

Authors: Xiaoyuan Yi, Jiarui Zhang, Wenhao Li, Xiting Wang, Xing Xie

IJCAI 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on three clickbait detection datasets show our method's robustness to inadequate and biased labels, outperforming several recent strong baselines.
Researcher Affiliation Collaboration ¹Microsoft Research Asia, ²Tsinghua University
Pseudocode No The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code No The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets Yes We conduct experiments on three clickbait-related datasets. News Clickbait Detection (News): a public Kaggle competition dataset for news headline clickbait detection (https://www.kaggle.com/c/clickbait-news-detection). Tweet Clickbait Detection (Tweet): a multi-modal dataset for the Tweet posts clickbait detection competition (https://webis.de/events/clickbait-challenge). News Headline Incongruence Detection (NELA): an automatically constructed dataset for detecting incongruity between a given news headline and body text [Yoon et al., 2019].
Dataset Splits Yes
  Dataset | Training      | Validation   | Testing
  News    | 17,538 (23%)  | 1,500 (33%)  | 3,063 (33%)
  Tweet   | 17,588 (22%)  | 2,000 (25%)  | 17,554 (21%)
  NELA    | 50,000 (51%)  | 6,690 (51%)  | 6,745 (51%)
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies No The paper mentions using pre-trained models such as UniLM and BERT, but it does not specify software dependencies with version numbers (e.g., specific versions of deep learning frameworks or libraries).
Experiment Setup Yes The label embedding size, latent variable size, number of latent samples K, batch size and learning rate are 64, 256, 16, 24 and 2e-4, respectively. We use cyclic annealing [Fu et al., 2019] to alleviate the KL annealing problem in VAE training.
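The cyclic annealing mentioned in the setup refers to the cyclical KL-weight schedule of Fu et al. [2019], which repeatedly ramps the KL term's weight from 0 to 1 to mitigate KL vanishing in VAE training. A minimal sketch of such a schedule (the cycle length and ramp ratio below are illustrative assumptions, not values reported in the paper):

```python
def cyclic_kl_weight(step, cycle_len=10000, ramp_ratio=0.5):
    """Cyclical annealing schedule in the style of Fu et al. (2019):
    within each cycle, the KL weight ramps linearly from 0 to 1 over
    the first `ramp_ratio` fraction of the cycle, then holds at 1.
    `cycle_len` and `ramp_ratio` are hypothetical defaults."""
    pos = (step % cycle_len) / cycle_len  # position within current cycle, in [0, 1)
    return min(pos / ramp_ratio, 1.0)

# The weight scales the KL term of the VAE objective at each training step:
#   loss = reconstruction_loss + cyclic_kl_weight(step) * kl_divergence
```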