Fidelity-Weighted Learning
Authors: Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations. We introduce the proposed FWL approach in more detail in Section 2. We then present our experimental setup in Section 3 where we evaluate FWL on a toy task and two real-world tasks, namely document ranking and sentence sentiment classification. In all cases, FWL outperforms competitive baselines and yields state-of-the-art results, indicating that FWL makes better use of the limited true labeled data and is thereby able to learn a better and more meaningful task-specific representation of the data. In this section, we apply FWL first to a toy problem and then to two different real tasks: document ranking and sentiment classification. |
| Researcher Affiliation | Collaboration | Mostafa Dehghani (University of Amsterdam, dehghani@uva.nl), Arash Mehrjou (MPI for Intelligent Systems, amehrjou@tuebingen.mpg.de), Stephan Gouws (Google Brain, sgouws@google.com), Jaap Kamps (University of Amsterdam, kamps@uva.nl), Bernhard Schölkopf (MPI for Intelligent Systems, bs@tuebingen.mpg.de) |
| Pseudocode | Yes | Algorithm 1 Fidelity-Weighted Learning. Algorithm 2 Clustered Gaussian processes. |
| Open Source Code | No | Not found. The paper mentions using TensorFlow and GPflow but does not provide a link or explicit statement about the availability of their own source code for the FWL methodology. |
| Open Datasets | Yes | We use two standard TREC collections for the task of ad-hoc retrieval: The first collection (Robust04) consists of 500k news articles from different news agencies as a homogeneous collection. The second collection (ClueWeb) is ClueWeb09 Category B, a large-scale web collection with over 50 million English documents, which is considered as a heterogeneous collection. We test our model on the twitter message-level sentiment classification of SemEval-15 Task 10B (Rosenthal et al., 2015). Datasets of SemEval-15 subsume the test sets from previous editions of SemEval, i.e. SemEval-13 and SemEval-14. |
| Dataset Splits | Yes | We conducted k-fold cross validation on Ds (the strong data). We use train (9,728 tweets) and development (1,654 tweets) data from SemEval-13 for training and SemEval-13-test (3,813 tweets) for validation. |
| Hardware Specification | No | Not found. The paper states that "The neural networks are implemented in TensorFlow", but no specific details about the hardware (e.g., GPU model, CPU type, memory) used for experiments are provided. |
| Software Dependencies | No | Not found. The paper mentions software tools like "TensorFlow (Abadi et al., 2015; Tang, 2016)", "GPflow (Matthews et al., 2017)", and "Adam (Kingma & Ba, 2015)", but it does not specify exact version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The initial learning rate and the dropout parameter were selected from {1e-3, 1e-5} and {0.0, 0.2, 0.5}, respectively. We considered embedding sizes of {300, 500}. The batch size in our experiments was set to 128. We use ReLU (Nair & Hinton, 2010) as a non-linear activation function α in the student. We use the Adam optimizer (Kingma & Ba, 2015) for training, and dropout (Srivastava et al., 2014) as a regularization technique. For the sentiment classification experiments, the initial learning rate and the dropout parameter were again selected from {1e-3, 1e-5} and {0.0, 0.2, 0.5}, respectively; we considered embedding sizes of {100, 200} and the batch size in these experiments was set to 64. |
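The hyperparameter search reported in the Experiment Setup row can be sketched as a simple grid enumeration. This is an illustrative reconstruction of the search space for the document-ranking experiments, not the authors' tuning code; the variable names are assumptions.

```python
# Sketch of the hyperparameter grid reported for the ranking task:
# learning rates {1e-3, 1e-5}, dropout {0.0, 0.2, 0.5}, embedding sizes {300, 500}.
# (The sentiment task instead used embedding sizes {100, 200} and batch size 64.)
from itertools import product

learning_rates = [1e-3, 1e-5]
dropout_rates = [0.0, 0.2, 0.5]
embedding_sizes = [300, 500]

# Enumerate every candidate configuration for model selection.
grid = [
    {"lr": lr, "dropout": p, "emb_size": d, "batch_size": 128}
    for lr, p, d in product(learning_rates, dropout_rates, embedding_sizes)
]

print(len(grid))  # 2 * 3 * 2 = 12 candidate configurations
```

Each configuration would then be trained and compared on the validation split (e.g., via the k-fold cross validation on the strong data mentioned above).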