Fidelity-Weighted Learning
Authors: Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations. We introduce the proposed FWL approach in more detail in Section 2. We then present our experimental setup in Section 3 where we evaluate FWL on a toy task and two real-world tasks, namely document ranking and sentence sentiment classification. In all cases, FWL outperforms competitive baselines and yields state-of-the-art results, indicating that FWL makes better use of the limited true labeled data and is thereby able to learn a better and more meaningful task-specific representation of the data. In this section, we apply FWL first to a toy problem and then to two different real tasks: document ranking and sentiment classification. |
| Researcher Affiliation | Collaboration | Mostafa Dehghani (University of Amsterdam, dehghani@uva.nl), Arash Mehrjou (MPI for Intelligent Systems, amehrjou@tuebingen.mpg.de), Stephan Gouws (Google Brain, sgouws@google.com), Jaap Kamps (University of Amsterdam, kamps@uva.nl), Bernhard Schölkopf (MPI for Intelligent Systems, bs@tuebingen.mpg.de) |
| Pseudocode | Yes | Algorithm 1 Fidelity-Weighted Learning. Algorithm 2 Clustered Gaussian processes. |
| Open Source Code | No | Not found. The paper mentions using TensorFlow and GPflow but does not provide a link or explicit statement about the availability of their own source code for the FWL methodology. |
| Open Datasets | Yes | We use two standard TREC collections for the task of ad-hoc retrieval: The first collection (Robust04) consists of 500k news articles from different news agencies as a homogeneous collection. The second collection (ClueWeb) is ClueWeb09 Category B, a large-scale web collection with over 50 million English documents, which is considered as a heterogeneous collection. We test our model on the twitter message-level sentiment classification of SemEval-15 Task 10B (Rosenthal et al., 2015). Datasets of SemEval-15 subsume the test sets from previous editions of SemEval, i.e. SemEval-13 and SemEval-14. |
| Dataset Splits | Yes | We conducted k-fold cross validation on Ds (the strong data). We use train (9,728 tweets) and development (1,654 tweets) data from SemEval-13 for training and SemEval-13-test (3,813 tweets) for validation. |
| Hardware Specification | No | Not found. The paper states that "The neural networks are implemented in TensorFlow", but no specific details about the hardware (e.g., GPU model, CPU type, memory) used for experiments are provided. |
| Software Dependencies | No | Not found. The paper mentions software tools like "TensorFlow (Abadi et al., 2015; Tang, 2016)", "GPflow (Matthews et al., 2017)", and "Adam (Kingma & Ba, 2015)", but it does not specify exact version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The initial learning rate and the dropout parameter were selected from {1e-3, 1e-5} and {0.0, 0.2, 0.5}, respectively. We considered embedding sizes of {300, 500}. The batch size in our experiments was set to 128. We use ReLU (Nair & Hinton, 2010) as a non-linear activation function α in the student. We use the Adam optimizer (Kingma & Ba, 2015) for training, and dropout (Srivastava et al., 2014) as a regularization technique. For the sentiment classification experiments, the initial learning rate and the dropout parameter were again selected from {1e-3, 1e-5} and {0.0, 0.2, 0.5}, respectively; we considered embedding sizes of {100, 200} and the batch size in these experiments was set to 64. |
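The hyperparameter search reported in the Experiment Setup row can be sketched as a simple grid enumeration. This is an illustrative reconstruction of the search space for the document-ranking experiments, not the authors' tuning code; the variable names are assumptions.

```python
# Sketch of the hyperparameter grid reported for the ranking task:
# learning rates {1e-3, 1e-5}, dropout {0.0, 0.2, 0.5}, embedding sizes {300, 500}.
# (The sentiment task instead used embedding sizes {100, 200} and batch size 64.)
from itertools import product

learning_rates = [1e-3, 1e-5]
dropout_rates = [0.0, 0.2, 0.5]
embedding_sizes = [300, 500]

# Enumerate every candidate configuration for model selection.
grid = [
    {"lr": lr, "dropout": p, "emb_size": d, "batch_size": 128}
    for lr, p, d in product(learning_rates, dropout_rates, embedding_sizes)
]

print(len(grid))  # 2 * 3 * 2 = 12 candidate configurations
```

Each configuration would then be trained and compared on the validation split (e.g., via the k-fold cross validation on the strong data mentioned above).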