Mixture of Expert/Imitator Networks: Scalable Semi-Supervised Learning Framework
Authors: Shun Kiyono, Jun Suzuki, Kentaro Inui (pp. 4073-4081)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed method consistently improves the performance of several types of baseline DNNs. |
| Researcher Affiliation | Academia | Shun Kiyono,¹ Jun Suzuki,¹² Kentaro Inui¹² (¹Tohoku University, ²RIKEN Center for Advanced Intelligence Project) |
| Pseudocode | Yes | Algorithm 1: Training framework of MEIN. Data: labeled data D_s and unlabeled data D_u. Result: trained parameters Θ̂, Φ̂, Λ̂. Step 1: Θ* ← argmin_Θ L_s(Θ \| D_s) (train EXN, Eq. 3). Step 2: Φ̂ ← argmin_Φ L_u(Φ \| Θ*, D_u) (train IMN(s), Eq. 11). Step 3: Θ̂, Λ̂ ← argmin_{Θ,Λ} L′_s(Θ, Λ \| Φ̂, D_s) (fine-tune EXN, Eq. 13). A hedged sketch of this procedure follows the table. |
| Open Source Code | No | The paper mentions using a third-party tool, SentencePiece, and provides its GitHub link, but it does not state that the authors' own source code for the proposed method is available. |
| Open Datasets | Yes | For SEC, we selected the following widely used benchmark datasets: IMDB (Maas et al. 2011), Elec (Johnson and Zhang 2015), and Rotten Tomatoes (Rotten) (Pang and Lee 2005). For the Rotten dataset, we used the Amazon Reviews dataset (McAuley and Leskovec 2013) as unlabeled data, following previous studies (Dai and Le 2015; Miyato, Dai, and Goodfellow 2017; Sato et al. 2018). For CAC, we used the RCV1 dataset (Lewis et al. 2004). |
| Dataset Splits | Yes | Table 1: Summary of datasets; each value is the number of instances per split, given as classes / train / dev / test / unlabeled. Elec: 2 / 22,500 / 2,500 / 25,000 / 200,000. IMDB: 2 / 21,246 / 3,754 / 25,000 / 50,000. Rotten: 2 / 8,636 / 960 / 1,066 / 7,911,684. RCV1: 55 / 14,007 / 1,557 / 49,838 / 668,640. |
| Hardware Specification | Yes | We used identical hardware for each measurement, namely, a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions a "cuDNN implementation" and SentencePiece but does not provide specific version numbers for these or any other software dependencies required for replication. |
| Experiment Setup | Yes | Table 2 summarizes the hyperparameters and network configurations of our experiments. We carefully selected the settings commonly used in the previous studies (Dai and Le 2015; Miyato, Dai, and Goodfellow 2017; Sato et al. 2018). ... Table 2: Summary of hyperparameters (includes Word Embedding Dim., Embedding Dropout Rate, LSTM Hidden State Dim., MLP Dim., Activation Function, CNN Kernel Dim., Number of IMNs, Optimization Algorithm, Mini-Batch Size, Initial Learning Rate, Fine-tune Learning Rate, Decay Rate, Baseline Max Epoch, Fine-tune Max Epoch) |
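
For reference, the three-stage procedure in Algorithm 1 can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the authors' implementation (the paper releases no code): the model classes, the imitation loss (KL divergence to the EXN's output distribution), and the way IMN outputs are mixed via the learned weights Λ are all assumptions made here for concreteness.

```python
"""Minimal sketch of MEIN's three-stage training (Algorithm 1).

Model architectures, loss details, and the logit-mixing scheme are
illustrative assumptions; the authors do not release code.
"""
import torch
import torch.nn.functional as F


def train(params, loss_fn, loader, epochs=1, lr=1e-3):
    # Generic optimization loop shared by all three stages.
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss_fn(batch).backward()
            opt.step()


def train_mein(exn, imns, labeled_loader, unlabeled_loader):
    # Stage 1 (Eq. 3): train the Expert Network (EXN) on labeled data.
    #   Θ* ← argmin_Θ L_s(Θ | D_s)
    train(exn.parameters(),
          lambda b: F.cross_entropy(exn(b["x"]), b["y"]),
          labeled_loader)

    # Stage 2 (Eq. 11): with the EXN fixed, train each Imitator
    # Network (IMN) on unlabeled data to imitate the EXN's
    # predictive distribution.
    #   Φ̂ ← argmin_Φ L_u(Φ | Θ*, D_u)
    for imn in imns:
        def imitation_loss(b):
            with torch.no_grad():
                teacher = F.softmax(exn(b["x"]), dim=-1)
            student = F.log_softmax(imn(b["x"]), dim=-1)
            return F.kl_div(student, teacher, reduction="batchmean")
        train(imn.parameters(), imitation_loss, unlabeled_loader)

    # Stage 3 (Eq. 13): with the IMNs fixed, fine-tune the EXN
    # together with the mixing weights Λ on labeled data.
    #   Θ̂, Λ̂ ← argmin_{Θ,Λ} L′_s(Θ, Λ | Φ̂, D_s)
    lam = torch.nn.Parameter(torch.zeros(len(imns) + 1))
    def mixture_loss(b):
        logits = [exn(b["x"])] + [imn(b["x"]).detach() for imn in imns]
        weights = F.softmax(lam, dim=0)  # assumed mixing scheme
        mixed = sum(w * l for w, l in zip(weights, logits))
        return F.cross_entropy(mixed, b["y"])
    train(list(exn.parameters()) + [lam], mixture_loss, labeled_loader)

    return exn, imns, lam
```

The staging mirrors the algorithm's structure: only Stage 2 touches unlabeled data, and only through the frozen EXN's predictions, which is what makes the framework scalable; the exact form of the combined loss L′_s in Stage 3 should be taken from Equation 13 of the paper rather than from this sketch.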