Adversarial Active Learning for Sequences Labeling and Generation
Authors: Yue Deng, Ka Wai Chen, Yilin Shen, Hongxia Jin
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our sequence-based active learning approach on two tasks including sequence labeling and sequence generation. In this part, we investigate the performances of ALISE on two sequence learning tasks including slot filling and image captioning. |
| Researcher Affiliation | Collaboration | Yue Deng1, Ka Wai Chen2, Yilin Shen1, Hongxia Jin1 1 AI Center, Samsung Research America, Mountain View, CA, USA 2 Department of Electrical and Computer Engineering, University of California, San Diego |
| Pseudocode | Yes | Algorithm 1: ALISE Learning |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. |
| Open Datasets | Yes | This part of experiments were mainly conducted on the ATIS (Airline Travel Information Systems) dataset [Hemphill et al., 1990]. This part of active learning experiments are mainly conducted on MSCOCO dataset [Lin et al., 2014]. |
| Dataset Splits | Yes | Among all labeled training samples, we further randomly select 10% of them as validation samples. MSCOCO dataset [Lin et al., 2014], which consists of 82,783 images for training, 40,504 for validation, and 40,775 for testing. |
| Hardware Specification | Yes | The ALISE training in slot filling task (with 2700 samples) can be accomplished in just 74 seconds with 16 GPUs (Tesla K80) parallelized in optimization. |
| Software Dependencies | No | The paper mentions software components like "bi-directional LSTM", "standard LSTM decoder", "attention model", "ADAM", and "relu activation", but it does not specify any version numbers for these or underlying libraries/frameworks. |
| Experiment Setup | Yes | We choose 128 for word embedding layer and 64 hidden states for the encoder LSTM... The adversarial network D is configured by three dense-connected layers with 128 (input layer), 64 (intermediate layer) and 1 (output layer) units, respectively. The output layer is further connected with a sigmoid function for probabilistic conversion. We use relu activation among all other layers. Each token of the output sequence is coded as a one-hot vector with the hot entry indicating the underlying category of the token. The whole deep learning system was trained by ADAM [Kingma and Ba, 2014]. |
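
The "Experiment Setup" row above quotes enough detail to reconstruct the rough shape of the encoder and the adversarial network D. The sketch below is a minimal PyTorch rendering of that quoted configuration, not the authors' released implementation: the 128-dim embeddings, 64-unit (per direction) bi-directional LSTM, the 128-64-1 dense discriminator with ReLU activations and a sigmoid output, and ADAM training come from the paper's text, while the vocabulary size, mean-pooling of encoder states, and the assumption that D's 128-unit input layer consumes the 2×64-dim bi-LSTM state are illustrative guesses.

```python
# Minimal sketch of the quoted ALISE experiment setup (assumptions noted inline).
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Bi-directional LSTM encoder over 128-dim word embeddings (per the paper's setup)."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        emb = self.embedding(tokens)
        outputs, _ = self.lstm(emb)      # (batch, seq_len, 2 * hidden_dim) = (..., 128)
        return outputs.mean(dim=1)       # pooled sequence representation (assumption)


class Discriminator(nn.Module):
    """Adversarial network D: three dense layers (128, 64, 1 units), ReLU inside, sigmoid out."""

    def __init__(self, in_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),   # "input layer", 128 units
            nn.Linear(128, 64), nn.ReLU(),       # intermediate layer, 64 units
            nn.Linear(64, 1), nn.Sigmoid(),      # output unit with probabilistic conversion
        )

    def forward(self, h):
        return self.net(h)                       # (batch, 1) score in [0, 1]


# Example wiring (hypothetical vocabulary size). The whole system is trained with ADAM,
# matching the optimizer named in the setup description.
encoder = Encoder(vocab_size=10000)
D = Discriminator(in_dim=128)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(D.parameters()))
```

One detail that makes this reading plausible: a bi-directional LSTM with 64 hidden states per direction yields 128-dim sequence states, which lines up with the 128-unit input layer reported for D, though the paper excerpt does not state this pairing explicitly.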