Jumper: Learning When to Make Classification Decisions in Reading
Authors: Xianggen Liu, Lili Mou, Haotian Cui, Zhengdong Lu, Sen Song
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated JUMPER on three tasks, including two benchmark datasets and one real industrial application. We show the performance of both ultimate classification and jumping positions; we also provide an in-depth analysis of our model. |
| Researcher Affiliation | Collaboration | ¹Department of Biomedical Engineering, IDG/McGovern Institute for Brain Research, Tsinghua University; ²AdeptMind.ai; ³DeeplyCurious.ai; ⁴Laboratory of Brain and Intelligence, Tsinghua University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Both code and the Occupational Injury dataset are available at: https://github.com/jumper-data |
| Open Datasets | Yes | Movie Review (MR), whose objective is binary sentiment classification (positive vs. negative) for movie reviews [Pang and Lee, 2004]; it is widely used as a sentence classification task. AG news corpus (AG), which is a collection of more than one million news articles, and we followed Zhang et al. [2015]... Occupational Injury (OI). The task of information extraction for occupational injury originates from a real industrial application in the legal domain... Both code and the Occupational Injury dataset are available at: https://github.com/jumper-data |
| Dataset Splits | Yes | We did not perform any dataset-specific tuning except early stopping on the development sets. For AG, which does not have a standard split, we randomly selected 5% of the training data as the development set. (A minimal sketch of such a split appears after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several components like CNN, GRU, GloVe vectors, and AdaDelta, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | In our model and baselines, the CNN part used rectified linear units (ReLU) as the activation function, filter windows with sizes 1 to 5, 200 feature maps for each filter, and a dropout rate of 0.5; GRU had a hidden size of 20. We reimplemented the self-attentive model using the same hyperparameters as in Lin et al. [2017]. For reinforcement learning, the intermediate reward r was 0.05, discounting rate γ was 0.9, and the exploration rate ϵ was 0.1. In addition, word embeddings for all of the models were initialized with 300d GloVe vectors [Pennington et al., 2014] and fine-tuned during training to improve the performance. The other parameters were initialized by randomly sampling from the uniform distribution in [-0.01, 0.01]. For all the models, we used AdaDelta with a learning rate of 0.1 and a batch size of 50. (An illustrative configuration sketch appears after this table.) |
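
The Dataset Splits row quotes a random 5% hold-out of AG's training data as the development set. The following is a minimal sketch, not the authors' code, of one way to build such a split; the function name, seed, and `dev_fraction` parameter are our own assumptions:

```python
import random

def make_dev_split(train_examples, dev_fraction=0.05, seed=42):
    """Randomly hold out a fraction of the training data as a dev set."""
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    return examples[n_dev:], examples[:n_dev]  # (train, dev)
```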
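
The Experiment Setup row lists concrete hyperparameters: a CNN with ReLU, filter widths 1 to 5, 200 feature maps per filter, and dropout 0.5; a GRU with hidden size 20; uniform initialization in [-0.01, 0.01]; and AdaDelta with learning rate 0.1 and batch size 50. The PyTorch sketch below is our reconstruction from these quoted values, not the released JUMPER code; class and variable names are illustrative:

```python
import torch
import torch.nn as nn

# Reinforcement-learning constants quoted in the row above.
INTERMEDIATE_REWARD = 0.05  # intermediate reward r
GAMMA = 0.9                 # discounting rate
EPSILON = 0.1               # exploration rate

class SentenceEncoder(nn.Module):
    """CNN sentence encoder: ReLU, filter widths 1-5, 200 maps each, dropout 0.5."""
    def __init__(self, embed_dim=300, num_maps=200, widths=(1, 2, 3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_maps, w, padding=w - 1) for w in widths]
        )
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):      # x: (batch, seq_len, embed_dim) GloVe embeddings
        x = x.transpose(1, 2)  # Conv1d expects (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.dropout(torch.cat(pooled, dim=1))  # (batch, 5 * num_maps)

def init_uniform(module):
    """Parameters sampled uniformly from [-0.01, 0.01], as reported."""
    for p in module.parameters():
        nn.init.uniform_(p, -0.01, 0.01)

# Sentence-level controller: GRU with hidden size 20 over CNN sentence vectors.
controller = nn.GRU(input_size=5 * 200, hidden_size=20, batch_first=True)
model = nn.ModuleList([SentenceEncoder(), controller])
init_uniform(model)

# AdaDelta with learning rate 0.1; the paper trains with batch size 50.
optimizer = torch.optim.Adadelta(model.parameters(), lr=0.1)
BATCH_SIZE = 50
```

Note that this sketch omits the policy network and reward computation; it only pins down the encoder, controller, initialization, and optimizer settings that the row reports.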