Jumper: Learning When to Make Classification Decisions in Reading

Authors: Xianggen Liu, Lili Mou, Haotian Cui, Zhengdong Lu, Sen Song

Venue: IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated JUMPER on three tasks, including two benchmark datasets and one real industrial application. We show the performance of both the ultimate classification and the jumping positions, and we also provide an in-depth analysis of our model.
Researcher Affiliation | Collaboration | Department of Biomedical Engineering, IDG/McGovern Institute for Brain Research, Tsinghua University; AdeptMind.ai; DeeplyCurious.ai; Laboratory of Brain and Intelligence, Tsinghua University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Both code and the Occupational Injury dataset are available at: https://github.com/jumper-data
Open Datasets | Yes | Movie Review (MR), whose objective is binary sentiment classification (positive vs. negative) of movie reviews [Pang and Lee, 2004]; it is widely used as a sentence classification task. AG news corpus (AG), a collection of more than one million news articles, for which we followed Zhang et al. [2015]... Occupational Injury (OI). The task of extracting occupational-injury information originates from a real industrial application in the legal domain... Both code and the Occupational Injury dataset are available at: https://github.com/jumper-data
Dataset Splits | Yes | We did not perform any dataset-specific tuning except early stopping on the development sets. For AG, which does not have a standard split, we randomly selected 5% of the training data as the development set (a minimal split sketch appears after the table).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) used to run its experiments.
Software Dependencies | No | The paper mentions several components, such as CNN, GRU, GloVe vectors, and AdaDelta, but does not provide version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | In our model and baselines, the CNN part used rectified linear units (ReLU) as the activation function, filter windows with sizes 1 to 5, 200 feature maps for each filter, and a dropout rate of 0.5; the GRU had a hidden size of 20. We reimplemented the self-attentive model using the same hyperparameters as in Lin et al. [2017]. For reinforcement learning, the intermediate reward r was 0.05, the discount rate γ was 0.9, and the exploration rate ϵ was 0.1. In addition, word embeddings for all of the models were initialized with 300d GloVe vectors [Pennington et al., 2014] and fine-tuned during training to improve performance. The other parameters were initialized by sampling uniformly from [-0.01, 0.01]. For all the models, we used AdaDelta with a learning rate of 0.1 and a batch size of 50 (a configuration sketch follows the table).
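
The AG development split described in the Dataset Splits row is simple enough to reproduce directly. Below is a minimal Python sketch of such a random 5% hold-out; the function and variable names are our own illustration, not code from the paper:

```python
import random

def split_train_dev(train_data, dev_frac=0.05, seed=0):
    """Hold out a random fraction of the training set as a development set.

    Mirrors the paper's procedure for AG, which has no standard split:
    5% of the training data is randomly selected for development.
    (Function and variable names are ours, not the paper's.)
    """
    rng = random.Random(seed)
    indices = list(range(len(train_data)))
    rng.shuffle(indices)
    n_dev = int(len(indices) * dev_frac)
    dev = [train_data[i] for i in indices[:n_dev]]
    train = [train_data[i] for i in indices[n_dev:]]
    return train, dev
```

Fixing the seed makes the split reproducible across runs, which matters when early stopping on the development set is the only dataset-specific tuning.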
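
The Experiment Setup row reads as a complete hyperparameter configuration, so a sketch may help readers re-implement it. The following is a minimal PyTorch rendering of the reported CNN/GRU settings, initialization, and optimizer; the module structure and all names are our assumptions, since the paper does not publish this code:

```python
import torch
import torch.nn as nn

EMBED_DIM = 300                 # 300d GloVe vectors, fine-tuned during training
FILTER_SIZES = [1, 2, 3, 4, 5]  # filter windows with sizes 1 to 5
FEATURE_MAPS = 200              # 200 feature maps per filter size
DROPOUT = 0.5
GRU_HIDDEN = 20

# Reinforcement-learning hyperparameters reported in the paper
# (the jumping policy that consumes them is omitted from this sketch):
INTERMEDIATE_REWARD = 0.05
DISCOUNT_GAMMA = 0.9
EXPLORATION_EPS = 0.1

class CnnSentenceEncoder(nn.Module):
    """CNN sentence encoder with ReLU activations, per the reported setup."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)  # init from GloVe in practice
        self.convs = nn.ModuleList(
            nn.Conv1d(EMBED_DIM, FEATURE_MAPS, k) for k in FILTER_SIZES
        )
        self.dropout = nn.Dropout(DROPOUT)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, EMBED_DIM, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each filter size.
        feats = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.dropout(torch.cat(feats, dim=1))  # (batch, 5 * 200)

encoder = CnnSentenceEncoder(vocab_size=20_000)  # vocab size is illustrative
controller = nn.GRU(input_size=len(FILTER_SIZES) * FEATURE_MAPS,
                    hidden_size=GRU_HIDDEN, batch_first=True)

# Non-embedding parameters are drawn uniformly from [-0.01, 0.01].
for p in controller.parameters():
    nn.init.uniform_(p, -0.01, 0.01)

# AdaDelta with learning rate 0.1; the batch size of 50 is set at the
# data-loader level rather than here.
optimizer = torch.optim.Adadelta(
    list(encoder.parameters()) + list(controller.parameters()), lr=0.1
)
```

This shows only the encoder and optimizer settings quoted in the row; how the GRU controller and the RL constants combine into the full JUMPER training loop is described in the paper itself.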