Extracting Action Sequences from Texts Based on Deep Reinforcement Learning

Authors: Wenfeng Feng, Hankz Hankui Zhuo, Subbarao Kambhampati

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conducted experiments on three datasets, i.e., Microsoft Windows Help and Support (WHS) documents [Branavan et al., 2009], and two datasets collected from WikiHow Home and Garden (WHG) and Cooking Tutorial (CT). Details are presented in Table 1." and "We compare EASDRL to the best offline-trained baseline BLCC-2. Figure 5 shows the results of online training, where online collected texts indicates the number of texts on which humans provide feedbacks. We can see that EASDRL outperforms BLCC-2 significantly, which demonstrates the effectiveness of our reinforcement learning framework."
Researcher Affiliation | Academia | Wenfeng Feng (1), Hankz Hankui Zhuo (1), Subbarao Kambhampati (2); (1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; (2) Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, US; fengwf@mail2.sysu.edu.cn, zhuohank@mail.sysu.edu.cn, rao@asu.edu
Pseudocode | Yes | "Algorithm 1 Our EASDRL algorithm"
Open Source Code | No | The paper does not provide any direct links or explicit statements about the public availability of its source code.
Open Datasets | Yes | "We conducted experiments on three datasets, i.e., Microsoft Windows Help and Support (WHS) documents [Branavan et al., 2009], and two datasets collected from WikiHow Home and Garden (WHG) and Cooking Tutorial (CT). Details are presented in Table 1." The corresponding footnotes give https://www.wikihow.com/Category:Home-and-Garden and http://cookingtutorials.com/.
Dataset Splits | Yes | "We randomly split each dataset into 10 folds, calculated an average of performance over 10 runs via 10-fold cross validation, and used the F1 metric to validate the performance in our experiments." (A cross-validation sketch of this protocol appears after the table.)
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or specific computing environments used for the experiments.
Software Dependencies | No | The paper mentions using CNNs and the adam optimizer, but does not provide specific version numbers for software libraries or dependencies (e.g., 'PyTorch 1.9' or 'TensorFlow 2.x').
Experiment Setup | Yes | "We set the input dimension to be (500 × 100) for action names and (100 × 150) for action arguments, the number of feature maps to be 32. We used 0.25 dropout on the concatenated max pooling outputs and exploited a 256 dimensional fully-connected layer before the final two dimensional outputs. We set the replay memory Ω = 100000, discount factor γ = 0.9. We varied ρ from 0.05 to 0.95 with the interval of 0.05 and found the best value is 0.80 (that is why we set ρ = 0.80 in the experiment). We set δ = 0.10 for action names, δ = 0.07 for arguments according to Table 1, the constant c = 50, learning rate of adam to be 0.001, probability ϵ for ϵ-greedy decreasing from 1 to 0.1 over 1000 training steps." (A configuration sketch of this setup appears after the table.)
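
The Dataset Splits row quotes a 10-fold cross-validation protocol scored with the F1 metric. Below is a minimal sketch of that protocol under stated assumptions, not the authors' released code: texts, labels, and train_and_predict are hypothetical placeholders for the annotated corpus and for EASDRL training/inference, and token-level binary labels (extract / do not extract) are assumed.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score

def cross_validate(texts, labels, train_and_predict, n_splits=10, seed=0):
    """Average F1 over n_splits folds, mirroring the protocol quoted above."""
    texts = np.asarray(texts, dtype=object)
    labels = np.asarray(labels, dtype=object)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in kf.split(texts):
        # train_and_predict is a placeholder: train on the training fold and
        # return per-token predictions for the held-out fold.
        preds = train_and_predict(texts[train_idx], labels[train_idx], texts[test_idx])
        y_true = np.concatenate(list(labels[test_idx]))
        y_pred = np.concatenate(list(preds))
        fold_scores.append(f1_score(y_true, y_pred))
    return float(np.mean(fold_scores))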
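
The Experiment Setup row fixes the input sizes, 32 feature maps, 0.25 dropout on the concatenated max-pooling outputs, a 256-dimensional fully-connected layer, a two-dimensional output, and the reinforcement-learning hyper-parameters. The PyTorch sketch below shows one plausible reading of that configuration; the convolution kernel widths are an assumption (the excerpt does not state them), and the class is illustrative rather than the authors' implementation.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Text-CNN style Q-network following the quoted configuration."""
    def __init__(self, seq_len=500, emb_dim=100, n_maps=32,
                 kernel_widths=(2, 3, 4, 5)):  # kernel widths assumed, not from the paper
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_maps, (w, emb_dim)) for w in kernel_widths)
        self.dropout = nn.Dropout(0.25)             # 0.25 dropout on concatenated pooling outputs
        self.fc = nn.Linear(n_maps * len(kernel_widths), 256)  # 256-d fully-connected layer
        self.out = nn.Linear(256, 2)                # two-dimensional output (Q-values)

    def forward(self, x):                           # x: (batch, seq_len, emb_dim), e.g. 500 x 100
        x = x.unsqueeze(1)                          # add a channel dimension for Conv2d
        pooled = [torch.relu(c(x)).squeeze(3).max(dim=2).values for c in self.convs]
        h = self.dropout(torch.cat(pooled, dim=1))
        return self.out(torch.relu(self.fc(h)))

# Remaining hyper-parameters as quoted in the Experiment Setup row.
HPARAMS = dict(replay_memory=100_000, gamma=0.9, rho=0.80,
               delta_action_name=0.10, delta_argument=0.07, c=50,
               adam_lr=1e-3, eps_start=1.0, eps_end=0.1, eps_decay_steps=1000)

For action arguments, the same network would be instantiated with seq_len=100 and emb_dim=150, matching the (100 × 150) input dimension quoted above.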