Convolutional Neural Networks for Text Hashing
Authors: Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods when tested on one short text dataset as well as one normal text dataset. |
| Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, P.R. China; National Laboratory of Pattern Recognition (NLPR), Beijing, P.R. China. {jiaming.xu, peng.wang, guanhua.tian, boxu, fangyuan.wang}@ia.ac.cn, jzhao@nlpr.ia.ac.cn, hongwei.hao@ia.ac.cn |
| Pseudocode | No | The paper includes mathematical equations and descriptions of the algorithm steps but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any links to its own source code, nor does it explicitly state that the code for its method is open-sourced or available. |
| Open Datasets | Yes | We test our algorithms on two public text datasets... Search Snippets. This dataset was selected from the results of web search transactions using predefined phrases of 8 different domains [Phan et al., 2008] (http://jwebpro.sourceforge.net/data-web-snippets.tar.gz). ... 20Newsgroups. We select the popular bydate version and use the stemmed version pre-processed by Ana Cardoso Cachopo [2007] (http://web.ist.utl.pt/acardoso/datasets/). ... By default, our experiments utilize the GloVe embeddings (http://nlp.stanford.edu/projects/glove/) trained by Pennington et al. [2014] on 6 billion tokens of Wikipedia 2014 and Gigaword 5. We also give some comparisons with other word embeddings, such as Senna embeddings (http://ml.nec-labs.com/senna/) [Collobert et al., 2011]. |
| Dataset Splits | Yes | For these datasets, we denote the category labels as tags, generate vocabulary from the training sets and randomly select 10% of the training data as the development set. ... Dataset statistics (C = number of categories, L = document length, \|V\| = vocabulary size): Snippets: C = 8, train/test = 10060/2280, L (mean/max) = 17.3/38, \|V\| = 26265; 20News: C = 20, train/test = 10443/6973, L (mean/max) = 92.8/300, \|V\| = 41877. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments. There are no mentions of specific GPU models, CPU models, or cloud computing instance types with specifications. |
| Software Dependencies | No | The paper mentions using LDA for the ITQ baseline without a version number, and various word embeddings, but does not list specific software dependencies with version numbers (e.g., Python X.Y, TensorFlow A.B.C). |
| Experiment Setup | Yes | The parameter k in Equation 2 is fixed to 7 when constructing the graph Laplacians in our approach, as well as in the baseline methods, STH, STH-RBF and STHs. We set the width of the convolutional filter w as 3, the size of feature map n1 as 80, the value of K in the max pooling layer as 2, the dimension of word embeddings dw as 50, the dimension of position embeddings dp as 8 and the learning rate λ as 0.01. Moreover, the feature weight α at the output layer is tuned through the grid from 0.001 to 1024. The optimal weights are α = 16 on Search Snippets and α = 128 on 20Newsgroups. |
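
The "Dataset Splits" row reports that vocabularies are built from the training sets and that 10% of the training data is randomly held out as a development set. The sketch below is a minimal illustration of such a holdout; the function name and the fixed seed are our own, since the paper releases no code.

```python
import numpy as np

def train_dev_split(train_docs, dev_fraction=0.1, seed=0):
    """Randomly hold out a development set from the training data.

    The paper states that 10% of the training data is selected at
    random as the development set; the seed here is illustrative.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(train_docs))
    n_dev = int(len(train_docs) * dev_fraction)
    dev_idx, train_idx = idx[:n_dev], idx[n_dev:]
    return ([train_docs[i] for i in train_idx],
            [train_docs[i] for i in dev_idx])

# E.g., on Search Snippets (10060 training documents) this keeps
# 9054 documents for training and holds out 1006 for development.
```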
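The "Experiment Setup" row fixes k = 7 when constructing the graph Laplacians (Equation 2 of the paper, shared with the STH-style baselines). The paper's exact similarity weighting is not reproduced here; the sketch below builds an unnormalized Laplacian from a binary, symmetrized k-nearest-neighbor graph, which is a common choice in this family of methods but an assumption on our part.

```python
import numpy as np

def knn_graph_laplacian(X, k=7):
    """Unnormalized graph Laplacian L = D - S over a symmetrized
    k-nearest-neighbor graph (k = 7, as fixed in the paper).

    X: (n_docs, n_features) document representations.
    Binary edge weights are an assumption; the paper's Equation 2
    may weight edges differently (e.g., with a heat kernel).
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]  # skip the point itself
        S[i, nbrs] = 1.0
    S = np.maximum(S, S.T)  # make the graph undirected
    D = np.diag(S.sum(axis=1))
    return D - S
```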
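The remaining hyperparameters in that row describe the convolutional feature extractor: filter width w = 3, n1 = 80 feature maps, K = 2 in the K-max pooling layer, 50-dimensional word embeddings and 8-dimensional position embeddings. The NumPy sketch below shows how those sizes fit together in one convolution-plus-K-max-pooling pass; all names are our own, and the paper's actual architecture (nonlinearity, initialization, the output-layer weighting by α) is only approximated.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
W_FILTER = 3           # convolutional filter width w
N_MAPS = 80            # number of feature maps n1
K_MAX = 2              # K in the K-max pooling layer
D_WORD = 50            # word embedding dimension d_w
D_POS = 8              # position embedding dimension d_p
D_IN = D_WORD + D_POS  # each token = word embedding ++ position embedding

# The feature weight alpha was grid-searched from 0.001 to 1024; a
# power-of-two grid covering those endpoints is one plausible reading
# (the paper states only the endpoints and the optima, 16 and 128).
ALPHA_GRID = [2.0 ** p for p in range(-10, 11)]

def conv_kmax(tokens, filters):
    """1-D convolution over the token sequence followed by K-max pooling.

    tokens:  (seq_len, D_IN) matrix of concatenated embeddings.
    filters: (N_MAPS, W_FILTER * D_IN) filter bank.
    Returns a fixed-size (N_MAPS * K_MAX,) feature vector.
    """
    seq_len = tokens.shape[0]
    # Slide a width-w window over the sequence and flatten each window.
    windows = np.stack([tokens[i:i + W_FILTER].ravel()
                        for i in range(seq_len - W_FILTER + 1)])
    feature_maps = np.tanh(windows @ filters.T)  # (n_windows, N_MAPS)
    # K-max pooling: keep the K largest activations per feature map,
    # preserving their original order in the sequence.
    idx = np.sort(np.argsort(feature_maps, axis=0)[-K_MAX:], axis=0)
    pooled = np.take_along_axis(feature_maps, idx, axis=0)
    return pooled.T.ravel()  # (N_MAPS * K_MAX,)

# Toy usage: a 17-token "short text" (the mean Snippets length is 17.3).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((17, D_IN))
filters = rng.standard_normal((N_MAPS, W_FILTER * D_IN)) * 0.1
features = conv_kmax(tokens, filters)
print(features.shape)  # (160,), i.e. n1 * K
```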