Convolutional Neural Networks for Text Hashing

Authors: Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods when tested on one short text dataset as well as one normal text dataset.
Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, P.R. China; National Laboratory of Pattern Recognition (NLPR), Beijing, P.R. China. {jiaming.xu, peng.wang, guanhua.tian, boxu, fangyuan.wang}@ia.ac.cn, jzhao@nlpr.ia.ac.cn, hongwei.hao@ia.ac.cn
Pseudocode | No | The paper includes mathematical equations and descriptions of the algorithm steps, but does not provide a formal pseudocode or algorithm block.
Open Source Code | No | The paper neither provides a link to its own source code nor states that the code for its method is open-sourced or otherwise available.
Open Datasets | Yes | We test our algorithms on two public text datasets... Search Snippets. This dataset was selected from the results of web search transactions using predefined phrases of 8 different domains [Phan et al., 2008] (http://jwebpro.sourceforge.net/data-web-snippets.tar.gz). ... 20Newsgroups. We select the popular bydate version and use the stemmed version pre-processed by Ana Cardoso Cachopo [2007] (http://web.ist.utl.pt/acardoso/datasets/). ... By default, our experiments utilize the GloVe embeddings (http://nlp.stanford.edu/projects/glove/) trained by Pennington et al. [2014] on 6 billion tokens of Wikipedia 2014 and Gigaword 5. We also give some comparisons with other word embeddings, such as Senna embeddings [Collobert et al., 2011] (http://ml.nec-labs.com/senna/). (See the GloVe loading sketch after this table.)
Dataset Splits | Yes | For these datasets, we denote the category labels as tags, generate vocabulary from the training sets and randomly select 10% of the training data as the development set. ... Dataset statistics, with C the number of classes, L the document length in words (mean/max) and |V| the vocabulary size: Snippets: C = 8, Train/Test = 10060/2280, L = 17.3/38, |V| = 26265. 20News: C = 20, Train/Test = 10443/6973, L = 92.8/300, |V| = 41877. (See the development-split sketch after this table.)
Hardware Specification | No | The paper does not specify the hardware used for the experiments. There are no mentions of specific GPU models, CPU models, or cloud computing instance types with specifications.
Software Dependencies | No | The paper mentions using LDA for the ITQ baseline (without a version number) and various word embeddings, but it does not list specific software dependencies with version numbers (e.g., Python X.Y, TensorFlow A.B.C).
Experiment Setup | Yes | The parameter k in Equation 2 is fixed to 7 when constructing the graph Laplacians in our approach, as well as in the baseline methods STH, STH-RBF and STHs. We set the width of the convolutional filter w to 3, the size of the feature map n1 to 80, the value of K in the max pooling layer to 2, the dimension of word embeddings dw to 50, the dimension of position embeddings dp to 8 and the learning rate λ to 0.01. Moreover, the feature weight α at the output layer is tuned through the grid from 0.001 to 1024. The optimal weights are α = 16 on Search Snippets and α = 128 on 20Newsgroups. (See the graph-Laplacian and K-max pooling sketches after this table.)
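
To ground the Open Datasets row: the reported setup uses the public 50-dimensional GloVe 6B vectors. Below is a minimal loading sketch; the file name glove.6B.50d.txt and the space-separated word/vector line layout come from the public GloVe release, while the helper itself is illustrative rather than from the paper.

import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    """Load GloVe vectors from the public plain-text release.

    Each line is: word v1 v2 ... v50 (space-separated).
    Returns a dict mapping word -> 50-d numpy vector.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Usage: embeddings = load_glove(); embeddings["computer"].shape -> (50,)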
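
The Dataset Splits row states that 10% of the training data is randomly held out as a development set. A minimal sketch of such a split follows, assuming a simple uniform random permutation; the paper does not describe the sampling procedure beyond "randomly select", and the seed is illustrative.

import numpy as np

def train_dev_split(examples, dev_fraction=0.10, seed=0):
    """Randomly hold out a fraction of the training data as a
    development set, mirroring the paper's 10% selection."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    n_dev = int(len(examples) * dev_fraction)
    dev = [examples[i] for i in idx[:n_dev]]
    train = [examples[i] for i in idx[n_dev:]]
    return train, dev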
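
The Experiment Setup row fixes k = 7 for the graph Laplacians of Equation 2. Since the extraction does not reproduce Equation 2 itself, the sketch below shows one conventional construction: an unnormalised Laplacian L = D - W over a k-nearest-neighbour graph. The cosine-similarity edge weights and the symmetrisation step are assumptions, not details taken from the paper.

import numpy as np

def knn_graph_laplacian(X, k=7):
    """Unnormalised graph Laplacian L = D - W over a k-NN graph.

    X is an (n, d) matrix of document vectors (e.g. TF-IDF);
    k = 7 matches the paper's setting for Equation 2.
    """
    # Cosine similarities between all pairs of documents.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # never choose self as a neighbour
    n = X.shape[0]
    W = np.zeros((n, n))
    # Keep each document's k most similar neighbours as edges.
    nbrs = np.argpartition(-S, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = S[rows, nbrs.ravel()]
    W = np.maximum(W, W.T)        # make the graph undirected
    D = np.diag(W.sum(axis=1))
    return D - W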
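
Likewise, the K-max pooling layer with K = 2 keeps the two largest activations of each convolutional feature map. The sketch below follows the usual definition of K-max pooling, which preserves the original order of the selected activations; the paper's exact variant may differ.

import numpy as np

def k_max_pooling(feature_map, K=2):
    """K-max pooling: keep the K largest activations of each feature
    map, preserving their original order (K = 2 in the paper).

    feature_map: (n_maps, length) array of convolution outputs.
    Returns an (n_maps, K) array.
    """
    # Indices of the K largest values per row...
    top = np.argpartition(-feature_map, K - 1, axis=1)[:, :K]
    top.sort(axis=1)  # ...restored to their original sequence order
    return np.take_along_axis(feature_map, top, axis=1)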