Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Convolutional Neural Networks for Text Hashing
Authors: Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao
IJCAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods when tested on one short text dataset as well as one normal text dataset. |
| Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences. 100190, Beijing, P.R. China National Laboratory of Pattern Recognition (NLPR), Beijing, P.R. China EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper includes mathematical equations and descriptions of the algorithm steps but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any links to its own source code nor explicitly states that the code for their method is open-sourced or available. |
| Open Datasets | Yes | We test our algorithms on two public text datasets... Search Snippets (footnote 1). This dataset was selected from the results of web search transaction using predefined phrases of 8 different domains [Phan et al., 2008]. ... 20Newsgroups. We select the popular bydate version and use the stemmed version (footnote 2) pre-processed by Ana Cardoso Cachopo [2007]... Footnote 1: http://jwebpro.sourceforge.net/data-web-snippets.tar.gz. Footnote 2: http://web.ist.utl.pt/acardoso/datasets/. ... By default, our experiments utilize the GloVe embeddings (footnote 3) trained by Pennington et al. [2014] on 6 billion tokens of Wikipedia 2014 and Gigaword 5. We also give some comparisons with other word embeddings, such as Senna embeddings (footnote 4) [Collobert et al., 2011]... Footnote 3: http://nlp.stanford.edu/projects/glove/. Footnote 4: http://ml.nec-labs.com/senna/. |
| Dataset Splits | Yes | For these datasets, we denote the category labels as tags, generate vocabulary from the training sets and randomly select 10% of the training data as the development set. ... Dataset statistics table: Snippets: C = 8, Train/Test = 10060/2280, L (mean/max) = 17.3/38, \|V\| = 26265; 20News: C = 20, Train/Test = 10443/6973, L (mean/max) = 92.8/300, \|V\| = 41877. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments. There are no mentions of specific GPU models, CPU models, or cloud computing instance types with specifications. |
| Software Dependencies | No | The paper mentions using LDA for ITQ baseline without a version number, and various word embeddings but does not list specific software dependencies with version numbers (e.g., Python X.Y, TensorFlow A.B.C). |
| Experiment Setup | Yes | The parameter k in Equation 2 is fixed to 7 when constructing the graph Laplacians in our approach, as well as in the baseline methods, STH, STH-RBF and STHs. We set the width of the convolutional filter w as 3, the size of feature map n1 as 80, the value of K in max pooling layer as 2, the dimension of word embeddings dw as 50, the dimension of position embeddings dp as 8 and the learning rate λ as 0.01. Moreover, the feature weight α at the output layer are tuned through the grid from 0.001 to 1024. The optimal weights are α = 16 on Search Snippets and α = 128 on 20Newsgroups. |
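
The reported setup can be collected into a small configuration object, which may be a convenient starting point for a reimplementation attempt. The following is a minimal sketch, not the authors' code: the class name `HashingSetup`, the field names, and the helpers `setup_for` and `dev_size` are hypothetical, and rounding the 10% development split down to an integer is an assumption, since the paper does not state how the split size is rounded. Only the numeric values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashingSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    knn_k: int = 7                # k in Equation 2, used to build the graph Laplacians
    filter_width: int = 3         # w, width of the convolutional filter
    feature_maps: int = 80        # n1, size of the feature map
    kmax_pool: int = 2            # K in the max pooling layer
    word_dim: int = 50            # d_w, dimension of word embeddings
    pos_dim: int = 8              # d_p, dimension of position embeddings
    learning_rate: float = 0.01   # learning rate (lambda in the paper)
    alpha: float = 16.0           # feature weight at the output layer


# Reported tuning range and per-dataset optimum for alpha.
ALPHA_RANGE = (0.001, 1024.0)
OPTIMAL_ALPHA = {"Search Snippets": 16.0, "20Newsgroups": 128.0}

# Training-set sizes from the Dataset Splits row.
TRAIN_SIZES = {"Search Snippets": 10060, "20Newsgroups": 10443}


def setup_for(dataset: str) -> HashingSetup:
    """Return the reported setup with the dataset-specific optimal alpha."""
    alpha = OPTIMAL_ALPHA[dataset]
    low, high = ALPHA_RANGE
    if not low <= alpha <= high:
        raise ValueError(f"alpha={alpha} lies outside the reported range")
    return HashingSetup(alpha=alpha)


def dev_size(dataset: str, percent: int = 10) -> int:
    """Size of the development set; rounding down is an assumption."""
    return TRAIN_SIZES[dataset] * percent // 100


if __name__ == "__main__":
    for name in OPTIMAL_ALPHA:
        print(name, setup_for(name), "dev examples:", dev_size(name))
```

Keeping the per-dataset α separate from the shared defaults mirrors how the row reports them: one fixed architecture, with only the output-layer feature weight tuned per dataset.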