Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization
Authors: Alessandro Sordoni, Yoshua Bengio, Jian-Yun Nie
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental study. "All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org). ... We choose to use the three sets of topics of the TREC Web Track from 2010 to 2012 (topics 51-200). In addition to MAP, precision at top ranks is an important feature for query expansion models. Hence, we also report NDCG@10 and the recent ERR@10... The statistical significance of differences in the performance of tested methods is determined using a randomization test (Smucker, Allan, and Carterette 2007) evaluated at α < 0.05. ... Results: Table 3 summarizes all our experimental results." |
| Researcher Affiliation | Academia | Alessandro Sordoni, Yoshua Bengio and Jian-Yun Nie, DIRO, Université de Montréal, Montréal, Québec |
| Pseudocode | Yes | Figure 1: Algorithms for training (a) and testing (b) the hyperparameters Φ of the expansion models directly on MAP. (a) Training phase, Q = training queries: for t = 1, …, n: (1) Φ_t ← Random(Ω_Φ); (2) M_t ← Train(D, Φ_t); (3) Q_E ← Expand(Q, M_t); (4) λ_t ← Grid(Q_E, λ); (5) MAP_{Φ_t} ← Search(Q_E, λ_t); (6) if MAP_{Φ_t} > MAP_{Φ*}, set Φ* = Φ_t and λ* = λ_t. Return Φ*, λ*. (b) Testing phase, Q = test queries: (1) M* ← Train(D, Φ*); (2) Q_E ← Expand(Q, M*); (3) MAP_{Φ*} ← Search(Q_E, λ*); (4) return MAP_{Φ*}. |
| Open Source Code | No | All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org). |
| Open Datasets | Yes | We test the effectiveness of our approach on the ClueWeb09B collection, a noisy web collection containing 50,220,423 documents. ... For this paper, we built the anchor log from the high-quality Wikipedia collection (http://www.wikipedia.org). |
| Dataset Splits | Yes | We report the results obtained by performing 5-fold cross-validation. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) were found. |
| Software Dependencies | No | All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org). |
| Experiment Setup | Yes | For all the embedding models, we fix the number of latent dimensions to K = 100 and the number of epochs to 3. For SSI, we cross-validate the gradient step, while for QEM we also include the margin m. ... Our procedure is depicted in Fig. 1. Given our anchor log D, we sample hyperparameters Φ from a uniform distribution over a fine-grained set of possible values Ω_Φ. Clamping Φ, we train the model parameters (embeddings or translation probabilities) on the anchor log. We expand the original queries by selecting the top-10 concepts according to the parameterization discussed previously. Finally, we tune the smoothing parameter λ by grid search. We repeat the process n = 50 times in order to have good chances to find minima of the hyperparameter space. |
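The Research Type row cites NDCG@10 and ERR@10 as the early-precision measures reported alongside MAP. As a rough illustration of what these compute, the sketch below scores one ranked list of graded relevance judgments. It assumes the common exponential gain 2^g - 1, a log2 rank discount for NDCG, and the cascade formulation of ERR from Chapelle et al. (2009) with stopping probability (2^g - 1) / 2^g_max. Treating negative (junk) grades as zero and taking the ideal ranking from all judged grades for the topic are assumptions made here; the official TREC evaluation script may differ in details.

```python
import math
from typing import Sequence


def _gain(grade: int) -> float:
    # Exponential gain; negative (junk) grades are treated as non-relevant (assumption).
    return 2.0 ** max(grade, 0) - 1.0


def ndcg_at_k(ranked: Sequence[int], judged: Sequence[int], k: int = 10) -> float:
    """NDCG@k for one topic: `ranked` holds grades in retrieval order, `judged` all judged grades."""
    # Rank r (0-based) is discounted by log2(r + 2), i.e. log2(rank + 1) for 1-based ranks.
    dcg = sum(_gain(g) / math.log2(r + 2) for r, g in enumerate(ranked[:k]))
    ideal = sorted(judged, reverse=True)[:k]
    idcg = sum(_gain(g) / math.log2(r + 2) for r, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def err_at_k(ranked: Sequence[int], max_grade: int, k: int = 10) -> float:
    """ERR@k under the cascade model: the user stops at rank r with probability R_r."""
    err, p_continue = 0.0, 1.0
    for rank, grade in enumerate(ranked[:k], start=1):
        stop = _gain(grade) / 2.0 ** max_grade  # R_r = (2^g - 1) / 2^g_max
        err += p_continue * stop / rank
        p_continue *= 1.0 - stop  # probability the user reaches the next rank
    return err
```

For example, a single document at the top grade with `max_grade=4` gives an ERR of 15/16, because the simulated user almost always stops at rank 1.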
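The significance test named in the Research Type row is the paired randomization test of Smucker, Allan, and Carterette (2007). A minimal sketch, assuming per-topic scores (for example average precision) for two systems on the same topics: under the null hypothesis the system labels are exchangeable on each topic, so each trial flips the sign of every per-topic difference with probability 1/2 and checks whether the absolute mean difference is at least as large as the observed one. The trial count, the seed, and the (hits + 1) / (trials + 1) estimator are choices made here, not details taken from the paper.

```python
import random
from statistics import fmean
from typing import Sequence


def randomization_test(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    trials: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided paired randomization test on per-topic scores; returns an estimated p-value."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(fmean(diffs))  # |difference in mean score|, e.g. |MAP_a - MAP_b|
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Swapping the two systems' labels on a topic flips the sign of that topic's difference.
        permuted = fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            extreme += 1
    # Counting the observed labeling as one permutation keeps the estimate above zero.
    return (extreme + 1) / (trials + 1)
```

A difference between two expansion models would then be reported as significant when the returned value is below α = 0.05.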
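The procedure in Figure 1 and the Experiment Setup row is a random search over the model hyperparameters Φ with an inner grid search over the smoothing parameter λ, repeated n = 50 times. The sketch below mirrors the training phase under stated assumptions: `train`, `expand`, and `search_map` are hypothetical callables standing in for training on the anchor log D, selecting the top-10 expansion concepts, and running Indri retrieval to obtain MAP. Ω_Φ is represented here as a finite list of candidate values per hyperparameter (for SSI, the gradient step; for QEM, also the margin m).

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

Params = Dict[str, float]


def tune_expansion_model(
    search_space: Dict[str, Sequence[float]],      # Ω_Φ: candidate values per hyperparameter
    lambda_grid: Sequence[float],                  # grid for the smoothing parameter λ
    train: Callable[[Params], object],             # hypothetical: trains M_t on the anchor log D
    expand: Callable[[object], object],            # hypothetical: expands the training queries
    search_map: Callable[[object, float], float],  # hypothetical: retrieval run returning MAP
    n: int = 50,
    seed: int = 0,
) -> Tuple[Params, float, float]:
    """Random search over Φ with an inner grid search over λ; returns (Φ*, λ*, best MAP)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best_phi: Optional[Params] = None
    best_lam, best_map = 0.0, float("-inf")
    for _ in range(n):
        # Step 1: sample Φ_t uniformly from the candidate values in Ω_Φ.
        phi = {name: rng.choice(list(values)) for name, values in search_space.items()}
        # Steps 2-3: train M_t with Φ_t clamped, then expand the training queries.
        expanded = expand(train(phi))
        # Steps 4-5: grid-search λ_t and keep the MAP it achieves.
        lam, score = max(
            ((cand, search_map(expanded, cand)) for cand in lambda_grid),
            key=lambda pair: pair[1],
        )
        # Step 6: replace the incumbent (Φ*, λ*) if MAP improved.
        if score > best_map:
            best_phi, best_lam, best_map = phi, lam, score
    assert best_phi is not None
    return best_phi, best_lam, best_map
```

For the testing phase, the returned Φ* would be used to retrain the model once, expand the test queries, and run retrieval with the returned λ*, so no test topic influences the choice of Φ* or λ*.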
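The Dataset Splits row reports 5-fold cross-validation over the TREC Web Track topics 51-200, which amounts to 150 topics and therefore 30 topics per fold. The source does not say how topics were assigned to folds, so the round-robin assignment below is an assumption, as are the hypothetical `tune` and `evaluate` callables: tuning would run the hyperparameter search on the training folds, and evaluation would return per-topic scores on the held-out fold.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_splits(topics: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Round-robin assignment of topics to k folds (assumption); returns (train, test) pairs."""
    folds = [list(topics[i::k]) for i in range(k)]
    return [
        ([t for j, fold in enumerate(folds) if j != i for t in fold], folds[i])
        for i in range(k)
    ]


def cross_validate(
    topics: Sequence[int],
    tune: Callable[[List[int]], object],                        # hypothetical tuning routine
    evaluate: Callable[[object, List[int]], Dict[int, float]],  # hypothetical per-topic scorer
    k: int = 5,
) -> Dict[int, float]:
    """Each topic is scored exactly once, with parameters tuned on the other k - 1 folds."""
    per_topic: Dict[int, float] = {}
    for train_topics, test_topics in k_fold_splits(topics, k):
        params = tune(train_topics)
        per_topic.update(evaluate(params, test_topics))
    return per_topic
```

Pooling the held-out per-topic scores this way yields one score per topic, which is also the input a paired significance test over topics would need.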