Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization

Authors: Alessandro Sordoni, Yoshua Bengio, Jian-Yun Nie

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: Experimental study. All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org). ... We choose to use the three sets of topics of the TREC Web Track from 2010 to 2012 (topics 51-200). In addition to MAP, precision at top ranks is an important feature for query expansion models. Hence, we also report NDCG@10 and the recent ERR@10. ... The statistical significance of differences in the performance of tested methods is determined using a randomization test (Smucker, Allan, and Carterette 2007) evaluated at α < 0.05. ... Results: Table 3 summarizes all our experimental results.
Researcher Affiliation: Academia
LLM Response: Alessandro Sordoni, Yoshua Bengio, and Jian-Yun Nie, DIRO, Université de Montréal, Montréal, Québec.
Pseudocode: Yes
LLM Response: Figure 1: Algorithms for training (a) and testing (b) the hyperparameters Φ of the expansion models directly on MAP.

(a) Training Phase
    Q ← Train queries
    For t = 1 ... n:
      1. Φt ← Random(ΩΦ)
      2. Mt ← Train(D, Φt)
      3. QE ← Expand(Q, Mt)
      4. λt ← Grid(QE, λ)
      5. MAPΦt ← Search(QE, λt)
      6. If MAPΦt ≥ MAPΦ*:
         6.1 Φ* ← Φt, λ* ← λt
    Return Φ*, λ*

(b) Testing Phase
    Q ← Test queries
    1. M* ← Train(D, Φ*)
    2. QE ← Expand(Q, M*)
    3. MAPΦ* ← Search(QE, λ*)
    4. Return MAPΦ*
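The training phase in Figure 1 is a random search over Φ with a nested grid search over λ. A minimal Python sketch of that loop, with `train`, `expand`, and `search` as caller-supplied callables (hypothetical names standing in for the paper's Indri-based Train/Expand/Search steps):

```python
import random

def tune_hyperparameters(train_queries, anchor_log, omega_phi, lambda_grid,
                         train, expand, search, n=50):
    """Random search over hyperparameters Phi, keeping the (Phi, lambda)
    pair that maximizes MAP on the training queries. `train`, `expand`,
    and `search` are placeholders for the paper's pipeline stages."""
    best_phi, best_lambda, best_map = None, None, float("-inf")
    for _ in range(n):
        # 1. Sample a hyperparameter setting from the candidate grid
        phi = {name: random.choice(values) for name, values in omega_phi.items()}
        # 2. Train the model with Phi clamped
        model = train(anchor_log, phi)
        # 3. Expand the queries with the trained model
        expanded = expand(train_queries, model)
        # 4-5. Grid-search the smoothing parameter lambda, keeping the best MAP
        lam, map_score = max(
            ((l, search(expanded, l)) for l in lambda_grid),
            key=lambda pair: pair[1],
        )
        # 6. Keep the best configuration found so far
        if map_score >= best_map:
            best_phi, best_lambda, best_map = phi, lam, map_score
    return best_phi, best_lambda
```

The testing phase then simply retrains with the selected Φ* and evaluates once with λ*.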
Open Source Code: No
LLM Response: All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org).
Open Datasets: Yes
LLM Response: We test the effectiveness of our approach on the ClueWeb09B collection, a noisy web collection containing 50,220,423 documents. ... For this paper, we built the anchor log from the high-quality Wikipedia collection (http://www.wikipedia.org).
Dataset Splits: Yes
LLM Response: We report the results obtained by performing 5-fold cross-validation.
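A sketch of how 5-fold splits over the 150 TREC Web Track topics (51-200) could be formed. The paper does not specify its exact fold assignment, so the strided split below is illustrative only:

```python
def k_fold_splits(topics, k=5):
    """Yield (train, test) topic lists for k-fold cross-validation.
    Folds are formed by striding through the topic list; the actual
    fold assignment used in the paper is not specified."""
    folds = [topics[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test

# Example: TREC Web Track 2010-2012 topics 51-200
topics = list(range(51, 201))
splits = list(k_fold_splits(topics))  # 5 splits of 120 train / 30 test topics
```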
Hardware Specification: No
LLM Response: No specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) were found.
Software Dependencies: No
LLM Response: All our experiments were conducted using the open source Indri search engine (http://www.lemurproject.org).
Experiment Setup: Yes
LLM Response: For all the embedding models, we fix the number of latent dimensions to K = 100 and the number of epochs to 3. For SSI, we cross-validate the gradient step, while for QEM we also include the margin m. ... Our procedure is depicted in Fig. 1. Given our anchor log D, we sample hyperparameters Φ from a uniform distribution over a fine-grained set of possible values ΩΦ. Clamping Φ, we train the model parameters (embeddings or translation probabilities) on the anchor log. We expand the original queries by selecting the top-10 concepts according to the parameterization discussed previously. Finally, we tune the smoothing parameter λ by grid search. We repeat the process n = 50 times in order to have a good chance of finding minima of the hyperparameter space.
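In expansion models of this kind, a smoothing parameter λ commonly interpolates the original query language model with the expansion-concept model. The sketch below assumes that standard interpolation form; the paper's exact definition of λ may differ:

```python
def interpolate(original, expansion, lam):
    """Assumed standard linear interpolation of term weights:
    w(t) = lam * P(t|query) + (1 - lam) * P(t|expansion).
    lam = 1.0 keeps only the original query; lam = 0.0 keeps only
    the top-10 expansion concepts."""
    terms = set(original) | set(expansion)
    return {t: lam * original.get(t, 0.0) + (1.0 - lam) * expansion.get(t, 0.0)
            for t in terms}
```

Grid-searching λ then amounts to scoring each candidate value (e.g. 0.0, 0.1, ..., 1.0) by the MAP of the resulting interpolated queries and keeping the best.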