Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Taxonomy Adaptation in Large-scale Classification
Authors: Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih-Reza Amini, Cécile Amblard
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We finally illustrate the theoretical developments through several experiments conducted on two widely used taxonomies. |
| Researcher Affiliation | Collaboration | Rohit Babbar (EMAIL), Max-Planck Institute for Intelligent Systems, Tübingen, Germany; Ioannis Partalas (EMAIL), Viseo Research Center, Grenoble, France; Eric Gaussier, Massih-Reza Amini, Cécile Amblard (EMAIL), LIG, Université Grenoble Alpes, CNRS, 38041 Grenoble cedex 9, France |
| Pseudocode | Yes | Algorithm 1: Hierarchy pruning based on validation estimate; Algorithm 2: The proposed method for hierarchy pruning based on Generalization Bound; Algorithm 3: The pruning strategy. |
| Open Source Code | No | The paper mentions using third-party open-source tools like the "LibLinear library" and the "Vowpal Wabbit open source system" but does not provide any specific links or statements about making their own implementation code available for the methodology described in this paper. |
| Open Datasets | Yes | The datasets we used in these experiments are two large datasets extracted from the International Patent Classification (IPC) dataset (http://www.wipo.int/classifications/ipc/en/support/) and the publicly available DMOZ dataset from the second LSHTC challenge (LSHTC2) (http://lshtc.iit.demokritos.gr/). |
| Dataset Splits | Yes | Table 2 lists the datasets with the properties # training examples / # test examples / # classes / # features / depth / CR / error ratio: LSHTC2-1: 25,310 / 6,441 / 1,789 / 145,859 / 6 / 0.008 / 1.24; LSHTC2-2: 50,558 / 13,057 / 4,787 / 271,557 / 6 / 0.003 / 1.32; LSHTC2-3: 38,725 / 10,102 / 3,956 / 145,354 / 6 / 0.004 / 2.65; LSHTC2-4: 27,924 / 7,026 / 2,544 / 123,953 / 6 / 0.005 / 1.8; LSHTC2-5: 68,367 / 17,561 / 7,212 / 192,259 / 6 / 0.002 / 2.12; IPC: 46,324 / 28,926 / 451 / 1,123,497 / 4 / 0.02 / 12.27 |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the "LibLinear library", "AdaBoost with random forest", and the "Vowpal Wabbit open source system" but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | For the binary classifiers, we used AdaBoost with random forest as a base classifier, setting the number of trees to 20, 50 and 50 for the MNB, MLR and SVM classifiers respectively and leaving the other parameters at their default values. Several values were tested for the number of trees ({10, 20, 50, 100, 200}), the depth of the trees ({unrestricted, 5, 10, 15, 30, 60}), and the number of iterations in AdaBoost ({10, 20, 30}). For MLR and SVM, we use the LibLinear library (Fan et al., 2008) with the L2-regularized squared hinge loss, setting the penalty parameter C by cross-validation. For both methods we experiment with hinge and logistic loss functions using different step sizes ({0.15, 0.25, 0.5, 0.75, 1, 2, 4, 8}) and up to 32 passes through the data. |
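The hyperparameter sweep quoted above can be sketched as a plain grid enumeration. This is a minimal illustration built only from the candidate values the paper reports; the actual search and evaluation code is not provided by the authors, so variable names here are assumptions:

```python
from itertools import product

# Candidate values quoted in the paper's experiment setup
# (None stands in for "unrestricted" tree depth).
n_trees = [10, 20, 50, 100, 200]
tree_depths = [None, 5, 10, 15, 30, 60]
adaboost_iterations = [10, 20, 30]

# Enumerate every (trees, depth, iterations) configuration to evaluate.
grid = list(product(n_trees, tree_depths, adaboost_iterations))
print(len(grid))  # 5 * 6 * 3 = 90 configurations
```

Each tuple in `grid` would then be scored (e.g. by cross-validation) and the best configuration retained, which is the standard way such a reported value list is searched.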