ADIOS: Architectures Deep In Output Space
Authors: Moustapha Cisse, Maruan Al-Shedivat, Samy Bengio
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several popular and large multi-label datasets demonstrate that our approach not only yields significant improvements, but also helps to overcome trade-offs specific to the multi-label classification setting. |
| Researcher Affiliation | Collaboration | Moustapha Cisse (MOUSTAPHACISSE@FB.COM), Maruan Al-Shedivat (ALSHEDIVAT@CS.CMU.EDU), Samy Bengio (BENGIO@GOOGLE.COM); Facebook AI Research, Carnegie Mellon University, Google Brain |
| Pseudocode | Yes | Algorithm 1: Approximate MBC construction (an illustrative sketch follows the table) |
| Open Source Code | Yes | Our code is available at https://github.com/alshedivat/adios. |
| Open Datasets | Yes | We used three readily available datasets that are popular in the multi-label community: Delicious (text), Mediamill (video) and NUS-WIDE (images). Additionally, we preprocessed and used two other datasets of moderate and large size: image data from SUN2012 (Xiao et al., 2010) and text data from the BioASQ competition of 2015. |
| Dataset Splits | Yes | All the datasets were randomly split into fixed training (60%), testing (20%), and validation (20%) sets (an example split routine follows the table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software such as the 'Keras machine learning library', the 'Caffe library', and the 'gensim library', but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | For neural network models, we used hidden layers with 1024 ReLU units and trained the models using Adagrad with 20 to 50% dropout and batch normalization. Additionally, we used L2 regularization on the weight matrices and L1 activity regularization on the output layers when it improved performance (typical values ranged between 10^-4 and 10^-3 for all the datasets). An example model definition follows the table. |
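
The paper's Algorithm 1 greedily builds an approximate Markov Blanket Chain (MBC) over the label set, i.e., a partition of the labels into groups predicted by successive output layers. The sketch below is not a transcription of that algorithm; it is a loose, hypothetical illustration of one greedy partition heuristic driven by pairwise empirical mutual information between binary labels. The names `pairwise_mi` and `greedy_partition` and the budget parameter `k` are our own, not the paper's.

```python
import numpy as np

def pairwise_mi(Y, eps=1e-12):
    """Empirical pairwise mutual information between binary label columns
    of an (n_samples, n_labels) 0/1 matrix Y."""
    _, L = Y.shape
    mi = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            val = 0.0
            for a in (0, 1):
                for b in (0, 1):
                    p_ab = np.mean((Y[:, i] == a) & (Y[:, j] == b)) + eps
                    p_a = np.mean(Y[:, i] == a) + eps
                    p_b = np.mean(Y[:, j] == b) + eps
                    val += p_ab * np.log(p_ab / (p_a * p_b))
            mi[i, j] = mi[j, i] = val
    return mi

def greedy_partition(Y, k):
    """Hypothetical heuristic: place the k labels sharing the most mutual
    information with all other labels into the first output group, so they
    can act as an approximate Markov blanket for the remaining labels."""
    mi = pairwise_mi(Y)
    scores = mi.sum(axis=1)
    first = np.argsort(-scores)[:k]
    second = np.setdiff1d(np.arange(Y.shape[1]), first)
    return first, second
```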
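The 60/20/20 split itself is straightforward to reproduce. A minimal sketch, assuming a simple shuffled index split (the helper name and seed handling are ours, not taken from the paper or its repository):

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Random 60% train / 20% test / 20% validation split of sample indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.6 * n_samples)
    n_test = int(0.2 * n_samples)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]
    return train, test, val
```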
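For concreteness, here is a minimal Keras sketch of the reported setup: one hidden layer of 1024 ReLU units, batch normalization, dropout in the 20-50% range, L2 regularization on the weight matrices, L1 activity regularization on the output layer, and Adagrad. The sigmoid outputs, binary cross-entropy loss, and specific coefficient values are multi-label defaults we assume here, not details quoted from the paper, whose released code (linked above) uses the original standalone Keras rather than tf.keras.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(input_dim, n_labels, dropout=0.3, reg=1e-4):
    """Single 1024-unit ReLU hidden layer with batch norm and dropout;
    L2 weight regularization throughout, L1 activity regularization on
    the (sigmoid) output layer, trained with Adagrad."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        layers.Dense(1024, activation="relu",
                     kernel_regularizer=regularizers.l2(reg)),
        layers.BatchNormalization(),
        layers.Dropout(dropout),  # paper reports 20 to 50% dropout
        layers.Dense(n_labels, activation="sigmoid",
                     kernel_regularizer=regularizers.l2(reg),
                     activity_regularizer=regularizers.l1(reg)),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adagrad(),
                  loss="binary_crossentropy")
    return model
```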