Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Budgeted Prediction with Expert Advice
Authors: Kareem Amin, Satyen Kale, Gerald Tesauro, Deepak Turaga
AAAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first part of this paper, we prove our main theoretical results, stated below. We then demonstrate our algorithm's performance on real data in a setting emulating the aforementioned motivation for this work. |
| Researcher Affiliation | Collaboration | Kareem Amin, University of Pennsylvania, Philadelphia, PA (EMAIL); Satyen Kale, Yahoo! Labs, New York, NY (EMAIL); Gerald Tesauro and Deepak Turaga, IBM Research, Yorktown Heights, NY (EMAIL) |
| Pseudocode | Yes | Algorithm 1 Budgeted Experts Algorithm (BEXP). |
| Open Source Code | No | The paper provides links to third-party software (libsvm, Vowpal Wabbit) used, but does not state that the authors' own implementation code for the described methodology (BEXP, BEXP-AVG) is open-source or provide a link for it. |
| Open Datasets | Yes | We will use the year-prediction task associated with the Million Song Dataset (MSD). Each example in the MSD corresponds to a song released between 1922 and 2011. We use the same features as (Bertin-Mahieux et al. 2011): 90 acoustical features representing the timbre of a song. |
| Dataset Splits | No | The paper mentions training on 'a set of 46,382 examples of the MSD' and testing on 'a sequence of T = 5,153 examples arriving online' but does not specify a separate validation split or explicit percentages/counts for train/validation/test. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'MATLAB', 'libsvm', and 'Vowpal Wabbit' as software used, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We used 3 families of models trained with different parameter settings, for a total of 29 different experts. The models were all trained on a set of 46,382 examples of the MSD. All labels in the dataset were in the range 1929 to 2010. We normalized the labels by subtracting 1900 and dividing by 100 so that they were in the range [0, 1.1]... We used absolute loss to measure the performance of the experts... |
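The label normalization and loss described in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; function names are hypothetical.

```python
# Sketch of the label normalization and absolute loss quoted above;
# names are illustrative, not taken from the paper's implementation.

def normalize_year(year: int) -> float:
    """Map a release year (e.g. 1929-2010) into [0, 1.1]
    by subtracting 1900 and dividing by 100."""
    return (year - 1900) / 100.0

def absolute_loss(prediction: float, label: float) -> float:
    """Absolute loss used to score each expert's prediction."""
    return abs(prediction - label)

# Example: a song released in 1975, an expert predicting 1980.
label = normalize_year(1975)       # 0.75
pred = normalize_year(1980)        # 0.80
loss = absolute_loss(pred, label)  # 0.05 (up to floating-point error)
```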