Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction

Authors: Giulia Luise, Dimitrios Stamos, Massimiliano Pontil, Carlo Ciliberto

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a number of learning-to-rank problems. In our experiments the proposed method significantly outperforms all competitors, suggesting that enforcing low-rank regularization on the surrogate outputs can be beneficial also in structured prediction settings.
Researcher Affiliation | Academia | 1 Department of Computer Science, University College London, London, UK; 2 Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa; 3 Department of Electrical and Electronic Engineering, Imperial College London, London, UK.
Pseudocode | Yes | Algorithm 1: LOW-RANK SELF LEARNING
Open Source Code | Yes | Code at https://github.com/dstamos/LR-SELF
Open Datasets | Yes | Movielens. Movielens 100k (ml100k) consists of ratings... http://grouplens.org/datasets/movielens/ Jester. The Jester datasets consist of user ratings... http://goldberg.berkeley.edu/jester-data/ Sushi. The Sushi dataset consists of ratings... http://www.kamishima.net/sushi/
Dataset Splits | Yes | We used a linear kernel on the input and for each dataset we performed parameter selection using 50% of the available ratings of each user for training, 20% for validation and the remaining for testing.
Hardware Specification | No | The paper does not mention any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'RankLib' library and the 'SVMrank' implementation for comparison methods, but it does not specify version numbers for these or for any software dependencies of the authors' own proposed method.
Experiment Setup | Yes | We used a linear kernel on the input and for each dataset we performed parameter selection using 50% of the available ratings of each user for training, 20% for validation and the remaining for testing. We averaged the performance across 5 trials, each time considering a different random train/validation/test split.
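
The Dataset Splits and Experiment Setup rows describe a per-user 50%/20%/30% train/validation/test split, with results averaged over 5 trials on different random splits. Below is a minimal sketch of that protocol, assuming NumPy; it is not taken from the released LR-SELF code, and the names split_user_ratings, average_over_trials, and the evaluate callback are hypothetical placeholders.

import numpy as np

def split_user_ratings(n_ratings, train_frac=0.5, val_frac=0.2, rng=None):
    """Randomly split the indices of one user's rated items into
    train/validation/test subsets (50%/20%/30% by default)."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n_ratings)
    n_tr = int(round(train_frac * n_ratings))
    n_val = int(round(val_frac * n_ratings))
    return perm[:n_tr], perm[n_tr:n_tr + n_val], perm[n_tr + n_val:]

def average_over_trials(user_ratings, evaluate, n_trials=5, seed=0):
    """Re-draw the per-user random split n_trials times and average a
    user-supplied evaluation score over the trials."""
    scores = []
    for t in range(n_trials):
        rng = np.random.default_rng(seed + t)
        splits = {u: split_user_ratings(len(r), rng=rng)
                  for u, r in user_ratings.items()}
        scores.append(evaluate(splits))  # callback: trains and tests a model
    return float(np.mean(scores)), float(np.std(scores))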
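
The Research Type row quotes the paper's claim that enforcing low-rank regularization on the surrogate outputs is beneficial. As a loose, non-authoritative illustration of what a low-rank constraint on a matrix of surrogate scores looks like (this is not the paper's Algorithm 1, just a generic truncated-SVD projection), consider:

import numpy as np

def truncated_svd_projection(scores, rank):
    """Best rank-`rank` approximation (in Frobenius norm) of a
    (users x items) surrogate-score matrix, via truncated SVD."""
    u, s, vt = np.linalg.svd(scores, full_matrices=False)
    s[rank:] = 0.0  # zero out all but the top-`rank` singular values
    return (u * s) @ vt

# Illustrative use: approximate a random 100 x 50 score matrix with rank 5.
scores = np.random.default_rng(0).standard_normal((100, 50))
low_rank_scores = truncated_svd_projection(scores, rank=5)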