Representer Point Selection for Explaining Regularized High-dimensional Models

Authors: Che-Ping Tsai, Jiong Zhang, Hsiang-Fu Yu, Eli Chien, Cho-Jui Hsieh, Pradeep Kumar Ravikumar

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets. We also showcase the utility of high-dimensional representers in explaining model recommendations.
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Amazon, USA, ³University of Illinois Urbana-Champaign, ⁴University of California, Los Angeles.
Pseudocode | Yes | Algorithm 1: Computation of high-dimensional representers for Collaborative Filtering. (See the representer sketch below.)
Open Source Code | No | The paper neither links to open-source code for the described methodology nor states that the code is being released publicly.
Open Datasets | Yes | We use the following three datasets on binary classification. (1) 20 newsgroups [...] (2) Gisette (Guyon et al., 2004) [...] (3) Rcv1 (Lewis et al., 2004) [...]. Datasets: (1) Movielens-1M (Harper & Konstan, 2015): [...] (2) Amazon review (2018) (Ni et al., 2019). (See the loading sketch below.)
Dataset Splits | Yes | We randomly split 10% of the data for the test set. [...] It contains 6,000/1,000 samples for training/testing, each with 5,000 features. [...] For every user, we randomly held out two item ratings to construct the validation and test sets. (See the split sketch below.)
Hardware Specification | No | The paper mentions a runtime "on a single CPU" but does not give the CPU model, core count, or any other hardware details used for the experiments.
Software Dependencies | No | The paper cites LIBLINEAR (Fan et al., 2008) but provides no version numbers for it or for any other software dependency, which full reproducibility requires. (See the LIBLINEAR sketch below.)
Experiment Setup | Yes | We set max iterations to 20 and embedding dimension to 12 on the MovieLens-1M dataset. [...] We use the SGD optimizer with learning rate 2.0/15.0 and batch size 3000/3000 to train the MF model for 10/10 epochs. [...] For MovieLens-1M/Amazon reviews 2018, we use the Adam optimizer with learning rate 0.001/0.001 and batch size 3000/3000 to train YouTube Net for 20/10 epochs. We use an embedding of 64/16 trainable parameters to model user and item information. The user feature encoder consists of 4/3 layers of size 64/16 with 0.2/0.2 dropout probabilities. (The reported values are collected in the config sketch below.)
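
The paper's Algorithm 1 is not reproduced in this report. For intuition, here is a minimal sketch of a representer-style decomposition for collaborative filtering, assuming an ℓ2-regularized matrix-factorization model with squared loss: at a stationary point, λu_i = -Σ_j ℓ′(u_iᵀv_j, r_ij)·v_j, so a test score u_iᵀv_t splits into per-training-rating contributions. Function and parameter names (including `lam`) are illustrative, and this is not necessarily the paper's exact Algorithm 1.

```python
import numpy as np

def representer_contributions(U, V, user, target_item, rated_items, ratings, lam):
    """Split the MF score U[user] @ V[target_item] into contributions from the
    user's training ratings. Exact only at a stationary point of the
    l2-regularized squared-loss objective; a sketch, not the paper's Algorithm 1."""
    u = U[user]
    contribs = {}
    for j, r in zip(rated_items, ratings):
        residual = u @ V[j] - r  # derivative of (pred - r)^2 w.r.t. pred, up to the factor 2
        # contribution of training rating (user, j) to the score for target_item
        contribs[j] = -(2.0 / lam) * residual * (V[j] @ V[target_item])
    return contribs  # sum(contribs.values()) ≈ U[user] @ V[target_item] at the optimum
```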
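For the Open Datasets row, two of the three cited classification datasets ship with scikit-learn. The paper does not say how the data were fetched, so the following loading sketch is an assumption, not the authors' pipeline:

```python
from sklearn.datasets import fetch_20newsgroups_vectorized, fetch_rcv1

# 20 newsgroups as a sparse bag-of-words matrix, and Rcv1 (Lewis et al., 2004).
# Gisette (Guyon et al., 2004) is distributed via the UCI/LIBSVM repositories
# and must be downloaded separately.
news = fetch_20newsgroups_vectorized(subset="all")
rcv1 = fetch_rcv1()
```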
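The Dataset Splits row describes a per-user leave-two-out protocol for the recommender datasets. A minimal sketch, assuming ratings are given as a dict from user to rated items (the function name is hypothetical):

```python
import random

def leave_two_out(user_items, seed=0):
    """Hold out two randomly chosen rated items per user: one for validation,
    one for test; the rest stay in training. Assumes every user has >= 3 ratings."""
    rng = random.Random(seed)
    train, val, test = {}, {}, {}
    for user, items in user_items.items():
        items = list(items)
        rng.shuffle(items)
        val[user], test[user] = items[0], items[1]
        train[user] = items[2:]
    return train, val, test
```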
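The Software Dependencies row flags that LIBLINEAR is cited without a version. One common way to invoke LIBLINEAR from Python is through scikit-learn's liblinear solver; the sketch below assumes an ℓ1-regularized logistic regression, and the regularization strength `C` is illustrative, not a value from the paper:

```python
from sklearn.linear_model import LogisticRegression

# l1-regularized logistic regression backed by LIBLINEAR (Fan et al., 2008).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
# clf.fit(X_train, y_train)  # X_train may be sparse, e.g. bag-of-words features
```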
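The Experiment Setup row interleaves two datasets' settings with slashes (MovieLens-1M / Amazon review 2018). Here are the same values collected into one configuration sketch; the key names are ours, and the quote does not say which model the max-iterations/embedding-dimension line refers to:

```python
# Hyperparameters exactly as quoted above (MovieLens-1M / Amazon review 2018).
MOVIELENS_1M = dict(max_iterations=20, embedding_dim=12)  # model unspecified in the quote

MF = {  # SGD-trained matrix factorization
    "movielens-1m": dict(optimizer="SGD", lr=2.0,  batch_size=3000, epochs=10),
    "amazon-2018":  dict(optimizer="SGD", lr=15.0, batch_size=3000, epochs=10),
}
YOUTUBE_NET = {
    "movielens-1m": dict(optimizer="Adam", lr=1e-3, batch_size=3000, epochs=20,
                         embedding_dim=64, encoder_layers=4, layer_size=64, dropout=0.2),
    "amazon-2018":  dict(optimizer="Adam", lr=1e-3, batch_size=3000, epochs=10,
                         embedding_dim=16, encoder_layers=3, layer_size=16, dropout=0.2),
}
```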