Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
How Do You Want Your Greedy: Simultaneous or Repeated?
Authors: Moran Feldman, Christopher Harshaw, Amin Karbasi
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we test these algorithms on real datasets. |
| Researcher Affiliation | Academia | Moran Feldman EMAIL Department of Computer Science University of Haifa Haifa, Israel Christopher Harshaw EMAIL Simons Institute Univeristy of California, Berkeley Berkeley, CA, USA Amin Karbasi EMAIL Departments of Electrical Engeering, Computer Science, Statsitics & Data Science Yale University New Haven, CT, USA |
| Pseudocode | Yes | Algorithm 1: Simultaneous Greedys (N, f, I, ℓ) |
| Open Source Code | Yes | We also present Submodular Greedy.jl, a Julia package which implements these algorithms. Our final main contribution in this paper is Submodular Greedy.jl, an open source Julia package which implements the simultaneous and repeated greedy algorithms described here, along with their nearly linear time and knapsack variants. We have written the package so that it is easy to use out-of-the-box , requiring little to no knowledge of the algorithmic variants such as marginal gain thresholding and density ratio thresholding. The package is available at this URL5 https://github.com/crharshaw/Submodular Greedy.jl and the installation requires only one line of code using the Julia package manager. |
| Open Datasets | Yes | In our experiments, we use data from the Movie Lens 20M Dataset, which features 20 million ratings of 27,000 movies by 138,000 users. For each movie, we construct a corresponding feature vector vi by using a low-rank matrix completion technique on the user reviews, as proposed by Lindgren et al. (2015). |
| Dataset Splits | No | The paper describes how the dataset is used to create problem instances based on genre limits and rating budgets but does not specify any explicit training/test/validation splits for the data itself. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | Our final main contribution in this paper is Submodular Greedy.jl, an open source Julia package which implements the simultaneous and repeated greedy algorithms described here, along with their nearly linear time and knapsack variants. |
| Experiment Setup | Yes | We define the genre-limiting constraint set (I, N), where S I if |{e S : g Ge}| dg for each genre g G. We choose the genre fraction limits of most genres to reflect the total fraction of movies belonging to the genre, i.e., qg = |{e N : g Ge}|/n. The exceptions are Crime, Drama and Thriller which have slightly higher genre fraction limits and Animation, Children, Romance, and Horror which have slightly lower genre fraction limits. where λ [0, 1] is a user-defined penalty term. For both Density Search SGS and Density Search RG, we set the number of solutions to ℓ= 2 and we use values δ = ε {0.1, 0.01}. |