Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-group Learning for Hierarchical Groups
Authors: Samuel Deng, Daniel Hsu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then conduct an empirical evaluation of our algorithm and find that it achieves attractive generalization properties on real datasets with hierarchical group structure. |
| Researcher Affiliation | Academia | Department of Computer Science, Columbia University. Correspondence to: Samuel Deng <EMAIL>, Daniel Hsu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 MGL-Tree |
| Open Source Code | No | The paper refers to a third-party 'open-source xgboost implementation' but does not state that the code for their own methodology is open-source or provide a link to it. |
| Open Datasets | Yes | We conduct our experiments on twelve U.S. Census datasets from the Folktables package of Ding et al. (2021). |
| Dataset Splits | No | The paper mentions using 'a held-out test set of 20% of the data' but does not specify a train/validation/test split or cross-validation details for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions using 'scikit-learn' and 'xgboost' implementations but does not specify their version numbers. |
| Experiment Setup | Yes | Model hyperparameters: Logistic Regression: loss = log loss, dual=False, solver=lbfgs; Decision Tree: criterion = log loss, max depth = {2, 4, 8}; Random Forest: criterion = log loss; XGBoost: objective = binary:logistic |
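
The hyperparameters quoted in the Experiment Setup row could be recorded as a small configuration grid, which makes it easier to see how many distinct model settings the description implies. The sketch below is illustrative only: the dictionary keys borrow scikit-learn and xgboost keyword names (an assumption about how the settings map onto those libraries), and `REPORTED_SETTINGS`, `expand_grid`, and `all_configurations` are hypothetical names, not taken from the paper.

```python
from itertools import product

# Settings as quoted in the table. The keyword names follow scikit-learn and
# xgboost conventions; that mapping is an assumption, not the authors' code.
# Logistic regression's "loss = log loss" is implicit in the model, so it has
# no keyword here.
REPORTED_SETTINGS = {
    "LogisticRegression": {"dual": [False], "solver": ["lbfgs"]},
    "DecisionTree": {"criterion": ["log_loss"], "max_depth": [2, 4, 8]},
    "RandomForest": {"criterion": ["log_loss"]},
    "XGBoost": {"objective": ["binary:logistic"]},
}


def expand_grid(grid):
    """Return one keyword dict per combination of the values in `grid`."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def all_configurations(settings=REPORTED_SETTINGS):
    """Flatten every model's grid into (model_name, kwargs) pairs."""
    return [
        (model, kwargs)
        for model, grid in settings.items()
        for kwargs in expand_grid(grid)
    ]


if __name__ == "__main__":
    for model, kwargs in all_configurations():
        print(model, kwargs)
```

Under these assumptions the grid expands to six configurations, three of which come from the decision tree depths. Anything the table does not report (library versions, random seeds, remaining defaults) would still need to be chosen by whoever reproduces the experiments.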