Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
C-Mixup: Improving Generalization in Regression
Authors: Huaxiu Yao, Yiping Wang, Linjun Zhang, James Y. Zou, Chelsea Finn
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate C-Mixup on eleven datasets, ranging from tabular to video data. Compared to the best prior approach, C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively. |
| Researcher Affiliation | Academia | 1Stanford University, 2Zhejiang University, 3Rutgers University |
| Pseudocode | Yes | Algorithm 1 Training with C-Mixup |
| Open Source Code | Yes | Code is released at https://github.com/huaxiuyao/C-Mixup. |
| Open Datasets | Yes | We use the following five datasets to evaluate the performance of in-distribution generalization (see Appendix C.1 for detailed data statistics). (1)&(2) Airfoil Self-Noise (Airfoil) and NO2 [35] are both tabular datasets... (3)&(4): Exchange-Rate, and Electricity [40] are two time-series datasets... (5) Echocardiogram Videos (Echo) [50] is an ejection fraction prediction dataset... |
| Dataset Splits | Yes | In our experiments, we apply cross-validation to tune all hyperparameters with grid search. ... For Airfoil and NO2, we split the data into 80% training and 20% test sets, without a validation set, as common practice for these datasets. ... For Echo, we use the default training/validation/testing split provided by the dataset. |
| Hardware Specification | Yes | We train all models on a single Nvidia Tesla A100 GPU. |
| Software Dependencies | No | The paper mentions various models and architectures but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For all models, we use the Adam optimizer [40] with a learning rate of 1e-3 and a batch size of 256. We train for 200 epochs, and apply early stopping with patience 20. For mixup variants, we set α = 1.0. For C-Mixup, we set σ = 0.5. |
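
To make the roles of the two mixup hyperparameters in the Experiment Setup row concrete, the following is a minimal sketch of how C-Mixup is generally described: a mixing partner is drawn with probability proportional to a Gaussian kernel on label distance (bandwidth σ), and the pair is interpolated with a coefficient drawn from Beta(α, α). This is an illustration only, not the authors' released code. The function names `sampling_probs` and `c_mixup_pair` are hypothetical, the defaults σ = 0.5 and α = 1.0 are taken from the table row above, and allowing an example to be paired with itself is an assumption of this sketch.

```python
import math
import random


def sampling_probs(labels, i, sigma):
    """Probability of pairing example i with each example j.

    Weights follow a Gaussian kernel on squared label distance, so examples
    with similar labels are more likely to be mixed. Self-pairing (j == i)
    is allowed here; that is an assumption of this sketch.
    """
    weights = [math.exp(-((labels[i] - y) ** 2) / (2 * sigma ** 2)) for y in labels]
    total = sum(weights)
    return [w / total for w in weights]


def c_mixup_pair(features, labels, i, sigma=0.5, alpha=1.0, rng=random):
    """Mix example i with a partner drawn by label similarity (hypothetical helper).

    Returns the mixed feature vector, the mixed label, the partner index,
    and the interpolation coefficient lam.
    """
    probs = sampling_probs(labels, i, sigma)
    j = rng.choices(range(len(labels)), weights=probs, k=1)[0]
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])]
    y_mix = lam * labels[i] + (1 - lam) * labels[j]
    return x_mix, y_mix, j, lam
```

A smaller σ concentrates the sampling distribution on examples whose labels are close to that of example i, while a very large σ approaches uniform partner selection, as in standard mixup.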