Project and Probe: Sample-Efficient Adaptation by Interpolating Orthogonal Features
Authors: Annie S Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro² improves performance by 5-15% when given limited target data compared to prior methods such as standard linear probing. |
| Researcher Affiliation | Collaboration | Annie S. Chen¹, Yoonho Lee¹, Amrith Setlur², Sergey Levine³, Chelsea Finn¹ (¹Stanford University, ²Carnegie Mellon University, ³UC Berkeley) |
| Pseudocode | Yes | Algorithm 1 Project and Probe |
| Open Source Code | No | No explicit statement found about releasing the source code for the methodology described in the paper or a direct link to a code repository. |
| Open Datasets | Yes | We run experiments on six datasets with distribution shifts: 4-way collages (Teney et al., 2021), Waterbirds (Sagawa et al., 2020), CelebA (Liu et al., 2015), Camelyon (Bandi et al., 2018), Living17 (Santurkar et al., 2020), and FMoW (Koh et al., 2021) datasets. |
| Dataset Splits | Yes | For hyperparameter tuning, we adopt the typical practice of using a target validation set, which is common in prior work in similar transfer learning settings (Kirichenko et al., 2022; Mehta et al., 2022; Lee et al., 2022a). |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models or memory amounts are provided; the paper states only that experiments used 'four standard CPUs and no GPUs'. |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' and the 'AdamW optimizer', but gives no version numbers for PyTorch or any other dependencies. |
| Experiment Setup | Yes | For all comparisons, we hyperparameter tune over 3 different learning rates (0.1, 0.01, and 0.001) as well as 3 different L2 regularization weights (0.1, 0.01, 0.001). In our main experiments in Sec. 6.2, we also sweep over 6 different projection dimensions (d = 1, 4, 16, 64, 256, 1024) and report results over 10 runs. |
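The quoted sweep (3 learning rates × 3 L2 weights × 6 projection dimensions) can be enumerated as below; the grid values come from the paper's stated setup, while the config-dict structure and the training loop (omitted here) are assumptions.

```python
# Enumerate the hyperparameter grid quoted from the paper's setup:
# 3 learning rates x 3 L2 regularization weights x 6 projection dims.
# The dict layout is illustrative; the actual training loop is omitted.
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
l2_weights = [0.1, 0.01, 0.001]
projection_dims = [1, 4, 16, 64, 256, 1024]

configs = [
    {"lr": lr, "l2": l2, "d": d}
    for lr, l2, d in product(learning_rates, l2_weights, projection_dims)
]
print(len(configs))  # 3 * 3 * 6 = 54 configurations, each run 10 times
```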