Partial Transportability for Domain Generalization

Authors: Kasra Jalaldoust, Alexis Bellot, Elias Bareinboim

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Our results are corroborated with experiments.' (Abstract) 'This section illustrates Algs. 1 and 2 for the evaluation and optimization of the generalization error on several tasks, ranging from simulated examples to semi-synthetic image datasets.' (Section 5)
Researcher Affiliation | Collaboration | 'Kasra Jalaldoust, Alexis Bellot†, Elias Bareinboim. Causal Artificial Intelligence Lab, Columbia University. {kasra, eb}@cs.columbia.edu, abellot95@gmail.com' (First page) 'Equal Contribution. †Now at Google DeepMind.' (First page)
Pseudocode | Yes | Algorithm 1: Neural-TR (Section 4.1); Algorithm 2: CRO (Causal Robust Optimization) (Section 4.2)
Open Source Code | Yes | 'Code is provided.' (NeurIPS Paper Checklist, Section 5)
Open Datasets | Yes | 'Our second experiment considers the colored MNIST (CMNIST) dataset that is used in the literature to highlight the robustness of classifiers to spurious correlations, e.g. see [2].' (Section 5.2) (An illustrative construction of CMNIST is sketched after the table.)
Dataset Splits | No | The paper mentions 'We use data drawn from P^{1,2}(z, y) to train predictors' (Section 5.2) but does not specify how this data is split into train/validation/test sets for the experiments.
Hardware Specification | Yes | 'All experiments were executed on a Macbook Pro M2 32 GB RAM.' (NeurIPS Paper Checklist, Section 8)
Software Dependencies | No | 'For the synthetic experiments, we used feed-forward neural networks... We used Adam optimizer for training the Neural networks. In CMNIST example, we used a standard implementation of a conditional GAN [23] trained over 200 epochs with a batch-size of 64.' (Appendix B.3) No specific versions of libraries (e.g., PyTorch, TensorFlow) or Python are mentioned.
Experiment Setup | Yes | 'For the synthetic experiments, we used feed-forward neural networks with 7 layers and 128 × 128 neurons in each layer. The activation for all layers is ReLU, but for the last layer which is a sigmoid since f_{θ_V} outputs the probability of V = 1. For evaluation, at each epoch, we used 1000 samples from the joint distribution. ... The learning rate of Adam was set to 0.0002.' (Appendix B.3) (An illustrative implementation of this setup is sketched after the table.)
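
For concreteness, here is a minimal PyTorch sketch of the synthetic-experiment setup quoted in the Experiment Setup row: a 7-layer feed-forward classifier with ReLU activations, a sigmoid output for the probability of V = 1, Adam with learning rate 0.0002, and 1000 samples per epoch. Reading '128 × 128 neurons in each layer' as a hidden width of 128, the input dimension, the binary cross-entropy loss, and the names build_predictor and input_dim are assumptions made for illustration; this is not the authors' released code.

    # Minimal sketch of the setup described in Appendix B.3 (assumptions
    # noted above; not the authors' code).
    import torch
    import torch.nn as nn

    def build_predictor(input_dim: int, hidden: int = 128, depth: int = 7) -> nn.Sequential:
        layers, width = [], input_dim
        for _ in range(depth - 1):                     # hidden layers with ReLU activations
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers += [nn.Linear(width, 1), nn.Sigmoid()]  # sigmoid output: P(V = 1)
        return nn.Sequential(*layers)

    model = build_predictor(input_dim=10)              # input_dim=10 is a placeholder
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)   # lr = 0.0002 as quoted
    criterion = nn.BCELoss()                           # assumed binary cross-entropy loss

    # One illustrative update on 1000 samples, mirroring the "1000 samples
    # from the joint distribution" used at each epoch.
    x, v = torch.randn(1000, 10), torch.randint(0, 2, (1000, 1)).float()
    loss = criterion(model(x), v)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()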
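
The Open Datasets row refers to colored MNIST (CMNIST). The paper's exact preprocessing is not quoted here, so the sketch below follows the typical construction from the literature (e.g., [2]): the digit label is binarized, optionally flipped, and a color channel is made spuriously correlated with the label at an environment-specific rate. The flip rates, the two-channel encoding, and the name make_cmnist are assumptions for illustration.

    # Minimal sketch of a typical CMNIST construction (not necessarily the
    # paper's exact variant).
    import torch
    from torchvision import datasets

    def make_cmnist(env_color_flip: float, label_flip: float = 0.25, train: bool = True):
        mnist = datasets.MNIST("data", train=train, download=True)
        images = mnist.data.float() / 255.0             # (N, 28, 28) grayscale digits
        labels = (mnist.targets < 5).float()            # binary label: digits 0-4 vs 5-9
        labels = torch.where(torch.rand_like(labels) < label_flip, 1 - labels, labels)
        colors = torch.where(torch.rand_like(labels) < env_color_flip, 1 - labels, labels)
        colored = torch.stack([images, images], dim=1)  # (N, 2, 28, 28): red/green channels
        colored[colors == 0, 0] = 0                     # color 0: digit only in channel 1
        colored[colors == 1, 1] = 0                     # color 1: digit only in channel 0
        return colored, labels

    # Two source environments with different spurious color-label correlations.
    x1, y1 = make_cmnist(env_color_flip=0.1)
    x2, y2 = make_cmnist(env_color_flip=0.2)

Varying env_color_flip across environments is what makes color a spurious feature that need not transfer to a target domain with a different color-label correlation.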