Learning Representations without Compositional Assumptions
Authors: Tennison Liu, Jeroen Berrevoets, Zhaozhi Qian, Mihaela van der Schaar
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Empirical Investigations Having introduced the challenges of learning from multi-view data ITW and our proposed method to address it, we now turn to quantitatively evaluating our method: 1. Learning ITW: What is the problem? Section 5.1 employs a simulation of ITW multi-view data to probe the performances of different compositional assumptions. ... 3. Performance: Does it work? Section 5.2 evaluates downstream performance of our method against state-of-the-art benchmarks on real-world datasets. 4. Gains: Why does it work? We deconstruct our method to investigate its sources of performance gain. |
| Researcher Affiliation | Academia | ¹DAMTP, University of Cambridge, Cambridge, UK; ²Alan Turing Institute, London, UK. Correspondence to: Tennison Liu <tl522@cam.ac.uk>. |
| Pseudocode | No | The paper describes its method using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tennisonliu/LEGATO and at the wider lab repository https://github.com/vanderschaarlab/LEGATO. |
| Open Datasets | Yes | We evaluate our method on three real-world datasets. TCGA (Tomczak et al., 2015)... UK Biobank (Sudlow et al., 2015)... UCI-MFS (van Breukelen et al., 1998)... For constructing multiple views and labels, the following datasets were downloaded from http://gdac.broadinstitute.org: ... We used data from the UK Biobank (Sudlow et al., 2015)... The lung cancer dataset is extracted from UK Biobank using the scripts provided in https://github.com/callta/synthetic-data-analyses/tree/main/code... |
| Dataset Splits | Yes | All models are implemented in PyTorch. The data is split 60-20-20 into an unlabeled training set, labeled training set, and test set respectively, and all reported results are averaged over 10 runs, where different data splits are sampled for each run. A hedged sketch of this split protocol appears after the table. |
| Hardware Specification | Yes | All experiments are run on an NVIDIA Tesla K40C GPU. |
| Software Dependencies | No | The paper states 'All models are implemented in PyTorch' but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use batch size of 64, but tune the learning rate η ∈ {0.001, 0.01, 0.1} and weight decay ∈ {0.001, 0.01, 0.1}. ... Additionally, we employ early stopping to terminate model training after 20 epochs of no improvement on the validation set, after which the best model is returned for evaluation. A hedged training-loop sketch of this setup follows the table. |
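
The Dataset Splits row quotes a 60-20-20 partition into unlabeled training, labeled training, and test sets, resampled across 10 runs. A minimal sketch of that protocol is below; the function name `split_indices`, the NumPy-based shuffling, and seeding each run by its index are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def split_indices(n_samples: int, seed: int):
    """60-20-20 split into unlabeled-train / labeled-train / test indices,
    matching the protocol quoted in the Dataset Splits row."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_unlabeled = int(0.6 * n_samples)
    n_labeled = int(0.2 * n_samples)
    unlabeled = perm[:n_unlabeled]
    labeled = perm[n_unlabeled:n_unlabeled + n_labeled]
    test = perm[n_unlabeled + n_labeled:]
    return unlabeled, labeled, test

# The paper averages results over 10 runs with a freshly sampled split
# per run; deriving each run's seed from its index is an assumption here.
splits = [split_indices(n_samples=1000, seed=run) for run in range(10)]
```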
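
The Experiment Setup row fixes the batch size at 64, tunes learning rate and weight decay over {0.001, 0.01, 0.1}, and stops training after 20 epochs without validation improvement, returning the best model. A PyTorch sketch of that loop follows; the Adam optimizer, the `model.loss(batch)` objective, and `max_epochs` are assumptions for illustration, since the paper does not state them in the quoted passage.

```python
import copy
import itertools
import torch

# Grid quoted above: learning rate and weight decay each tuned over
# {0.001, 0.01, 0.1}, giving 9 configurations; batch size is fixed at 64.
HYPER_GRID = list(itertools.product([1e-3, 1e-2, 1e-1], [1e-3, 1e-2, 1e-1]))

def train_with_early_stopping(model, train_loader, compute_val_loss,
                              lr, weight_decay, patience=20, max_epochs=1000):
    """Stop after `patience` epochs with no validation improvement and
    return the best checkpoint (the paper's stopping rule). Optimizer
    choice and max_epochs are illustrative assumptions."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 weight_decay=weight_decay)
    best_loss, best_state, epochs_without_improvement = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        for batch in train_loader:  # DataLoader built with batch_size=64
            optimizer.zero_grad()
            loss = model.loss(batch)  # placeholder training objective
            loss.backward()
            optimizer.step()
        val_loss = compute_val_loss(model)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # return the best model
    return model
```

Deep-copying the state dict at each improvement matters here: `state_dict()` returns references to live tensors, so without the copy the "best" checkpoint would silently track the latest weights.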