Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Representations without Compositional Assumptions
Authors: Tennison Liu, Jeroen Berrevoets, Zhaozhi Qian, Mihaela Van Der Schaar
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Empirical Investigations Having introduced the challenges of learning from multiview data ITW and our proposed method to address it, we now turn to quantitatively evaluating our method: 1. Learning ITW: What is the problem? Section 5.1 employs a simulation of ITW multi-view data to probe the performances of different compositional assumptions. ... 3. Performance: Does it work? Section 5.2 evaluates downstream performance of our method against state-of-the-art benchmarks on real world dataset. 4. Gains: Why does it work? We deconstruct our method to investigate its sources of performance gain. |
| Researcher Affiliation | Academia | 1DAMTP, University of Cambridge, Cambridge, UK 2Alan Turing Institute, London, UK. Correspondence to: Tennison Liu <EMAIL>. |
| Pseudocode | No | The paper describes its method using textual descriptions and mathematical equations, but it does not include any explicit pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tennisonliu/LEGATO and at the wider lab repository https://github.com/vanderschaarlab/LEGATO. |
| Open Datasets | Yes | We evaluate our method on three real-world datasets. TCGA (Tomczak et al., 2015)... UK Biobank (Sudlow et al., 2015)... UCI-MFS (van Breukelen et al., 1998)... For constructing multiple views and labels, the following datasets were downloaded from http://gdac.broadinstitute.org: ... We used data from the UK Biobank (Sudlow et al., 2015)... The lung cancer dataset is extracted from UK Biobank using the scripts provided in https://github.com/callta/synthetic-data-analyses/tree/main/code... |
| Dataset Splits | Yes | All models are implemented in PyTorch. The data is split 60-20-20 into an unlabeled training set, labeled training set, and test set respectively, and all reported results are averaged over 10 runs, where different data splits are sampled for each run. |
| Hardware Specification | Yes | All experiments are run on an NVIDIA Tesla K40C GPU. |
| Software Dependencies | No | The paper states 'All models are implemented in PyTorch' but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use batch size of 64, but tune the learning rate η ∈ {0.001, 0.01, 0.1} and weight decay ∈ {0.001, 0.01, 0.1}. ... Additionally, we employ early stopping to terminate model training after 20 epochs of no improvement on the validation set, after which the best model is returned for evaluation. |
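
The protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `split_indices`, `hyperparameter_grid`, and `EarlyStopping` are hypothetical, per-run seeding is an assumption, and the quoted text does not say how the validation set used for early stopping is drawn, so that part is left out.

```python
import itertools
import random

# Values quoted in the Dataset Splits and Experiment Setup rows.
BATCH_SIZE = 64
LEARNING_RATES = [0.001, 0.01, 0.1]
WEIGHT_DECAYS = [0.001, 0.01, 0.1]
PATIENCE = 20  # epochs without validation improvement before stopping
N_RUNS = 10    # results are averaged over 10 runs with different splits


def split_indices(n_samples, seed):
    """Shuffle indices and split them 60-20-20 into
    (unlabeled training, labeled training, test).
    Seeding each run separately is an assumption made for this sketch."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_unlabeled = n_samples * 6 // 10
    n_labeled = n_samples * 2 // 10
    unlabeled = indices[:n_unlabeled]
    labeled = indices[n_unlabeled:n_unlabeled + n_labeled]
    test = indices[n_unlabeled + n_labeled:]
    return unlabeled, labeled, test


def hyperparameter_grid():
    """Enumerate every (learning rate, weight decay) pair at the fixed batch size."""
    return [
        {"lr": lr, "weight_decay": wd, "batch_size": BATCH_SIZE}
        for lr, wd in itertools.product(LEARNING_RATES, WEIGHT_DECAYS)
    ]


class EarlyStopping:
    """Track the best validation loss and signal a stop after
    `patience` consecutive epochs without improvement."""

    def __init__(self, patience=PATIENCE):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch  # the checkpoint from this epoch would be kept
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    # One fresh split per run, as described in the Dataset Splits row.
    for run in range(N_RUNS):
        unlabeled, labeled, test = split_indices(1000, seed=run)
        print(run, len(unlabeled), len(labeled), len(test))
    print(len(hyperparameter_grid()), "hyperparameter configurations")
```

In a real training loop, each configuration from `hyperparameter_grid()` would be trained on a given split, with `EarlyStopping.step` called once per epoch and the model from `best_epoch` used for evaluation.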