Diet Networks: Thin Parameters for Fat Genomics
Authors: Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie, Marc-André Legault, Marie-Pierre Dubé, Julie G. Hussin, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier. |
| Researcher Affiliation | Academia | Adriana Romero, Pierre Luc Carrier, Akram Erraqabi, Tristan Sylvain, Alex Auvolat, Etienne Dejoie (Montreal Institute for Learning Algorithms, Montreal, Quebec, Canada; firstname.lastname@umontreal.ca, except adriana.romero.soriano@umontreal.ca and pierre-luc.carrier@umontreal.ca). Marc-André Legault¹, Marie-Pierre Dubé¹,²,³ (¹University of Montreal, Faculty of Medicine; ²Montreal Heart Institute; ³Beaulieu-Saucier Pharmacogenomics Centre; Montreal, Quebec, Canada; marc-andre.legault.1@umontreal.ca, marie-pierre.dube@umontreal.ca). Julie G. Hussin (Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; julieh@molbiol.ox.ac.uk). Yoshua Bengio (Montreal Institute for Learning Algorithms, Montreal, Quebec, Canada; yoshua.umontreal@gmail.com) |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce the experiments can be found here: https://github.com/adri-romsor/DietNetworks |
| Open Datasets | Yes | We evaluate our method on a publicly available dataset for ancestry prediction, the 1000 Genomes dataset (http://www.internationalgenome.org/), that best represents the human population diversity. |
| Dataset Splits | Yes | We split the data into 5 folds of equal size. A single fold is retained for test, whereas three of the remaining folds are used as training data and the final fold is used as validation data. We repeated the process 5 times (one per fold) and report the means and standard deviations of results on the different test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | Yes | The authors would like to thank the developers of Theano (Theano Development Team, 2016) and Lasagne (Lasagne, 2016). |
| Experiment Setup | Yes | We designed a basic architecture with 2 hidden layers followed by a softmax layer to perform ancestry prediction. All hidden layers have 100 units. All models were trained by means of stochastic gradient descent with adaptive learning rate (Tieleman & Hinton, 2012), both for γ = 0 and γ = 10, using dropout, limiting the norm of the weights to 1 and/or applying weight decay to reduce overfitting. |
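The dataset-split scheme quoted above (5 equal folds; per rotation, 1 fold for test, 1 for validation, 3 for training, repeated once per fold) can be sketched as follows. This is not the authors' code; the function name, the random fold assignment, and the choice of which remaining fold serves as validation are assumptions for illustration.

```python
import numpy as np

def five_fold_assignments(n_samples, seed=0):
    """Assign each sample to one of 5 equal-size folds, then enumerate the
    5 rotations: 1 test fold, 1 validation fold, 3 training folds each."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n_samples) % 5  # fold id per sample (random, balanced)
    splits = []
    for test_fold in range(5):
        val_fold = (test_fold + 1) % 5  # which fold is validation: an assumption
        train = np.where((fold_of != test_fold) & (fold_of != val_fold))[0]
        val = np.where(fold_of == val_fold)[0]
        test = np.where(fold_of == test_fold)[0]
        splits.append((train, val, test))
    return splits

splits = five_fold_assignments(10)
```

Each rotation uses a disjoint test fold, so reporting means and standard deviations over the 5 test sets covers every sample exactly once as test data.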
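The basic architecture quoted in the setup row (2 hidden layers of 100 units each, followed by a softmax layer, with dropout and a max-norm constraint of 1 on the weights) can be sketched as a NumPy forward pass. The paper's implementation used Theano/Lasagne with RMSprop (Tieleman & Hinton, 2012); this sketch covers only inference. The ReLU activation, the initialization scale, and the 26 output classes (the 26 populations in the 1000 Genomes data) are assumptions not stated in the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(n_in, n_hidden=100, n_classes=26, scale=0.01):
    # 2 hidden layers of 100 units followed by a softmax output layer
    sizes = [n_in, n_hidden, n_hidden, n_classes]
    return [(scale * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x, drop_p=0.0, max_norm=1.0):
    h = x
    for i, (W, b) in enumerate(params):
        # max-norm constraint: rescale weight columns whose norm exceeds 1
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        W = W * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
        a = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(a, 0.0)        # ReLU hidden activation (assumption)
            if drop_p > 0:                 # inverted dropout on hidden units
                h *= (rng.random(h.shape) >= drop_p) / (1.0 - drop_p)
        else:
            h = softmax(a)                 # class probabilities
    return h

params = init_mlp(n_in=50)   # 50 input features, chosen only for the example
probs = forward(params, rng.standard_normal((4, 50)), drop_p=0.5)
```

In training, weight decay would add an L2 penalty on the weights to the cross-entropy loss, and the γ = 10 setting in the paper additionally weights an auxiliary reconstruction loss.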