CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

Authors: Kevin Tian, Teng Zhang, James Zou

Venue: ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments demonstrate that our joint model learns substantially better covariate-specific embeddings compared to the standard approach of learning a separate embedding for each covariate using only the relevant subset of data, as well as other related methods. We empirically evaluate the benefits of our algorithm on datasets, and demonstrate how it can be used to address many natural questions about covariate effects." |
| Researcher Affiliation | Academia | "(1) Department of Computer Science, Stanford University; (2) Department of Management Science and Engineering, Stanford University; (3) Department of Biomedical Data Science, Stanford University." |
| Pseudocode | No | The paper presents its 'Objective Function and Discussion' and 'Algorithm Details' sections in prose but does not include a clearly labeled pseudocode block or algorithm (a hedged sketch of such a training loop appears below the table). |
| Open Source Code | Yes | Accompanying code to this paper can be found at http://github.com/kjtian/CoVeR. |
| Open Datasets | No | The paper describes the 'book dataset' and the 'politics dataset' used in experiments but does not provide concrete access information (e.g., a specific URL, DOI, or a formal citation with authors and year to a publicly available source) for these datasets. |
| Dataset Splits | No | The paper states 'individual books contained between 26747 and 355814 words' and 'The vocabulary size was 5,020' but does not specify training/validation/test splits by percentage, sample count, or reference to predefined standard splits. It also mentions 'tuning our algorithm', which might imply a validation set, but no explicit details are given. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU specifications, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam (Kingma & Ba, 2014) algorithm' and refers to 'GloVe (Pennington et al., 2014)' but does not provide specific version numbers for any software libraries, dependencies, or programming languages used. |
| Experiment Setup | Yes | "The vocabulary size was 5,020, and after tuning our algorithm to embed this dataset, we used 100 dimensions and a learning rate of 10^-5." A second embedding experiment reports: "The embedding used 200 dimensions and a learning rate of 10^-5." |
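
Since the paper itself includes no pseudocode, the following is a minimal PyTorch-style sketch of what the quoted setup could look like: a GloVe-style weighted least-squares objective with covariate-specific weight vectors, optimized with Adam at the reported learning rate of 10^-5 and 100-dimensional embeddings. The tensor names, placeholder sizes, and the exact form of the objective are illustrative assumptions, not taken from the paper or its repository.

```python
import torch

# Illustrative sizes only. The paper reports a vocabulary of 5,020 and
# 100-dimensional embeddings; V is shrunk here so the sketch runs quickly.
V, K, D = 500, 10, 100  # vocabulary size, number of covariates, embedding dim

# Placeholder co-occurrence tensor A[k, i, j]: how often word j appears
# near word i under covariate k. A real run would use corpus counts.
A = torch.rand(K, V, V) * 10 + 1e-2

# Shared word vectors, per-covariate diagonal weights, and biases
# (an assumed GloVe-style factorization with covariate-specific weights).
v = torch.randn(V, D, requires_grad=True)
c = torch.ones(K, D, requires_grad=True)
b = torch.zeros(K, V, requires_grad=True)

def glove_weight(x, x_max=100.0, alpha=0.75):
    # Standard GloVe weighting f(x) = min(1, (x / x_max)^alpha).
    return torch.clamp(x / x_max, max=1.0) ** alpha

opt = torch.optim.Adam([v, c, b], lr=1e-5)  # Adam and lr quoted in the paper

for step in range(100):
    opt.zero_grad()
    w = c[:, None, :] * v[None, :, :]            # covariate-weighted vectors, (K, V, D)
    scores = torch.einsum('kid,kjd->kij', w, w)  # inner products per covariate
    scores = scores + b[:, :, None] + b[:, None, :]
    loss = (glove_weight(A) * (scores - torch.log(A)) ** 2).sum()
    loss.backward()
    opt.step()
```

The design point being illustrated is the one the Research Type row quotes: rather than fitting an independent embedding per covariate on its data subset, the word vectors are shared across covariates and only the per-covariate weights and biases vary, which is what lets sparse covariates borrow strength from the full corpus.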