Nonoverlap-Promoting Variable Selection
Authors: Pengtao Xie, Hongbao Zhang, Yichen Zhu, Eric Xing
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in simulation studies and on real-world datasets demonstrate the effectiveness of the proposed regularizer in selecting less-overlapped variables and in improving generalization performance. |
| Researcher Affiliation | Collaboration | ¹Petuum Inc; ²School of Computer Science, Carnegie Mellon University; ³School of Mathematical Sciences, Peking University. Correspondence to: Pengtao Xie <pengtao.xie@petuum.com>, Eric P. Xing <eric.xing@petuum.com>. |
| Pseudocode | Yes | Algorithm 1: Algorithm for solving the LDD-L1-SC problem (a hedged sketch of the LDD-L1 regularizer follows the table). |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code. |
| Open Datasets | Yes | The simulation study is performed on the multiclass logistic regression model. ... We split the dataset into train/validation/test set with 600/200/200 examples respectively. ... The SC experiments were conducted on two text datasets: 20 Newsgroups (20-News) and Reuters Corpus Volume 1 (RCV1). ... We apply LSTM networks (Hochreiter & Schmidhuber, 1997) to learn language models on the Penn Treebank (PTB) dataset (Marcus et al., 1993)... The CNN experiments were performed on the CIFAR-10 dataset. |
| Dataset Splits | Yes | We split the dataset into train/validation/test set with 600/200/200 examples respectively. ... For 20-News, 11314, 3766 and 3766 documents were used for training, validation and testing respectively. ... For RCV1, the number of training, validation and testing documents are 5775, 1925, 1925 respectively. ... The Penn Treebank (PTB) dataset ... consists of 923K training, 73K validation, and 82K test words. ... CIFAR-10 dataset ... 50,000 images were used for training and 10,000 for testing. 5000 training images were used as the validation set for hyperparameter tuning. (A minimal sketch of the simulation split follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud instance types. |
| Software Dependencies | No | The paper mentions software like "Pytorch LM" but does not specify exact version numbers for any libraries, frameworks, or other software dependencies necessary for replication. |
| Experiment Setup | Yes | For 20-News, the number of basis vectors in LDD-L1-SC is set to 50; λ₁, λ₂, λ₃ and λ₄ are set to 1, 1, 0.1 and 0.001 respectively. For RCV1, the number of basis vectors is set to 200; λ₁, λ₂, λ₃ and λ₄ are set to 0.01, 1, 1 and 1 respectively. ... The number of hidden layers is set to 2. The embedding size is 1500. The size of the hidden state is 1500. ... The number of training epochs is 40. Dropout with probability 0.65 is used. The initial learning rate is 20. The gradient clipping threshold is 0.25. The size of the mini-batch is 20. ... The depth and width are set to 28 and 10 respectively. The networks are trained using SGD, where the epoch number is 200, the learning rate is set to 0.1 initially and is dropped by a factor of 0.2 at 60, 120 and 160 epochs, the minibatch size is 128 and the Nesterov momentum is 0.9. The dropout probability is 0.3 and the L2 weight decay is 0.0005. (Hedged configuration sketches for the LSTM and CNN setups follow the table.) |
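The Pseudocode row cites Algorithm 1 for the LDD-L1-SC problem. As a rough illustration, below is a minimal PyTorch sketch of an LDD-L1-style penalty, assuming the nonoverlap term is the log-determinant divergence between the Gram matrix of the weight vectors and the identity, plus an L1 term; the function name `ldd_l1`, the default coefficients, and the exact form of the divergence are assumptions, not taken from the paper.

```python
import torch

def ldd_l1(W, lam_ldd=1.0, lam_l1=0.1, eps=1e-6):
    """Hedged sketch of an LDD-L1-style regularizer (name and defaults hypothetical).

    Assumes the nonoverlap term is the log-determinant divergence between the
    Gram matrix G = W Wᵀ of the m weight vectors (rows of W) and the identity,
    tr(G) - log det(G) - m, which vanishes when the rows are orthonormal; the
    L1 term promotes sparsity within each vector.
    """
    m = W.shape[0]
    G = W @ W.t() + eps * torch.eye(m, dtype=W.dtype, device=W.device)  # jitter for stability
    ldd = torch.trace(G) - torch.logdet(G) - m   # orthogonality-promoting term
    l1 = W.abs().sum()                           # sparsity-promoting term
    return lam_ldd * ldd + lam_l1 * l1

# Usage: add the penalty to the task loss before backpropagating.
W = torch.randn(50, 300, requires_grad=True)     # e.g. 50 basis vectors, as for 20-News
ldd_l1(W).backward()
```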
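The Dataset Splits row quotes a 600/200/200 train/validation/test split for the simulation study. A minimal sketch of such a split, assuming 1,000 simulated examples and a fixed seed (the paper states neither the shuffling procedure nor a seed):

```python
import numpy as np

rng = np.random.default_rng(0)   # the seed is an assumption, not from the paper
idx = rng.permutation(1000)      # 1,000 simulated examples in total
train_idx = idx[:600]            # 600 training examples
val_idx   = idx[600:800]         # 200 validation examples
test_idx  = idx[800:]            # 200 test examples
```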
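The language-model hyperparameters in the Experiment Setup row (2-layer LSTM, 1500-dimensional embeddings and hidden states, dropout 0.65, initial learning rate 20, gradient clipping at 0.25, batch size 20, 40 epochs) resemble the standard PTB recipe. A hedged PyTorch sketch follows; the vocabulary size, where exactly the 0.65 dropout is applied, and the plain-SGD update are assumptions, and the quoted text does not specify a learning-rate decay policy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab = 10000                                  # PTB vocabulary size; an assumption here
embed = nn.Embedding(vocab, 1500)              # embedding size 1500
lstm = nn.LSTM(1500, 1500, num_layers=2, dropout=0.65)  # 2 layers, hidden size 1500
decoder = nn.Linear(1500, vocab)
params = list(embed.parameters()) + list(lstm.parameters()) + list(decoder.parameters())

lr, clip = 20.0, 0.25                          # initial LR and gradient-clipping threshold
criterion = nn.CrossEntropyLoss()

def train_step(tokens, targets):
    """One update on a (seq_len, 20) minibatch of token indices."""
    out, _ = lstm(F.dropout(embed(tokens), 0.65, training=True))
    loss = criterion(decoder(out).flatten(0, 1), targets.flatten())
    for p in params:
        if p.grad is not None:
            p.grad.zero_()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, clip)  # clip gradient norm at 0.25
    with torch.no_grad():
        for p in params:
            p.add_(p.grad, alpha=-lr)             # plain SGD at LR 20
    return loss.item()
```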
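The quoted CNN schedule (SGD for 200 epochs, learning rate 0.1 multiplied by 0.2 at epochs 60, 120, and 160, minibatch size 128, Nesterov momentum 0.9, weight decay 0.0005) maps directly onto a standard PyTorch optimizer/scheduler pair. The placeholder model below stands in for the depth-28, width-10 network, which is not reconstructed here.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)   # placeholder; the paper uses a depth-28, width-10 CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
# "dropped by 0.2" is read here as multiplying the LR by 0.2 at each milestone.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    # ... one training pass over minibatches of size 128 ...
    scheduler.step()
```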