Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Non-linear, Sparse Dimensionality Reduction via Path Lasso Penalized Autoencoders
Authors: Oskar Allerbo, Rebecka Jörnsten
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the path lasso regularized autoencoder to PCA, sparse PCA, autoencoders and sparse autoencoders on real and simulated data sets. We show that the algorithm exhibits much lower reconstruction errors than sparse PCA and parameter-wise lasso regularized autoencoders for low-dimensional representations. Moreover, path lasso representations provide a more accurate reconstruction match, i.e. preserved relative distance between objects in the original and reconstructed spaces. [...] 3. Experiments In order to evaluate path lasso for dimensionality reduction we applied it to three different data sets, one with synthetic data, consisting of Gaussian clusters on a hypercube, one with text documents from newsgroup posts, and one with images of faces. |
| Researcher Affiliation | Academia | Oskar Allerbo EMAIL Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, SE-412 96 Gothenburg, Sweden; Rebecka Jörnsten EMAIL Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, SE-412 96 Gothenburg, Sweden |
| Pseudocode | Yes | Algorithm 1 Proximal Path Lasso Optimization Step. Input: Parameters at time t, {(W_l)_t, (b_l)_t}_{l=1}^L; data, x; learning rate, α; regularization strength, λ. Output: Parameters at time t + 1: {(W_l)_{t+1}, (b_l)_{t+1}}_{l=1}^L. |
| Open Source Code | Yes | Code is available at https://github.com/allerbo/path_lasso. |
| Open Datasets | Yes | To test the algorithm on text data, the 20 newsgroups data set (available at http://qwone.com/~jason/20Newsgroups/) was used. Out of the original 20 categories, the following 4 were selected: soc.religion.christian, sci.space, comp.windows.x and rec.sport.hockey, which resulted in 31225 documents. [...] We also tested the algorithm on the AT&T face database (Samaria and Harter, 1994), which contains 400 grayscale images of faces. |
| Dataset Splits | Yes | In each experiment, 20 % of the data was set aside for testing and the remaining 80 % was split 90-10 into training and validation data; all visualizations were made using the testing data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It mentions using 'Adam optimizer' but no hardware specifications like CPU or GPU models. |
| Software Dependencies | No | The paper mentions using a 'modified version of the solver in Python's scikit-learn module (Pedregosa et al., 2011)', but does not provide a specific version number for scikit-learn or Python itself. No other specific software dependencies with version numbers are listed. |
| Experiment Setup | Yes | All autoencoders used one hidden layer with tanh activations in the encoder and decoder respectively, and were trained with l2-loss. For optimization stages not using proximal gradient descent, the Adam optimizer (Kingma and Ba, 2014) was used. [...] The four dimensional data set was reduced down to two dimensions using the six different algorithms. For the four autoencoder based algorithms, the number of nodes in the five layers of the autoencoder were 4, 50, 2, 50 and 4, respectively. [...] For the autoencoder based algorithms, the layer widths were 100, 50, 2 (4, 25), 50 and 100 nodes. [...] Both the encoder and the decoder had 1000 units wide hidden layers, and to assure that a pixel that was disconnected from all the latent dimensions got a value of zero, no bias parameters were used. [...] Throughout this paper, γ = 2 was used. |
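The pseudocode row above refers to a proximal optimization step. As a minimal illustration only, and not the paper's actual path lasso operator (which penalizes products of weights along paths through the network; see Algorithm 1 and the repository linked above for the real update), a standard proximal gradient step for a plain l1 penalty can be sketched with soft-thresholding:

```python
import numpy as np

def soft_threshold(w, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def proximal_step(w, grad, alpha, lam):
    """One proximal gradient step: a gradient step on the smooth loss,
    followed by the l1 prox with threshold alpha * lam.
    `w` and `grad` are weight and gradient arrays; `alpha` is the learning
    rate and `lam` the regularization strength, mirroring the inputs
    listed for Algorithm 1."""
    return soft_threshold(w - alpha * grad, alpha * lam)
```

Because the threshold zeroes out small entries, repeated steps drive weights exactly to zero, which is what produces the sparsity the paper exploits; the path lasso version applies this idea to whole input-to-latent paths rather than individual parameters.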