Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The DFS Fused Lasso: Linear-Time Denoising over General Graphs

Authors: Oscar Hernan Madrid Padilla, James Sharpnack, James G. Scott, Ryan J. Tibshirani

JMLR 2017

Reproducibility variables (each entry gives the classification result, followed by the LLM's supporting response):

Research Type: Experimental
LLM response: "In Section 3, we derive MSE rates for the DFS fused lasso, and the fused lasso over the original graph G under consideration, for signals of bounded variation. We also derive lower bounds for the minimax MSE over trees. In Section 4, we proceed similarly but for signals in the bounded differences class. In Section 5, we present numerical experiments."

Researcher Affiliation: Academia
LLM response: "Oscar Hernan Madrid Padilla (EMAIL), Department of Statistics, University of California, Berkeley, CA 94720, USA. James Sharpnack (EMAIL), Department of Statistics, University of California, Davis, CA 95616, USA. James G. Scott (EMAIL), Department of Information, Risk, and Operations Management; Department of Statistics and Data Sciences, University of Texas, Austin, TX 78712, USA. Ryan J. Tibshirani (EMAIL), Machine Learning Department; Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA."

Pseudocode: No
LLM response: "No structured pseudocode or algorithm blocks are explicitly presented in the paper. Algorithms such as DFS and the 1d fused lasso are described textually within the main body and appendices."

Open Source Code: No
LLM response: "The paper mentions using existing R packages (igraph, glmgen, Matrix) and Matlab packages (proxTV, MatlabBGL), which are third-party tools. For example: 'For the DFS fused lasso, in each problem instance, we first computed a DFS ordering using the dfs function from the R package igraph, which is an R wrapper for a C++ implementation of DFS... We then computed the appropriate 1d fused lasso solutions using the trendfilter function from the R package glmgen...' There is no explicit statement or link indicating that the authors have released their own code for the specific methodology developed in this paper."

Open Datasets: Yes
LLM response: "We begin by considering three examples of large graphs of (more or less) generic structure, derived from road networks in three states: California, Pennsylvania, and Texas. Data on these road networks are freely available at https://snap.stanford.edu."

Dataset Splits: No
LLM response: "The paper describes generating synthetic signals and adding noise to create data for experiments, and then evaluating performance over multiple draws of the data y and repetitions of the procedure for constructing the signal θ0. This is a generative and evaluation strategy rather than a traditional train/test/validation split of a pre-existing dataset. For example: 'For each data instance y, the DFS fused lasso and Laplacian smoothing estimators... each tuned over 20 values of its own tuning parameter. Then, the value of the tuning parameter minimizing the average MSE, over 50 draws of data y around θ0, was selected for each method. Finally, this optimized MSE, averaged over the 50 draws of data y, and further, over 10 repetitions of the procedure for constructing the signal θ0 explained above, was recorded.'"

Hardware Specification: Yes
LLM response: "The computations and timings were performed on a standard laptop computer (with a 2.80GHz Intel Core i7-2640M processor). For the 2d grid graph experiments, the computations and timings were carried out on a standard desktop computer (with a 3.40GHz Intel Core i7-4770 processor)."

Software Dependencies: No
LLM response: "The paper mentions several software packages, such as the R packages igraph, glmgen, and Matrix and the Matlab packages proxTV and MatlabBGL, but does not specify their version numbers. For example: '...using the dfs function from the R package igraph...' and '...using the trendfilter function from the R package glmgen...'"

Experiment Setup: Yes
LLM response: "We used the following procedure to construct a synthetic signal θ0 ∈ R^n on each of the road network graphs, of piecewise constant nature: ... In our experiments, we considered 20 values of the total variation for the underlying signal. For each, the signal θ0 was scaled appropriately to achieve the given total variation value, and data y ∈ R^n was generated by adding i.i.d. N(0, 0.2^2) noise to the components of θ0. For each data instance y, the DFS fused lasso and Laplacian smoothing estimators... each tuned over 20 values of its own tuning parameter. The tuning parameter for each method displayed in the figure was chosen to minimize the average MSE over 100 draws of the data y from the specified model."
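The quoted setup builds a piecewise-constant signal on a graph, rescales it to a target total variation, and adds i.i.d. N(0, 0.2^2) noise. A minimal sketch of that pattern follows; the paper's exact construction is elided in the quote, so the partitioning scheme here (random seed nodes grown by multi-source BFS) is an illustrative stand-in, not the authors' procedure.

```python
import random
from collections import deque

def piecewise_constant_signal(adj, n_pieces, target_tv, noise_sd=0.2, seed=0):
    """Sketch: piecewise-constant signal on a connected graph, rescaled to a
    target total variation, with i.i.d. N(0, noise_sd^2) noise added.
    adj[u] is the list of neighbors of node u."""
    rng = random.Random(seed)
    n = len(adj)
    # Illustrative partition (NOT the paper's construction, which is elided):
    # pick seed nodes and grow pieces by multi-source BFS.
    seeds = rng.sample(range(n), n_pieces)
    piece = [-1] * n
    queue = deque()
    for k, s in enumerate(seeds):
        piece[s] = k
        queue.append(s)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if piece[v] == -1:
                piece[v] = piece[u]
                queue.append(v)
    levels = [rng.gauss(0.0, 1.0) for _ in range(n_pieces)]
    theta = [levels[piece[i]] for i in range(n)]
    # Total variation over graph edges, then rescale to hit target_tv exactly.
    tv = sum(abs(theta[u] - theta[v]) for u in range(n) for v in adj[u] if u < v)
    scale = target_tv / tv if tv > 0 else 0.0
    theta = [scale * t for t in theta]
    y = [t + rng.gauss(0.0, noise_sd) for t in theta]
    return theta, y
```

Because the total variation is a sum of absolute differences, multiplying the signal by a constant scales the total variation by the same constant, which is why a single rescaling hits the target exactly.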
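The estimator described in the quotes above runs in two steps: compute a DFS ordering of the graph (the paper uses igraph's dfs), then solve a 1d fused lasso along that ordering (the paper uses glmgen's trendfilter). The sketch below reproduces that pipeline in plain Python; the 1d solve is an approximate projected-gradient iteration on the box-constrained dual, a simple stand-in for the exact linear-time solvers the paper relies on.

```python
def dfs_order(adj, root=0):
    """Iterative DFS; returns nodes in order of first visit
    (stand-in for igraph's dfs)."""
    seen, order, stack = {root}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in reversed(adj[u]):  # reversed so neighbors pop in list order
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return order

def fused_lasso_1d(y, lam, n_iter=5000):
    """Approximately solve min_theta 0.5*||y - theta||^2
    + lam * sum_i |theta_{i+1} - theta_i| by projected gradient on the dual
    (a stand-in for exact solvers such as glmgen's trendfilter)."""
    n = len(y)
    if n < 2 or lam == 0:
        return list(y)
    u = [0.0] * (n - 1)  # one dual variable per consecutive difference
    step = 0.25          # 1/L with L = ||D D^T||_2 <= 4 for the chain graph
    for _ in range(n_iter):
        theta = [y[j]
                 - (u[j - 1] if j > 0 else 0.0)
                 + (u[j] if j < n - 1 else 0.0) for j in range(n)]
        for i in range(n - 1):
            u[i] = max(-lam, min(lam, u[i] + step * (theta[i + 1] - theta[i])))
    return [y[j]
            - (u[j - 1] if j > 0 else 0.0)
            + (u[j] if j < n - 1 else 0.0) for j in range(n)]

def dfs_fused_lasso(adj, y, lam):
    """DFS fused lasso sketch: denoise the data along a DFS ordering."""
    order = dfs_order(adj)
    fit = fused_lasso_1d([y[v] for v in order], lam)
    theta = [0.0] * len(y)
    for pos, v in enumerate(order):
        theta[v] = fit[pos]
    return theta
```

For two points the 1d problem has a closed form (shrink the half-difference by lam toward the midpoint), which gives a quick sanity check on the solver; for large lam the solution flattens to the sample mean.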
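The tuning scheme quoted above (a grid of tuning-parameter values, with the value minimizing the MSE averaged over repeated noisy draws selected per method) can be sketched as a small selection loop. The denoiser used here, linear shrinkage toward the sample mean, is a hypothetical stand-in: the point of the sketch is the selection loop, not the estimator.

```python
import random

def tune_by_average_mse(theta0, denoise, grid, n_draws=50, noise_sd=0.2, seed=0):
    """Pick the grid value minimizing MSE against the true signal theta0,
    averaged over n_draws noisy data sets y = theta0 + N(0, noise_sd^2)."""
    rng = random.Random(seed)
    n = len(theta0)
    draws = [[t + rng.gauss(0.0, noise_sd) for t in theta0]
             for _ in range(n_draws)]

    def avg_mse(lam):
        total = 0.0
        for y in draws:
            est = denoise(y, lam)
            total += sum((e - t) ** 2 for e, t in zip(est, theta0)) / n
        return total / n_draws

    best = min(grid, key=avg_mse)
    return best, avg_mse(best)

def shrink_to_mean(y, lam):
    """Hypothetical stand-in denoiser: pull each value toward the sample mean
    by a factor lam in [0, 1]."""
    m = sum(y) / len(y)
    return [m + (1.0 - lam) * (v - m) for v in y]
```

For a constant true signal, full shrinkage (lam = 1) collapses every estimate to the sample mean, whose error is far below the per-coordinate noise, so the loop should select lam = 1 on such inputs.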