Near-Optimal Entrywise Anomaly Detection for Low-Rank Matrices with Sub-Exponential Noise

Authors: Vivek Farias, Andrew A. Li, Tianyi Peng

ICML 2021

Reproducibility Variable Result LLM Response
Research Type Experimental To evaluate the empirical performance of the EW algorithm, we first consider a synthetic setting where we compare the performance of our EW algorithm with that of various state-of-the-art approaches. We then measure performance on real-world data from a large retailer. The synthetic results are summarized in Table 2 and Figure 1; Figure 2 reports the real-world results. We see similar relative merits as in the synthetic experiments: EW achieves an AUC close to that of an algorithm that knows M and α, whereas Stable PCP is consistently worse than EW.
Researcher Affiliation Academia Vivek F. Farias (1), Andrew A. Li (2), Tianyi Peng (3). Affiliations: (1) Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; (2) Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (3) Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Pseudocode Yes Algorithm 1: Entrywise (EW) Algorithm π_EW(γ).
Input: X_Ω, γ ∈ (0, 1].
1: Set M̂ = (nm/|Ω|) · SVD(X_Ω)_r, where SVD(X_Ω)_r := argmin_{rank(M) ≤ r} ‖M − X‖_F and X is obtained from X_Ω by setting unobserved entries to 0.
2: Estimate (p̂_A, α̂) using a moment-matching estimator.
3: Estimate a confidence interval [f^L_ij, f^R_ij] for f_ij, for each (i, j) ∈ Ω.
4: Let {t^EW_ij} be an optimal solution to the optimization problem
   P_EW: max Σ_{(i,j)∈Ω} t_ij over 0 ≤ t_ij ≤ 1 for (i, j) ∈ Ω,
   subject to Σ_{(i,j)∈Ω} t_ij f^R_ij ≤ γ Σ_{(i,j)∈Ω} f^L_ij.
5: For every (i, j) ∈ Ω, generate A_ij ~ Ber(t^EW_ij) independently.
Output: A_Ω.
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets No We collected data, from a retailer, consisting of weekly sales of m = 290 products across n = 2481 stores, with p_O ≈ 0.14 and mean value 2.64. Since there is no ground truth for anomalies, we viewed the collected data as the underlying matrix M, and then introduced noise and artificial anomalies.
Dataset Splits No The paper does not specify traditional training, validation, and test dataset splits. It describes generating ensembles of matrices for evaluation rather than splitting a fixed dataset.
Hardware Specification No The paper mentions computational speed and matrix sizes but does not specify any hardware details like GPU/CPU models, memory, or specific cloud resources used for experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup Yes The parameters were sampled uniformly: r ∈ [1, 10], M ∈ [1, 10], p_O ∈ [0.5, 1], p_A ∈ [0, 0.3], and α ∈ [0, 1]. For all algorithms, we tuned Lagrange multipliers corresponding to rank using knowledge of the true rank. In particular, for each sample, p_A ∈ [0, 0.3] and α ∈ [0, 1] were uniformly drawn.
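The per-sample parameter draws described above can be sketched as follows. This is a hedged reconstruction of the stated ranges only; the function name is hypothetical, and the magnitude parameter for M is omitted since its exact definition is not given in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_params():
    """Draw one synthetic configuration, each parameter uniform on the
    range stated in the experiment setup."""
    return {
        "r": int(rng.integers(1, 11)),    # rank, uniform over {1, ..., 10}
        "p_O": rng.uniform(0.5, 1.0),     # observation probability
        "p_A": rng.uniform(0.0, 0.3),     # anomaly probability
        "alpha": rng.uniform(0.0, 1.0),   # anomaly parameter alpha
    }
```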