reproducibilityindex.ai

Robust Random Cut Forest Based Anomaly Detection on Streams

Authors: Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the experiments, we focus on datasets where anomalies are visual, verifiable and interpretable. We begin with a synthetic dataset that captures the classic diurnal rhythm of human activity. We then move to a real dataset reflecting taxi ridership in New York City. In both cases, we compare the performance of RRCF with IF.
Researcher Affiliation	Collaboration	Sudipto Guha SUDIPTO@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104. Nina Mishra NMISHRA@AMAZON.COM Amazon, Palo Alto, CA 94303. Gourav Roy GOURAVR@AMAZON.COM Amazon, Bangalore, India 560055. Okke Schrijvers OKKES@CS.STANFORD.EDU Stanford University, Palo Alto, CA 94305.
Pseudocode	Yes	Algorithm 1 Algorithm Forget Point. ... Algorithm 2 Algorithm Insert Point.
Open Source Code	No	The paper does not contain any statements about releasing open-source code or provide links to a code repository for the methodology described.
Open Datasets	Yes	Next we conduct a streaming experiment using taxi ridership data from the NYC Taxi Commission². ... ² http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Dataset Splits	Yes	We learn a threshold for a good score on a training set and report the effectiveness on a held out test set. The training set contains all points before time t and the test set all points after time t.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, scikit-learn versions).
Experiment Setup	Yes	The experiments were run with a shingle of length four, and one hundred trees in the forest, where each tree is constructed with a uniform random reservoir sample of 256 points. ... In the experiments, there were 200 trees in the forest, each computed based on a random sample of 1K points. ... we set our time-decayed sampling parameter to the last two months of ridership.
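The Pseudocode row above only names the paper's Algorithm 1 (Forget Point) and Algorithm 2 (Insert Point). The Python below is a minimal sketch of how insertion into and deletion from a single random cut tree might look. It is written from the general random-cut-tree idea rather than transcribed from the paper, so details may differ from Algorithms 1 and 2. Assumptions: points are distinct tuples of floats, a point goes left when its coordinate is <= the cut value, and the names `Node`, `insert`, `forget`, and `find_leaf` are hypothetical. Insertion samples one cut over the bounding box enlarged by the new point. If that cut separates the point from the existing subtree, the point becomes the subtree's sibling; otherwise it descends along the node's existing cut. Forgetting removes the leaf, splices its sibling into the grandparent, and recomputes bounding boxes up to the root.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass(eq=False)
class Node:
    """Leaf if `point` is set; otherwise an internal node cutting on (dim, cut)."""
    lo: List[float]                  # lower corner of the subtree's bounding box
    hi: List[float]                  # upper corner of the subtree's bounding box
    point: Optional[Point] = None
    dim: int = -1
    cut: float = 0.0
    left: Optional["Node"] = None    # points with p[dim] <= cut
    right: Optional["Node"] = None   # points with p[dim] > cut
    parent: Optional["Node"] = field(default=None, repr=False)


def leaf(p: Point) -> Node:
    return Node(lo=list(p), hi=list(p), point=p)


def insert(node: Optional[Node], p: Point) -> Node:
    """Insert p into the subtree rooted at node; return the (possibly new) subtree root."""
    if node is None:
        return leaf(p)
    lo = [min(a, b) for a, b in zip(node.lo, p)]
    hi = [max(a, b) for a, b in zip(node.hi, p)]
    spans = [h - l for l, h in zip(lo, hi)]
    if sum(spans) == 0:
        raise ValueError("this sketch assumes distinct points")
    while True:
        # Dimension chosen proportionally to its range, cut value uniform in that range.
        i = random.choices(range(len(spans)), weights=spans)[0]
        x = random.uniform(lo[i], hi[i])
        p_left = p[i] <= x
        if (p_left and node.lo[i] > x) or (not p_left and node.hi[i] <= x):
            # The cut separates p from everything under node: add a new parent here.
            new = leaf(p)
            parent = Node(lo=lo, hi=hi, dim=i, cut=x, parent=node.parent)
            parent.left, parent.right = (new, node) if p_left else (node, new)
            new.parent = node.parent = parent
            return parent
        if node.point is None:
            break  # internal node: follow its existing cut
        # Leaf: a non-separating cut only happens at a box boundary; resample.
    side = "left" if p[node.dim] <= node.cut else "right"
    child = insert(getattr(node, side), p)
    setattr(node, side, child)
    child.parent = node
    node.lo, node.hi = lo, hi
    return node


def find_leaf(node: Optional[Node], p: Point) -> Optional[Node]:
    while node is not None and node.point is None:
        node = node.left if p[node.dim] <= node.cut else node.right
    return node if node is not None and node.point == p else None


def forget(root: Node, p: Point) -> Optional[Node]:
    """Delete p; its sibling takes the parent's place. Returns the new root."""
    lf = find_leaf(root, p)
    if lf is None:
        raise KeyError(p)
    parent = lf.parent
    if parent is None:
        return None  # the tree held only p
    sibling = parent.right if parent.left is lf else parent.left
    grand = parent.parent
    sibling.parent = grand
    if grand is None:
        return sibling
    if grand.left is parent:
        grand.left = sibling
    else:
        grand.right = sibling
    node = grand
    while node is not None:  # shrink bounding boxes on the path to the root
        node.lo = [min(a, b) for a, b in zip(node.left.lo, node.right.lo)]
        node.hi = [max(a, b) for a, b in zip(node.left.hi, node.right.hi)]
        node = node.parent
    return root


# Usage on random 2-D points (toy data).
random.seed(7)
points = [(random.random(), random.random()) for _ in range(50)]
root = None
for q in points:
    root = insert(root, q)
for q in points[:25]:
    root = forget(root, q)
```

Keeping a bounding box on every node is what lets both operations update a tree locally instead of rebuilding it; a forest would hold one such tree per independent sample.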
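The Dataset Splits row describes a temporal split: scores before time t are used to pick a threshold, and that threshold is then evaluated on the points after t. A minimal sketch follows, assuming per-time-step anomaly scores and ground-truth labels are already available. Choosing the threshold by maximizing F1 is an assumption, since the row only says a "good" threshold is learned. Placing a point at exactly time t in the test set is also an assumption, and the function names and toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, bool]  # (anomaly score, ground-truth "is anomaly" label)


def temporal_split(times: Sequence[float], scores: Sequence[float],
                   labels: Sequence[bool], t: float) -> Tuple[List[Pair], List[Pair]]:
    """Train = points strictly before t; test = the rest (t itself goes to test)."""
    train = [(s, y) for ti, s, y in zip(times, scores, labels) if ti < t]
    test = [(s, y) for ti, s, y in zip(times, scores, labels) if ti >= t]
    return train, test


def precision_recall_f1(pairs: List[Pair], thr: float) -> Tuple[float, float, float]:
    """A point is flagged as anomalous when its score is >= thr."""
    tp = sum(1 for s, y in pairs if s >= thr and y)
    fp = sum(1 for s, y in pairs if s >= thr and not y)
    fn = sum(1 for s, y in pairs if s < thr and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def learn_threshold(train: List[Pair]) -> float:
    """Try every training score as a threshold; keep the one with the best F1."""
    candidates = sorted({s for s, _ in train})
    return max(candidates, key=lambda thr: precision_recall_f1(train, thr)[2])


# Toy scores and labels for ten time steps (hypothetical, not from the paper).
times = list(range(10))
scores = [0.1, 0.2, 0.9, 0.1, 0.3, 0.2, 0.8, 0.1, 0.95, 0.2]
labels = [False, False, True, False, False, False, True, False, True, False]
train_set, test_set = temporal_split(times, scores, labels, t=5)
threshold = learn_threshold(train_set)
test_metrics = precision_recall_f1(test_set, threshold)
```

Because the threshold is fixed using only earlier points, the test metrics reflect how it would behave on data that arrives after it was chosen.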
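The Experiment Setup row quotes the synthetic-data configuration: shingles of length four, 100 trees, and a uniform random reservoir sample of 256 points per tree. The sketch below covers only the preprocessing and sampling side, under these assumptions: a shingle is a sliding window of consecutive values turned into one 4-dimensional point, each tree keeps its own reservoir maintained with Vitter's Algorithm R, and the input is a toy 24-step sine wave rather than the paper's data. The time-decayed sampling used for the taxi run is not implemented here, and `shingles`, `Reservoir`, and `SYNTHETIC_SETUP` are hypothetical names.

```python
import math
import random
from collections import deque
from typing import Iterable, Iterator, List, Tuple

# Settings quoted for the synthetic experiment; the dict itself is hypothetical.
SYNTHETIC_SETUP = {"shingle_size": 4, "num_trees": 100, "sample_size": 256}


def shingles(stream: Iterable[float], size: int) -> Iterator[Tuple[float, ...]]:
    """Turn a scalar stream into overlapping windows of `size` consecutive values."""
    window = deque(maxlen=size)
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield tuple(window)


class Reservoir:
    """Uniform reservoir sample of fixed capacity (Vitter's Algorithm R)."""

    def __init__(self, capacity: int, rng: random.Random):
        self.capacity = capacity
        self.rng = rng
        self.items: List[Tuple[float, ...]] = []
        self.seen = 0

    def offer(self, item):
        """Return (kept, evicted): whether `item` entered the sample and what it replaced."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True, None
        j = self.rng.randrange(self.seen)  # item is kept with probability capacity / seen
        if j < self.capacity:
            evicted, self.items[j] = self.items[j], item
            return True, evicted
        return False, None


# Toy diurnal-like signal with a 24-step period (not the paper's data).
rng = random.Random(0)
signal = [math.sin(2 * math.pi * t / 24) for t in range(1000)]
shingled = list(shingles(signal, SYNTHETIC_SETUP["shingle_size"]))
forest_samples = [Reservoir(SYNTHETIC_SETUP["sample_size"], rng)
                  for _ in range(SYNTHETIC_SETUP["num_trees"])]
for point in shingled:
    for reservoir in forest_samples:
        reservoir.offer(point)
```

`offer` reports the evicted point so that, in a streaming forest, the matching tree could forget that point before inserting the new one, which is the role the paper assigns to Forget Point and Insert Point.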