Storage Fit Learning with Feature Evolvable Streams

Authors: Bo-Jian Hou, Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou (pp. 7729-7736)

AAAI 2021

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental. "In this section, we conduct experiments in different scenarios to validate the three claims presented in the Introduction."
Researcher Affiliation: Academia. Bo-Jian Hou, Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; {houbj, yanyh, zhaop, zhouzh}@lamda.nju.edu.cn.
Pseudocode: Yes. Algorithm 1 (Initialize) and Algorithm 2 (SF2EL) are given.
Open Source Code: No. The paper provides no explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets: Yes. "We conduct our experiments on 7 datasets from different domains, including economy and biology. Note that FESL uses 30 datasets; however, over 20 of them are text datasets that do not satisfy the manifold characteristic. The datasets used in our paper all satisfy the manifold characteristic, and the Swiss dataset (shaped like a Swiss roll) is the ideal example. Swiss is a synthetic dataset containing 2,000 samples, generated from two twisted spirals. Since Swiss has only two dimensions, it is convenient to observe its manifold characteristic; the other datasets share this property but, owing to their high dimensionality, we use Swiss as the illustrative example. To generate synthetic data for feature space S2, we artificially map the original datasets by random matrices, yielding data from both feature spaces S1 and S2. Since the original data are in batch mode, we manually make them arrive sequentially; in this way, the synthetic data are completely generated. As for real data, we use the RFID dataset provided by FESL, which satisfies all the assumptions in the Preliminaries. HTRU 2 and magic04 are two large-scale datasets containing 17,898 and 19,020 instances, respectively; due to the page limit we provide only their accuracy results in Table 1, and other results on these two datasets can be found in the supplementary file."
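The synthetic-stream construction quoted above (mapping the original data to feature space S2 via a random matrix, then feeding samples sequentially) can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the function name `make_evolving_stream`, the half-and-half split between feature spaces, and the seed handling are my assumptions.

```python
import numpy as np

def make_evolving_stream(X1, d2, seed=0):
    """Sketch of a feature-evolvable stream: batch data X1 in space S1
    is mapped to a synthetic space S2 by a random matrix, then samples
    arrive one round at a time, switching spaces halfway through.
    (Hypothetical construction; the paper's split may differ.)"""
    rng = np.random.default_rng(seed)
    n, d1 = X1.shape
    R = rng.standard_normal((d1, d2))  # random linear map S1 -> S2
    X2 = X1 @ R                        # synthetic data in feature space S2
    T1 = n // 2                        # rounds observed in S1
    for t in range(n):
        if t < T1:
            yield 1, X1[t]             # sample from feature space S1
        else:
            yield 2, X2[t]             # sample from feature space S2

# Usage: stream 6 two-dimensional samples, with S2 of dimension 3.
X1 = np.arange(12, dtype=float).reshape(6, 2)
spaces = [s for s, _ in make_evolving_stream(X1, d2=3)]
print(spaces)  # -> [1, 1, 1, 2, 2, 2]
```

The generator interface mirrors the report's point that originally batch data are "manually made to come sequentially."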
Dataset Splits: No. The paper does not specify distinct training, validation, and test splits with explicit percentages or counts. It describes an online-learning setting in which data streams are processed sequentially, and evaluation is performed on rounds T1+1 to T1+T2.
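The evaluation protocol this row describes (measuring performance only on rounds T1+1 to T1+T2, after the feature space has evolved) amounts to a prequential loop. A hedged sketch, where `predict`, `update`, and the toy stream are hypothetical stand-ins for the paper's learner:

```python
def online_accuracy(stream, predict, update, T1):
    """Prequential evaluation: predict each sample before seeing its
    label, but count accuracy only on rounds t > T1 (after the switch).
    `predict` and `update` are placeholder callables, not the paper's API."""
    correct = total = 0
    for t, (x, y) in enumerate(stream, start=1):
        y_hat = predict(x)
        if t > T1:                     # evaluate only on rounds T1+1 .. T1+T2
            correct += (y_hat == y)
            total += 1
        update(x, y)                   # then learn from the revealed label
    return correct / total

# Toy usage: a constant predictor on a 4-round stream with T1 = 2.
stream = [(0, 1), (0, 1), (0, 0), (0, 1)]
acc = online_accuracy(stream, predict=lambda x: 1,
                      update=lambda x, y: None, T1=2)
print(acc)  # -> 0.5
```

This predict-then-update ordering is what makes the setting "online": no held-out test split exists, consistent with the "No" result above.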
Hardware Specification: No. The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup: Yes. "The probability of labeled data p_l is set to 0.3. We also conduct experiments with other values of p_l, and our SF2EL also works well. The performance of each approach is averaged over 10 independent runs. ... τ_t = 1/t ... η = √(ln 2 / T2)"
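The reported hyperparameter schedules can be written out directly. Note the step-size formula is my reading of the garbled extraction "pln 2/T2" as √(ln 2 / T2) (a standard extraction artifact where √ renders as "p"); the function names `tau` and `eta` are labels for illustration, not from released code.

```python
from math import sqrt, log

def tau(t):
    """Shrinking combination step size, tau_t = 1/t."""
    return 1.0 / t

def eta(T2):
    """Weight-update step size, read here as eta = sqrt(ln 2 / T2)
    (assumed reconstruction of the garbled formula in the report)."""
    return sqrt(log(2) / T2)

p_l = 0.3  # probability that an arriving sample is labeled (from the report)
print(tau(4), round(eta(1000), 4))  # -> 0.25 0.0263
```

Both schedules decay with the horizon, which is what makes the averaged-over-10-runs comparison stable as T2 grows.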