Pre-release Prediction of Crowd Opinion on Movies by Label Distribution Learning

Authors: Xin Geng, Peng Hou

IJCAI 2015

Reproducibility assessment (Variable: Result, followed by the LLM response):
Research Type: Experimental. Experimental results show that LDSVR can accurately predict people's rating distribution for a movie based only on the pre-release metadata of the movie. (Section 4, Experiments)
Researcher Affiliation: Academia. Xin Geng and Peng Hou, School of Computer Science and Engineering, Southeast University, Nanjing, China ({xgeng, hpeng}@seu.edu.cn).
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described.
Open Datasets: No. The data set used in the experiments includes 7,755 movies and 54,242,292 ratings from 478,656 different users. The ratings come from Netflix and are on a scale from 1 to 5 integral stars. Each movie has, on average, 6,994 ratings. The rating distribution is calculated for each movie as an indicator of the crowd opinion on that movie. The pre-release metadata are crawled from IMDb according to the unique movie IDs. Table 1 lists all the metadata included in the data set.
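The per-movie rating distribution described above can be sketched in a few lines. This is a minimal illustration, not the paper's code; the helper name and the fixed 5-star binning are assumptions.

```python
import numpy as np

def rating_distribution(ratings, n_stars=5):
    """Turn a movie's list of 1..n_stars integer ratings into a
    probability distribution over star levels (the 'crowd opinion'
    label used in the paper). Hypothetical helper for illustration."""
    # Count occurrences of each star level; drop the unused 0-star bin.
    counts = np.bincount(np.asarray(ratings, dtype=int), minlength=n_stars + 1)[1:]
    # Normalize counts into a distribution that sums to 1.
    return counts / counts.sum()

# Example: six ratings for one movie -> a 5-bin distribution.
dist = rating_distribution([5, 4, 4, 3, 5, 2])
```

Each movie in the data set would yield one such distribution, which then serves as the label-distribution target for learning.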
Dataset Splits: Yes. The algorithm parameters used in the experiments are empirically determined. The parameter selection process is nested into the 10-fold cross validation. In detail, the whole data set is first randomly split into 10 chunks. Each time, one chunk is used as the test set, another as the validation set, and the remaining 8 chunks as the training set. The model is then trained with different parameter settings on the training set and tested on the validation set. This procedure is repeated for 10 folds, and the parameter setting with the best average performance is selected. After that, the original validation set is merged into the training set while the test set remains unchanged. The model is trained with the selected parameter setting on the updated training set and tested on the test set. This procedure is repeated for 10 folds, and the mean value and standard deviation of each evaluation measure are reported.
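The nested split protocol above can be sketched as follows. This is a sketch under assumptions: the function name, the round-robin chunking, and the choice of the next chunk as the validation set are illustrative; the paper only specifies a random 10-way split with rotating test/validation chunks.

```python
import random

def nested_cv_folds(n_items, n_folds=10, seed=0):
    """Yield (train_idx, val_idx, test_idx) for each fold: one chunk as
    test, one as validation, the remaining 8 as training. After parameter
    selection, the validation chunk would be merged back into training."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Split the shuffled indices into n_folds roughly equal chunks.
    chunks = [idx[i::n_folds] for i in range(n_folds)]
    for f in range(n_folds):
        test = chunks[f]
        val = chunks[(f + 1) % n_folds]          # assumed rotation scheme
        train = [i for g, c in enumerate(chunks)
                 if g not in (f, (f + 1) % n_folds)
                 for i in c]
        yield train, val, test
```

Each yielded triple partitions the data set, so every item appears exactly once per fold across the three roles.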
Hardware Specification: No. The paper does not provide specific hardware details used for running its experiments.
Software Dependencies: No. The paper mentions various algorithms and methods (e.g., BFGS-LLD, IIS-LLD, AA-kNN, CPNN, the RBF kernel) but does not provide specific version numbers for software libraries or dependencies.
Experiment Setup: Yes. All kernel-based methods (LDSVR, S-SVR, and M-SVRp) use the RBF kernel with the scaling factor σ equal to the average distance among the training examples. The penalty parameter C in Eq. (2) is set to 1, and the insensitivity parameter ε is set to 0.1. All iterative algorithms terminate when the difference between adjacent steps is smaller than 10^-10. The number of neighbors k in AA-kNN is set to 10, and the number of hidden neurons in CPNN is set to 80.
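The σ setting quoted above (average distance among training examples) can be sketched as below. The exact RBF form k(x, z) = exp(-||x - z||² / (2σ²)) is an assumption, since the paper's equations are not reproduced in this report; only the "σ = average pairwise distance" rule comes from the quoted setup.

```python
import numpy as np

def rbf_kernel_avg_sigma(X):
    """Compute an RBF kernel matrix over training examples X, with the
    scaling factor sigma set to the average pairwise Euclidean distance.
    Sketch only; the kernel parameterization is an assumed convention."""
    # Squared Euclidean distances between all pairs of rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d = np.sqrt(d2)
    n = len(X)
    # Average distance over distinct pairs (upper triangle, diagonal excluded).
    sigma = d[np.triu_indices(n, k=1)].mean()
    K = np.exp(-d2 / (2 * sigma ** 2))
    return K, sigma
```

With σ fixed this way, the only remaining kernel-method hyperparameters in the quoted setup are C = 1 and ε = 0.1.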