Labeling the Features Not the Samples: Efficient Video Classification with Minimal Supervision
Authors: Marius Leordeanu, Alexandra Radu, Shumeet Baluja, Rahul Sukthankar
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on large-scale recognition in video and show superior speed and performance to established feature selection approaches such as AdaBoost, Lasso, greedy forward-backward selection, and powerful classifiers such as SVM. We evaluate our method's ability to generalize and learn quickly from limited data, in both the supervised and the unsupervised cases. We test on the large-scale YouTube-Objects video dataset (Prest et al. 2012). Results: We evaluated eight methods: ours, SVM on all input features, Lasso, Elastic Net (L1+L2 regularization) (Zou and Hastie 2005), AdaBoost on all input features, ours with SVM (applying SVM only to features selected by our method, an idea related to (Nguyen and De la Torre 2010; Weston et al. 2000; Kira and Rendell 1992)), forward-backward selection (FoBa) (Zhang 2009), and simple averaging of all signed features, with values in [0, 1] and flipped as discussed before. |
| Researcher Affiliation | Collaboration | Marius Leordeanu1,2 Alexandra Radu1,2 Shumeet Baluja3 Rahul Sukthankar3 1Institute of Mathematics of the Romanian Academy, Bucharest, Romania 2University Politehnica of Bucharest, Bucharest, Romania 3Google Research, Mountain View, CA, USA |
| Pseudocode | Yes | Algorithm 1 (Learning with minimal supervision): learn feature signs from a small set of labeled samples; create F with flipped features from unlabeled data; set M ← FᵀF; find w* = argmax_w wᵀMw s.t. Σ_j w_j = 1, w_j ∈ [0, 1/k]; return w*. (A minimal code sketch follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We test on the large-scale YouTube-Objects video dataset (Prest et al. 2012), with difficult sequences from ten categories (aeroplane, bird, boat, car, cat, cow, dog, horse, motorbike, train) taken in the wild. We generate a large pool of over 6000 different features (see Fig. 4), computed and learned from three different datasets: CIFAR10 (Krizhevsky and Hinton 2009), ImageNet (Deng et al. 2009) and a holdout part of the YouTube-Objects training set. |
| Dataset Splits | No | The paper explicitly mentions the training set and test set sizes and that a prescribed training/testing split was used, but it does not specify any validation set split. The question requires information on training/test/validation splits. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware specifications (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions general software like Matlab. |
| Software Dependencies | Yes | For SVM we used the LIBSVM (Chang and Lin 2011) implementation version 3.17, with kernel and parameter C validated separately for each type of experiment. |
| Experiment Setup | Yes | In practice, 10–20 iterations bring us close to the stationary point; nonetheless, for thoroughness, we use 100 iterations in all tests. In all our tests, we present results averaged over 30 randomized trials for each method. The number of features selected can be set to k. We randomly chose 2500 images per class to create features. We also applied PCA to the resulting HOG descriptors, obtaining 46-dimensional descriptors, before passing them to SVM. For SVM we used the LIBSVM (Chang and Lin 2011) implementation, version 3.17, with kernel and parameter C validated separately for each type of experiment. For the Lasso we used the latest Matlab library and validated the L1-regularization parameter λ for each experiment. For the Elastic Net we also validated the parameter alpha that combines the L1 and L2 regularizers. (k = 50, 10 frames per shot, and 10 random shots for training; from the Figure 9 caption.) A sketch of the PCA + SVM step follows the table. |
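To make the quoted Algorithm 1 concrete, here is a minimal NumPy sketch of the selection step: estimate feature signs from a small labeled set, flip negatively signed features (assumed normalized to [0, 1]) as f → 1 − f, build M = FᵀF from unlabeled data, and maximize wᵀMw over the capped simplex {w : Σ_j w_j = 1, 0 ≤ w_j ≤ 1/k}. The sign rule, the learning rate, and the choice of projected gradient ascent as the optimizer are illustrative assumptions; the paper's own iterative scheme (run for ~100 iterations) may differ.

```python
import numpy as np

def learn_signs(F_labeled, y):
    """Estimate each feature's sign from a small labeled set:
    +1 if its mean is higher on positives, else -1 (a simple
    proxy; the paper's exact sign rule may differ)."""
    pos_mean = F_labeled[y == 1].mean(axis=0)
    neg_mean = F_labeled[y == 0].mean(axis=0)
    return np.where(pos_mean >= neg_mean, 1.0, -1.0)

def flip_features(F, signs):
    """Features are assumed normalized to [0, 1]; negatively
    signed features are flipped as f -> 1 - f."""
    F = F.copy()
    F[:, signs < 0] = 1.0 - F[:, signs < 0]
    return F

def project_capped_simplex(v, cap):
    """Euclidean projection onto {w : sum(w) = 1, 0 <= w_j <= cap},
    by bisection on the shift tau in clip(v - tau, 0, cap)."""
    lo, hi = v.min() - cap, v.max()
    for _ in range(100):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)

def select_features(F_unlabeled, signs, k, iters=100, lr=0.01):
    """Algorithm 1 sketch: M = F^T F over flipped unlabeled features,
    then projected gradient ascent on w^T M w over the capped simplex.
    Requires at least k features so the constraint set is nonempty."""
    Fs = flip_features(F_unlabeled, signs)
    M = Fs.T @ Fs
    d = M.shape[0]
    assert d >= k, "need at least k features"
    w = np.full(d, 1.0 / d)  # uniform start (already feasible)
    for _ in range(iters):
        w = project_capped_simplex(w + lr * 2.0 * M @ w, 1.0 / k)  # grad = 2Mw
    return w

# Tiny demo with random stand-in data (hypothetical shapes).
rng = np.random.default_rng(0)
F_lab, y = rng.random((40, 200)), rng.integers(0, 2, 40)
F_unlab = rng.random((1000, 200))
signs = learn_signs(F_lab, y)
w = select_features(F_unlab, signs, k=50)
scores = flip_features(F_unlab, signs) @ w  # weighted vote over selected features
```

Note that the "simple averaging of all signed features" baseline quoted in the Research Type row corresponds to skipping the optimization entirely and scoring with a uniform w.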
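The PCA + SVM step of the experiment setup can be sketched as below. The original experiments ran LIBSVM 3.17 under Matlab; this scikit-learn version (whose SVC wraps LIBSVM), fed random stand-in data, only approximates the reported protocol of projecting HOG descriptors to 46 dimensions and validating the kernel and C separately for each experiment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical stand-in data: rows are HOG descriptors, binary labels.
rng = np.random.default_rng(0)
X = rng.random((500, 900))
y = rng.integers(0, 2, 500)

# PCA to 46 dimensions before the SVM, as in the paper; kernel and C
# are validated per experiment (the grid values here are assumptions).
pipeline = make_pipeline(PCA(n_components=46), SVC())
search = GridSearchCV(
    pipeline,
    param_grid={"svc__kernel": ["linear", "rbf"],
                "svc__C": [0.1, 1.0, 10.0, 100.0]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```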