Abstract
This paper presents a robust video fingerprinting system for tracking visual content subjected to live recording. The methodological novelty of the system lies in creating synergies between architectural modules designed to offer: (1) local visual feature representations invariant with respect to scale, orientation and affine transformations; (2) scalable global feature representations invariant with respect to photometric transformations; and (3) time-variant jitter synchronization. The system is tested on a reference database of 14 h of cinematographic content and on a query dataset of 28 h of video covering two use cases: (a) computer-generated distortions (Gaussian filtering, sharpening, rotations by 2°, conversion to grayscale, contrast changes, brightness changes, geometric random bending) and (b) live camera recording. The former use case yielded an ideal false alarm rate, a probability of missed detection of 0.02 and an F1 score of 0.97. The applicative novelty, however, lies in solving the latter use case: a false alarm rate lower than 0.01, a probability of missed detection of 0.04 and an F1 score of 0.94 were obtained for content recorded live from theater and PC screens; these results demonstrate the robustness of the proposed method against live camera recording.
| Original language | English |
|---|---|
| Pages (from-to) | 229-243 |
| Number of pages | 15 |
| Journal | Multimedia Systems |
| Volume | 22 |
| Issue number | 2 |
| Publication status | Published - 1 Mar 2016 |
| Externally published | Yes |
Keywords
- 2D-DWT
- Bag of visual words
- Live camera recording robustness
- Live recording
- Video fingerprinting