TY - GEN
T1 - FEDRA
T2 - 9th SIAM International Conference on Data Mining 2009, SDM 2009
AU - Magdalinos, Panagis
AU - Doulkeridis, Christos
AU - Vazirgiannis, Michalis
PY - 2009/12/1
Y1 - 2009/12/1
N2 - Contemporary data-intensive applications generate large datasets of very high dimensionality. Data management in high-dimensional spaces presents problems, such as the degradation of query processing performance, a phenomenon also known as the curse of dimensionality. Dimensionality reduction (DR) tackles this problem, by efficiently embedding data from high dimensional to lower dimensional spaces. However, the large scale and dynamism of generated data calls for methods of low time and space complexity, features that are hardly combined in the majority of existing DR algorithms. Motivated by this fact, in this paper we propose FEDRA, a fast and efficient dimensionality reduction algorithm that uses a set of landmark points to project data to a lower dimensional Euclidean space. FEDRA is both faster and requires less memory than other comparable algorithms, without compromising the projection's quality. We theoretically assess the quality of the resulting projection and provide a bound for the error induced in pairwise distances. Furthermore, we present two extensions of FEDRA that improve the quality of the projection, suitable for applications that can tolerate higher processing costs. We prove the validity of our claims both theoretically and experimentally, by comparing our algorithm against prominent approaches, such as FastMap, LMDS, PCA, SVD and Random Projection.
AB - Contemporary data-intensive applications generate large datasets of very high dimensionality. Data management in high-dimensional spaces presents problems, such as the degradation of query processing performance, a phenomenon also known as the curse of dimensionality. Dimensionality reduction (DR) tackles this problem, by efficiently embedding data from high dimensional to lower dimensional spaces. However, the large scale and dynamism of generated data calls for methods of low time and space complexity, features that are hardly combined in the majority of existing DR algorithms. Motivated by this fact, in this paper we propose FEDRA, a fast and efficient dimensionality reduction algorithm that uses a set of landmark points to project data to a lower dimensional Euclidean space. FEDRA is both faster and requires less memory than other comparable algorithms, without compromising the projection's quality. We theoretically assess the quality of the resulting projection and provide a bound for the error induced in pairwise distances. Furthermore, we present two extensions of FEDRA that improve the quality of the projection, suitable for applications that can tolerate higher processing costs. We prove the validity of our claims both theoretically and experimentally, by comparing our algorithm against prominent approaches, such as FastMap, LMDS, PCA, SVD and Random Projection.
UR - https://www.scopus.com/pages/publications/72849109356
M3 - Conference contribution
AN - SCOPUS:72849109356
SN - 9781615671090
T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
SP - 505
EP - 516
BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Y2 - 30 April 2009 through 2 May 2009
ER -