TY - GEN
T1 - K-Clique-Graphs for Dense Subgraph Discovery
AU - Nikolentzos, Giannis
AU - Meladianos, Polykarpos
AU - Stavrakas, Yannis
AU - Vazirgiannis, Michalis
N1 - Publisher Copyright:
© 2017, Springer International Publishing AG.
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Finding dense subgraphs in a graph is a fundamental graph mining task, with applications in several fields. Algorithms for identifying dense subgraphs are used in biology, in finance, in spam detection, etc. Standard formulations of this problem such as the problem of finding the maximum clique of a graph are hard to solve. However, some tractable formulations of the problem have also been proposed, focusing mainly on optimizing some density function, such as the degree density and the triangle density. However, maximization of degree density usually leads to large subgraphs with small density, while maximization of triangle density does not necessarily lead to subgraphs that are close to being cliques. In this paper, we introduce the k-clique-graph densest subgraph problem, K≥3, a novel formulation for the discovery of dense subgraphs. Given an input graph, its k-clique-graph is a new graph created from the input graph where each vertex of the new graph corresponds to a k-clique of the input graph and two vertices are connected with an edge if they share a common K-1-clique. We define a simple density function, the k-clique-graph density, which gives compact and at the same time dense subgraphs, and we project its resulting subgraphs back to the input graph. In this paper, we focus on the triangle-graph densest subgraph problem obtained for K=3. To optimize the proposed function, we provide an exact algorithm. Furthermore, we present an efficient greedy approximation algorithm that scales well to larger graphs. We evaluate the proposed algorithms on real datasets and compare them with other algorithms in terms of the size and the density of the extracted subgraphs. The results verify the ability of the proposed algorithms in finding high-quality subgraphs in terms of size and density. Finally, we apply the proposed method to the important problem of keyword extraction from textual documents. Code related to this chapter is available at: https://github.com/giannisnik/k-clique-graphs-dense-subgraphs.
AB - Finding dense subgraphs in a graph is a fundamental graph mining task, with applications in several fields. Algorithms for identifying dense subgraphs are used in biology, in finance, in spam detection, etc. Standard formulations of this problem such as the problem of finding the maximum clique of a graph are hard to solve. However, some tractable formulations of the problem have also been proposed, focusing mainly on optimizing some density function, such as the degree density and the triangle density. However, maximization of degree density usually leads to large subgraphs with small density, while maximization of triangle density does not necessarily lead to subgraphs that are close to being cliques. In this paper, we introduce the k-clique-graph densest subgraph problem, K≥3, a novel formulation for the discovery of dense subgraphs. Given an input graph, its k-clique-graph is a new graph created from the input graph where each vertex of the new graph corresponds to a k-clique of the input graph and two vertices are connected with an edge if they share a common K-1-clique. We define a simple density function, the k-clique-graph density, which gives compact and at the same time dense subgraphs, and we project its resulting subgraphs back to the input graph. In this paper, we focus on the triangle-graph densest subgraph problem obtained for K=3. To optimize the proposed function, we provide an exact algorithm. Furthermore, we present an efficient greedy approximation algorithm that scales well to larger graphs. We evaluate the proposed algorithms on real datasets and compare them with other algorithms in terms of the size and the density of the extracted subgraphs. The results verify the ability of the proposed algorithms in finding high-quality subgraphs in terms of size and density. Finally, we apply the proposed method to the important problem of keyword extraction from textual documents. Code related to this chapter is available at: https://github.com/giannisnik/k-clique-graphs-dense-subgraphs.
UR - https://www.scopus.com/pages/publications/85040258568
U2 - 10.1007/978-3-319-71249-9_37
DO - 10.1007/978-3-319-71249-9_37
M3 - Conference contribution
AN - SCOPUS:85040258568
SN - 9783319712482
T3 - Lecture Notes in Computer Science
SP - 617
EP - 633
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Proceedings
A2 - Ceci, Michelangelo
A2 - Hollmen, Jaakko
A2 - Todorovski, Ljupco
A2 - Vens, Celine
A2 - Dzeroski, Saso
PB - Springer Verlag
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2017
Y2 - 18 September 2017 through 22 September 2017
ER -