TY - JOUR
T1 - XML Content Warehousing
T2 - Improving Sociological Studies of Mailing Lists and Web Data
AU - Nguyen, Benjamin
AU - Vion, Antoine
AU - Dudouet, François-Xavier
AU - Colazzo, Dario
AU - Manolescu, Ioana
AU - Senellart, Pierre
PY - 2011/1/1
Y1 - 2011/1/1
N2 - In this paper, we present guidelines for an XML-based approach to the sociological study of Web data, such as mailing lists or databases available online. An XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications through our case study of the profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many data sources to be added or cross-referenced without modifying existing data sets, while still allowing the structure to evolve. We also show that the existence of hidden data increases complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries over content, with far less dependence on the initial storage. Finally, we discuss the possibility of exporting the data stored in the warehouse to commonly used advanced software dedicated to sociological analysis.
AB - In this paper, we present guidelines for an XML-based approach to the sociological study of Web data, such as mailing lists or databases available online. An XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications through our case study of the profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many data sources to be added or cross-referenced without modifying existing data sets, while still allowing the structure to evolve. We also show that the existence of hidden data increases complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries over content, with far less dependence on the initial storage. Finally, we discuss the possibility of exporting the data stored in the warehouse to commonly used advanced software dedicated to sociological analysis.
KW - Mailing List Analysis
KW - Web Data Management
KW - XML
U2 - 10.1177/0759106311417540
DO - 10.1177/0759106311417540
M3 - Article
AN - SCOPUS:84864785792
SN - 0759-1063
VL - 112
SP - 5
EP - 31
JO - BMS Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique
JF - BMS Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique
IS - 1
ER -