DATA MINING TO SUPPORT INTENSIONAL ANSWERING OF BIG DATA
Speakers: Elisa Quintarelli & Mirjana Mazuran
Affiliation: Politecnico di Milano
Date: 19.02.2013 (12:30 – 13:30)
Location: EIT ICT Labs, Via Sommarive 18 - Northern Building - 1st floor, Povo-Trento.
In today’s digital era, data is easy to gather, and both the heterogeneity and the size of datasets keep growing rapidly. Gartner reports that worldwide information volume is growing at a minimum rate of 59% annually. These datasets are expressed in different formats, for instance relational, XML, and RDF, which can make access even harder for non-expert users who have no a priori knowledge of their content and structure. Indeed, composing queries (especially in the absence of a schema) and interpreting the answers may be non-trivial, since the dataset returned as an answer may be too big to be easily human-readable. Data mining techniques, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, provide several interesting solutions for knowledge elicitation. However, the mining process is normally guided by the designer, whose knowledge of the application scenario determines the portion of data from which useful patterns can be extracted. In this talk we describe an approach where the mining process is automatic, and the system extracts approximate, synthetic (intensional) information on both the structure and the contents of the datasets in the form of association rules. This synthetic information is stored and later used to provide: (i) an essential idea, the gist, of both the structure and the content of the original (structured or semi-structured) dataset and (ii) quick, approximate answers to user queries. A prototype system and experimental results will be illustrated to demonstrate the effectiveness of the approach.
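To make the idea of answering from association rules concrete, here is a minimal Python sketch. It is not the speakers' prototype: the function names (mine_rules, approximate_answer), the record layout, and the threshold values are all assumptions made for illustration. It mines simple rules of the form (attribute = value) -> (attribute = value) with their support and confidence, then uses the stored rules, rather than the data itself, to give an approximate answer.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=0.5, min_confidence=0.6):
    """Mine rules (attr_a=val_a) -> (attr_b=val_b) from a list of dicts.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """
    n = len(records)
    item_counts = Counter()  # how often each (attribute, value) occurs
    pair_counts = Counter()  # how often an ordered pair co-occurs in a record
    for rec in records:
        items = list(rec.items())
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))

    rules = []
    for (body, head), count in pair_counts.items():
        support = count / n                     # fraction of all records
        confidence = count / item_counts[body]  # fraction of records matching body
        if support >= min_support and confidence >= min_confidence:
            rules.append({"body": body, "head": head,
                          "support": support, "confidence": confidence})
    return rules


def approximate_answer(rules, attr, value, target):
    """Use stored rules only to list likely values of `target` where attr == value."""
    matches = [(r["head"][1], r["confidence"]) for r in rules
               if r["body"] == (attr, value) and r["head"][0] == target]
    return sorted(matches, key=lambda pair: -pair[1])


# Toy dataset (invented for this example).
records = [
    {"author": "Smith", "topic": "XML"},
    {"author": "Smith", "topic": "XML"},
    {"author": "Smith", "topic": "RDF"},
    {"author": "Jones", "topic": "RDF"},
]
rules = mine_rules(records)
print(approximate_answer(rules, "author", "Smith", "topic"))
```

On this toy data the only rule that survives for author = Smith points to topic = XML with confidence 2/3, so the answer is approximate: the one Smith record about RDF falls below the support threshold and is not reported.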
Elisa Quintarelli received her Ph.D. in Computer and Automation Engineering at Politecnico di Milano and is currently an assistant professor at the Dipartimento di Elettronica e Informazione, Politecnico di Milano. Her main research interests concern the study of efficient and flexible techniques for specifying and querying semi-structured and temporal data, the application of data mining techniques to support advanced database functionalities, and personalization and context-awareness in data management.
Mirjana Mazuran received her Ph.D. in Computer Science Engineering from Politecnico di Milano in 2012 and is currently a postdoctoral researcher there. Her main topics of interest are the application of data mining techniques to support advanced database functionalities and semantic techniques for supporting cyber-physical systems. On the first topic she is currently working on mining and querying tree-based patterns from XML documents and on mining violations to relax relational database constraints.
The Big Data Seminar Series is a set of multi-disciplinary seminars on the subject of Big Data. The seminars are organised twice a month by the Big Data Group of Trento RISE in collaboration with EIT ICT Labs.
You can subscribe to the mailing list here: http://disi.unitn.it/mailman/listinfo/bigdataseminars
Contact: Sandro Battisti
Following the decision taken by the Management Committee in February, Big Data aspects will be addressed in a number of Action Lines within EIT ICT Labs, such as Digital Cities, Health and Well Being, and Smart Energy.