IMPRINT: Incremental Data Mining for Multi-Relational Objects
Project leader:
Project staff:
Zaigham Siddiqui,
Max Zimmermann
Funding:
*** IMPRINT (translated from the German description) ***
Data mining methods for data streams rest on the assumption that each data instance is processed only once. For example, a method that learns to detect network attacks reads each data instance only once and adapts the derived model to new kinds of attack. In many applications, however, the data are not simple instances but complex, nested objects whose components are streams of data instances. The information on a customer, for example, consists of master data, which can change over time, and of transactions such as purchases, returns or product reviews. A company that wants to segment its customers and keep the segments up to date needs learning methods that derive the models from both the master data and the transactions and update them continuously.
In the IMPRINT project we distinguish between perennial objects, which themselves contain data instances, and the data instances themselves; the latter arrive as a data stream and enrich the perennial objects over time. The challenges of adaptive learning on perennial objects include analysing objects that grow at different speeds as data instances are added, comparing objects of different size and age, and the need for efficient main-memory management. In IMPRINT we will design, develop and evaluate adaptive learning methods that meet these requirements.
*** IMPRINT ENGLISH ***
Conventional stream mining methods assume that each data instance is seen only once and is forgotten after being processed. Consider, for example, a classifier that distinguishes between normal network accesses and attacks. This classifier reads each data instance (access operation) once and must adapt to new types of attack. However, the data to be analyzed in many business applications are not simple instances but complex, nested objects that contain streams of data instances. Customer data are one example: they encompass some static information as well as transactions such as purchases, service requests and product reviews. To learn and maintain customer segments, a company needs learning methods that derive and adapt models from the complex objects and the streams feeding them.
In IMPRINT we distinguish between perennial objects, which contain data instances, and the stream of data instances itself. Mining perennial objects poses several challenges: learning on objects that grow as new transactions arrive, comparing objects that differ in size and age, and maintaining the objects efficiently. In IMPRINT, we will design, develop and evaluate adaptive learning methods that address these challenges.
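The distinction between perennial objects and the instances that feed them can be illustrated with a minimal Python sketch. It is not the IMPRINT implementation: the class `PerennialObject`, its fields, the function `process_stream` and the toy stream are all hypothetical, chosen only to show one way an object could absorb instances in constant memory and still be compared with objects of a different size and age.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class PerennialObject:
    """Hypothetical perennial object: static data plus a constant-size stream summary."""
    object_id: str
    static: Dict[str, str] = field(default_factory=dict)
    first_seen: float = 0.0
    count: int = 0
    mean_value: float = 0.0

    def absorb(self, timestamp: float, value: float) -> None:
        """Fold one stream instance into the summary; the instance itself is not stored."""
        if self.count == 0:
            self.first_seen = timestamp
        self.count += 1
        # Incremental mean: no need to keep earlier instances in memory.
        self.mean_value += (value - self.mean_value) / self.count

    def profile(self, now: float) -> Tuple[float, float]:
        """Age-normalised view: (instances per time unit, mean instance value)."""
        age = max(now - self.first_seen, 1.0)  # assumed floor to avoid division by zero
        return (self.count / age, self.mean_value)


def process_stream(stream: Iterable[Tuple[str, float, float]]) -> Dict[str, PerennialObject]:
    """Read each (object_id, timestamp, value) instance once and route it to its object."""
    objects: Dict[str, PerennialObject] = {}
    for object_id, timestamp, value in stream:
        if object_id not in objects:
            objects[object_id] = PerennialObject(object_id)
        objects[object_id].absorb(timestamp, value)
    return objects


# Toy stream of purchases: (customer, time, amount). Invented data.
stream = [("alice", 0.0, 10.0), ("bob", 8.0, 40.0), ("alice", 5.0, 30.0), ("alice", 10.0, 20.0)]
customers = process_stream(stream)
```

Calling `customers["alice"].profile(now=10.0)` and `customers["bob"].profile(now=10.0)` yields rate-and-mean pairs that can be compared even though the two customers have different numbers of transactions and entered the stream at different times. A real method would need richer summaries and some form of forgetting.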
The published articles thus far are:
Zaigham Faraz Siddiqui, Eleftherios Tiakas, Panagiotis Symeonidis, Myra Spiliopoulou, and Yannis Manolopoulos. Learning Relational User Profiles and Recommending Items as Their Preferences Change. International Journal on Artificial Intelligence Tools, 24(02), 31 pages, 2015.
Max Zimmermann, Eirini Ntoutsi, and Myra Spiliopoulou. A Semi-supervised Self-Adaptive Classifier over Opinionated Streams. In Proceedings of the 2014 IEEE 14th International Conference on Data Mining Workshops (to appear 2014). IEEE Computer Society, Washington, DC, USA.
Zaigham Faraz Siddiqui, Eleftherios Tiakas, Panagiotis Symeonidis, Myra Spiliopoulou, and Yannis Manolopoulos. xStreams: Recommending Items to Users with Time-evolving Preferences. 4th International Conference on Web Intelligence, Mining and Semantics (WIMS 14), Thessaloniki, Greece, 2014.
Zaigham Faraz Siddiqui, Georg Krempl, Myra Spiliopoulou, Jose M. Pena, Nuria Paul, and Fernando Maestu. Are Some Brain Injury Patients Improving More Than Others? The 2014 International Conference on Brain Informatics and Health (BIH'14), Warsaw, Poland, 2014.
Max Zimmermann, Eirini Ntoutsi, and Myra Spiliopoulou. Discovering and Monitoring Product Features and the Opinions on them with OPINSTREAM. In Neurocomputing, accepted 4/2014, to appear 2014.
T. Hielscher, M. Spiliopoulou, H. Völzke, and J.-P. Kühn. Using participant similarity for the classification of epidemiological data on hepatic steatosis. In Proc. of the 27th IEEE Int. Symposium on Computer-Based Medical Systems (CBMS 14), Mount Sinai, NY, 2014. IEEE.
U. Niemann, H. Völzke, J.-P. Kühn, and M. Spiliopoulou. Learning and inspecting classification rules from longitudinal epidemiological data to identify predictive features on hepatic steatosis. Journal of Expert Systems with Applications (ESWA), 2013. Accepted 02/2014.
M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Adaptive semi-supervised opinion classifier with forgetting mechanism. In Proc. of the 29th Annual ACM Symposium on Applied Computing (SAC 14). ACM, 2014.
M. Zimmermann, E. Ntoutsi, and M. Spiliopoulou. Extracting opinionated (sub)features from a stream of product reviews. In Proceedings of the 16th Int. Conf. on Discovery Science (DS 2013), volume 8140 of Lecture Notes in Computer Science, pages 340-355, Singapore, Oct. 2013. Springer.
S. Glaßer, U. Niemann, B. Preim, and M. Spiliopoulou. Can we Distinguish Between Benign and Malignant Breast Tumors in DCE-MRI by Studying a Tumor's Most Suspect Region Only? In Proc. of the 26th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2013), Porto, Portugal, June 2013.
S. Glaßer, U. Niemann, U. Preim, B. Preim, and M. Spiliopoulou. Classification of benign and malignant DCE-MRI breast tumors by analyzing the most suspect region.
P. Matuszyk and M. Spiliopoulou. Framework for storing and processing relational entities in stream mining. In Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2013), Lecture Notes in Computer Science, pages 497-508, Gold Coast, Australia, April 2013. Springer Berlin Heidelberg.
Z. Siddiqui, M. Oliveira, J. Gama, and M. Spiliopoulou. Where are we going? Predicting the evolution of individuals. In Proc. of the IDA 2012 Conference on Intelligent Data Analysis, volume LNCS 7619, pages 357-368, Helsinki, Finland, Oct. 2012. Springer.
J. Gama, M. Spiliopoulou, and G. Krempl. Advanced topics on data stream mining: II. Mining multiple streams. Tutorial at the 23rd Europ. Conf. on Machine Learning and 16th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 12), Sept. 2012.
M. Zimmermann, I. Ntoutsi, Z. Siddiqui, M. Spiliopoulou, and H.-P. Kriegel. Discovering global and local bursts in a stream of news. In Proc. of the SAC 2012 Symposium on Applied Computing, Trento, Italy, March 2012.
Master and bachelor theses and student projects within IMPRINT:
D. Kottke. Budget Optimization for Active Learning in Data Streams. Master thesis, University, Magdeburg, 2014.
Y. Xu, M. Hewelt, F. Brög, M. Schlolaut, and R. Pleshkanovska. Efficient Unsupervised Discovery of Word Categories. Software project, University, Magdeburg, 2014.
T. Böttcher and J. Krüger. Generating a Stream of Re-Appearing Entities and Summarizing Information on these. Bachelor thesis, University, Magdeburg, 2014.
U. Niemann. The Potential of Clustering for Subpopulation Discovery in Epidemiological Datasets. Master thesis, University, Magdeburg, 2014.
T. Hielscher. Adaptives Lernen eines domänenspezifischen Lexikons für die Berechnung von Wortpolaritäten. Master thesis, University, Magdeburg, 2014.
A. Kusz. Sentiment-Analyse von Kundenbewertungen mithilfe von Feature-Extraktion und Zusammenfassung der Meinungen zu diesen Features. Master thesis, University, Magdeburg, 2013.
U. Niemann and R. Pannicke. Feature-based visual sentiment analysis of text document streams. Team project, University, Magdeburg, 2013.
J. Düwel. Dynamische Attributräume in der Opinion-Stream-Klassifikation. Bachelor thesis, University, Magdeburg, 2013.
M. Filax, H. Rothe, J. Polifka, R. Zoun, and S. W. Hart. Job crawler. Team project, University, Magdeburg, 2013.
X. Sadovskaya, O. Shamin, and T. Zinke. Learning a domain specific polarity lexicon. Team project, University, Magdeburg, 2013.
T. Wu. Implementation of evolutionary model using a mixture of Markov chains. Team project, Otto-von-Guericke-University Magdeburg, Faculty of Computer Science, Nov. 2013.
M. Tödten. Erkennung von Kombinationen von Risikofaktoren für Fettleber mit Data-Mining-Verfahren. Master thesis, University, Magdeburg, 2012.
P. Matuszyk. Framework zur Speicherung und Bearbeitung relationaler Entitäten in einem Datenstrom. Master thesis, Otto-von-Guericke University of Magdeburg, 2012.
S. Böhlert, A. Kusz, and F. Warschewske. Web crawling of Amazon product reviews. Team project, University, Magdeburg, 2012.
U. Niemann. Erkennung von verschieden durchbluteten Tumorregionen anhand von dichtebasierten Clustering-Algorithmen in kontrastmittelverstärkten Perfusions-MRT-Aufnahmen der Brust. University, Magdeburg, 2012.
M. Tödten. Clustering of Opinionated Documents. Individual project, University, Magdeburg, 2012.
Keywords
Classification, Clustering, Constraint-Based Clustering, Data Mining, Stream Mining
Contact
Prof. Myra Spiliopoulou
Otto-von-Guericke-Universität Magdeburg
Institut für Technische und Betriebliche Informationssysteme
Universitätsplatz 2
39106 Magdeburg
Tel.: +49 391 6758967