OSCAR: Ensemble Methods and Active Learning Methods for the Classification of Opinion Streams
Project leader:
Project homepage:
Funding:
With the rise of Web 2.0, many people use social media to post opinions on almost any subject: events, products, topics. Opinion mining is used to draw conclusions about people's attitudes towards each subject. Such insights are essential for product design and advertising, for event planning, political campaigns, etc. As opinions accumulate, however, changes occur and invalidate the models from which these conclusions are drawn. Changes concern the general sentiment towards a subject or towards specific facets of this subject, as well as the words used to express sentiment. The subjects themselves also change over time. In OSCAR, we seek to develop opinion mining methods that adapt to such changes.
The first part of OSCAR is on stream mining methods that deal with vocabulary changes. In text mining, the words of the vocabulary constitute the feature space. A change in the feature space means that the model must be updated. It is impractical to perform such an update whenever a new word appears or a word falls out of use. In OSCAR, we instead accumulate information on the usage and sentiment of each word to highlight the long-term interplay between word polarity and document polarity. On this basis, we design methods that assess the importance of a word for model adaptation, update the vocabulary using only words that remain important for some time, and adapt models gradually.
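The idea of admitting only words that remain important for some time can be illustrated with a minimal sketch. This is not the OSCAR method itself: the class name `VocabularyTracker`, the importance score (class imbalance of a word's decayed counts), and the parameters `min_importance`, `min_count`, `patience` and `decay` are all assumptions made for illustration.

```python
from collections import defaultdict


class VocabularyTracker:
    """Hypothetical sketch: a word enters the vocabulary only after it has
    been important for `patience` consecutive batches, and leaves it once
    its importance drops again."""

    def __init__(self, min_importance=0.5, min_count=2, patience=2, decay=0.5):
        self.min_importance = min_importance
        self.min_count = min_count
        self.patience = patience
        self.decay = decay
        self.counts = defaultdict(lambda: [0.0, 0.0])  # word -> [negative, positive]
        self.streak = defaultdict(int)  # word -> consecutive important batches
        self.vocabulary = set()

    def importance(self, word):
        """Assumed score: how one-sided the word's polarity counts are."""
        neg, pos = self.counts[word]
        total = neg + pos
        if total < self.min_count:
            return 0.0  # too little evidence to trust the word
        return abs(pos - neg) / total

    def update(self, batch):
        """batch: iterable of (tokens, label), label 1 = positive, 0 = negative."""
        # Fade old evidence so that words falling out of use lose importance.
        for pair in self.counts.values():
            pair[0] *= self.decay
            pair[1] *= self.decay
        # Accumulate usage and document polarity per word.
        for tokens, label in batch:
            for word in set(tokens):
                self.counts[word][label] += 1.0
        # Admit or drop words depending on how long they stayed important.
        for word in list(self.counts):
            if self.importance(word) >= self.min_importance:
                self.streak[word] += 1
            else:
                self.streak[word] = 0
            if self.streak[word] >= self.patience:
                self.vocabulary.add(word)
            else:
                self.vocabulary.discard(word)
        return set(self.vocabulary)


batch = [(["great", "phone"], 1), (["great", "battery"], 1), (["bad", "phone"], 0)]
tracker = VocabularyTracker()
after_first = tracker.update(batch)
after_second = tracker.update(batch)
after_silence = tracker.update([])
```

In this sketch a word that appears once with strong polarity does not change the feature space; it must stay important across batches before a model would need to adapt to it.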
Second, we reduce the need for labeled documents. In stream classification, it is commonly assumed that an expert is available at any time to label the arriving data instances. This assumption is relaxed in active learning, where only a few instances are chosen for labeling. However, active learning methods assume a fixed feature space. In OSCAR, we want to develop active stream learning methods that learn polarity models and adapt them to an evolving feature space.
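A minimal sketch of active learning on a stream with an open vocabulary is given below. It is only an illustration under stated assumptions, not the OSCAR approach: it uses plain uncertainty sampling with a labeling budget and a sparse online logistic model, and the names `ActiveStreamLearner`, `budget`, `margin`, `lr` and `oracle` are hypothetical.

```python
import math


class ActiveStreamLearner:
    """Hypothetical sketch: query a label only when the model is uncertain
    and the labeling budget allows it. Weights live in a dict, so words
    never seen before simply become new features."""

    def __init__(self, budget=0.3, margin=0.2, lr=0.5):
        self.budget = budget  # maximum fraction of instances sent to the expert
        self.margin = margin  # query when |p - 0.5| < margin
        self.lr = lr          # learning rate of the online update
        self.weights = {}     # word -> weight, grows with the vocabulary
        self.seen = 0
        self.queried = 0

    def predict_proba(self, tokens):
        z = sum(self.weights.get(w, 0.0) for w in set(tokens))
        z = max(-30.0, min(30.0, z))  # keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def process(self, tokens, oracle):
        """Return the predicted label; `oracle(tokens)` stands in for the expert."""
        self.seen += 1
        p = self.predict_proba(tokens)
        uncertain = abs(p - 0.5) < self.margin
        within_budget = self.queried < self.budget * self.seen
        if uncertain and within_budget:
            label = oracle(tokens)
            self.queried += 1
            # Logistic-regression gradient step on the words of this document.
            for w in set(tokens):
                self.weights[w] = self.weights.get(w, 0.0) + self.lr * (label - p)
        return int(p >= 0.5)


# Toy stream; the labels dictionary plays the role of the human expert.
stream = [
    (["great", "phone"], 1),
    (["awful", "battery"], 0),
    (["great", "screen"], 1),
    (["awful", "service"], 0),
] * 5
labels = {tuple(tokens): label for tokens, label in stream}
learner = ActiveStreamLearner()
for tokens, _ in stream:
    learner.process(tokens, lambda t: labels[tuple(t)])
```

The budget check bounds the expert's effort regardless of how long the stream runs, while the dictionary of weights avoids committing to a fixed feature space in advance.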
Third, we work on dealing with different types of change simultaneously. For this purpose, we use ensembles. We dedicate some ensemble members to the identification of topic trends, others to changes in the vocabulary, and others to temporal changes, including periodic ones. We investigate ways of coordinating the ensemble members to ensure a smooth adaptation of the final ensemble model at any time. The output of OSCAR will be a complete framework, encompassing active ensemble learning methods that deal with different forms of change and learn with limited expert involvement. The framework will also encompass coordinating components that weigh the contribution of individual models to the final one and regulate the exchange of information between ensemble members and active learners.
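One simple way to weigh the contribution of individual models is a multiplicative weight update in the spirit of the weighted-majority scheme. The sketch below is an assumption for illustration only, not the coordination component of OSCAR: the class `WeightedEnsemble`, the penalty factor `beta`, and the three constant stand-in members are hypothetical.

```python
class WeightedEnsemble:
    """Hypothetical sketch: combine members by weighted averaging and shrink
    the weight of every member that misclassifies a labeled instance."""

    def __init__(self, members, beta=0.5):
        # members: name -> callable(tokens) returning P(positive)
        self.members = dict(members)
        self.beta = beta  # penalty factor in (0, 1) applied after a mistake
        n = len(self.members)
        self.weights = {name: 1.0 / n for name in self.members}

    def predict_proba(self, tokens):
        """Weighted average of the members' probabilities (weights sum to 1)."""
        return sum(self.weights[name] * member(tokens)
                   for name, member in self.members.items())

    def feedback(self, tokens, label):
        """Penalise wrong members, then renormalise so the weights sum to 1."""
        for name, member in self.members.items():
            if int(member(tokens) >= 0.5) != label:
                self.weights[name] *= self.beta
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total


# Stand-in members that return constant probabilities, for illustration only.
ensemble = WeightedEnsemble({
    "topic": lambda tokens: 0.1,
    "vocabulary": lambda tokens: 0.9,
    "temporal": lambda tokens: 0.6,
})
before = ensemble.predict_proba(["great", "phone"])
ensemble.feedback(["great", "phone"], 1)
after = ensemble.predict_proba(["great", "phone"])
```

Because each mistake only scales a weight by `beta`, the combined prediction shifts gradually instead of switching abruptly from one member to another.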
We test OSCAR on real data, mainly from Twitter: we study how the vocabulary changes and how topics emerge and fade in streams of tweets on specific subject areas, and how these changes influence the learned model.
Keywords
Data mining, ensemble methods, data evolution, vocabulary evolution, classification in data streams, opinion streams, opinion stream mining, active learning on data streams
Contact
Prof. Myra Spiliopoulou
Otto-von-Guericke-Universität Magdeburg
Institut für Technische und Betriebliche Informationssysteme
Universitätsplatz 2
39106 Magdeburg
Tel.: +49 391 6758967