The first part of OSCAR concerns stream mining methods that deal with vocabulary changes. In text mining, the words of the vocabulary constitute the feature space, so a change in the feature space means that the model must be updated. It is impractical to perform such an update whenever a new word appears or a word falls out of use. In OSCAR, we instead want to accumulate information on the usage and sentiment of each word in order to highlight the long-term interplay between word polarity and document polarity. On this basis, we want to design methods that assess the importance of a word for model adaptation, update the vocabulary using only words that remain important for some time, and adapt models gradually.
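The idea of admitting a word to the vocabulary only after it has stayed important for some time can be illustrated with a small sketch. It is not OSCAR's method: the class `StreamingVocabulary`, the exponential forgetting factor `decay`, the importance score (absolute decayed sum of document labels) and the `patience` rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class StreamingVocabulary:
    """Hypothetical sketch: tracks decayed usage and polarity statistics per word
    and changes the vocabulary only after a word's importance has stayed on the
    other side of `threshold` for `patience` consecutive documents."""

    def __init__(self, decay=0.99, threshold=1.0, patience=3):
        self.decay = decay          # forgetting factor applied once per document
        self.threshold = threshold  # importance needed to belong to the vocabulary
        self.patience = patience    # consecutive documents before admitting/evicting
        self.freq = defaultdict(float)      # decayed number of documents containing the word
        self.polarity = defaultdict(float)  # decayed sum of labels (+1/-1) of those documents
        self.streak = defaultdict(int)      # consecutive documents disagreeing with membership
        self.vocabulary = set()

    def importance(self, word):
        # Large only when a word occurs often AND mostly in documents of one polarity.
        return abs(self.polarity[word])

    def update(self, tokens, label):
        """Process one document whose polarity label is +1 or -1."""
        for w in self.freq:  # forget old evidence for every tracked word
            self.freq[w] *= self.decay
            self.polarity[w] *= self.decay
        for w in set(tokens):
            self.freq[w] += 1.0
            self.polarity[w] += label
        for w in list(self.freq):
            important = self.importance(w) >= self.threshold
            if important != (w in self.vocabulary):
                self.streak[w] += 1
                if self.streak[w] >= self.patience:  # change persisted long enough
                    if important:
                        self.vocabulary.add(w)
                    else:
                        self.vocabulary.discard(w)
                    self.streak[w] = 0
            else:
                self.streak[w] = 0
```

With `decay=1.0`, `threshold=2.0` and `patience=2`, a word that appears in three positive documents enters the vocabulary only after the third one, and a run of negative documents later evicts it again. A short-lived burst of usage therefore never triggers a model update.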
Second, we want to reduce the need for labeled documents. In stream classification, it is typically assumed that an expert is available at any time to label the arriving data instances. This assumption is relaxed in active learning, where only a few instances are chosen for labeling. However, active learning methods assume a fixed feature space. In OSCAR, we want to develop active stream learning methods that learn polarity models and adapt them to an evolving feature space.
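How active learning and an evolving feature space can fit together may be clearer with a minimal sketch. It swaps in two standard, simple techniques that OSCAR does not necessarily use: online logistic regression over a sparse word-to-weight dictionary (so new words just get new entries) and uncertainty sampling under a fixed labeling budget. The class `ActiveStreamClassifier`, its parameters and the `oracle` callback standing in for the human expert are hypothetical.

```python
import math


class ActiveStreamClassifier:
    """Hypothetical sketch: online logistic regression over a sparse, growing
    word-weight dictionary, combined with budgeted uncertainty sampling."""

    def __init__(self, budget=0.1, margin=0.2, lr=0.5):
        self.budget = budget  # target fraction of arriving documents to label
        self.margin = margin  # query only if |P(positive) - 0.5| < margin
        self.lr = lr          # learning rate of the stochastic gradient step
        self.weights = {}     # word -> weight; unseen words implicitly weigh 0
        self.bias = 0.0
        self.seen = 0
        self.queried = 0

    def prob_positive(self, tokens):
        z = self.bias + sum(self.weights.get(w, 0.0) for w in set(tokens))
        z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def process(self, tokens, oracle):
        """Return the predicted label (1/0); call `oracle()` for the true label
        only when the prediction is uncertain and the budget allows it."""
        self.seen += 1
        p = self.prob_positive(tokens)
        if abs(p - 0.5) < self.margin and self.queried < self.budget * self.seen:
            self.queried += 1
            y = oracle()
            grad = y - p  # gradient of the log-likelihood w.r.t. the logit
            self.bias += self.lr * grad
            for w in set(tokens):
                # new words simply get a new entry: the feature space grows
                self.weights[w] = self.weights.get(w, 0.0) + self.lr * grad
        return int(p >= 0.5)
```

Because the model is a dictionary rather than a fixed-length vector, it never has to be rebuilt when a word appears; combining it with a vocabulary filter such as the one sketched for the first part would also let words drop out.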
Third, we will work on dealing with different types of change simultaneously. For this purpose, we will use ensembles: some ensemble members will be dedicated to the identification of topic trends, others to changes in the vocabulary, and others to temporal changes, including periodic ones. We will investigate ways of coordinating the ensemble members to ensure a smooth adaptation of the final ensemble model at any time. The output of OSCAR will be a complete framework encompassing active ensemble learning methods that deal with different forms of change and learn with limited expert involvement. The framework will also include coordinating components that weigh the contribution of individual models to the final one and regulate the exchange of information between ensemble members and active learners.
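One simple way a coordinating component could weigh the contribution of individual members is the classic Weighted Majority update, sketched below as a stand-in rather than the coordination scheme OSCAR will develop. The class `WeightedMajorityEnsemble` and the parameter `beta` are hypothetical, and the members are fixed callables returning +1 or -1, whereas real members would themselves adapt to topic, vocabulary or temporal change.

```python
class WeightedMajorityEnsemble:
    """Hypothetical coordinator: combines member predictions (+1/-1) by a
    weighted vote and multiplies a member's weight by `beta` whenever it errs
    (the Weighted Majority update), renormalizing weights to sum to 1."""

    def __init__(self, members, beta=0.5):
        self.members = members                # callables: tokens -> +1 or -1
        self.weights = [1.0] * len(members)
        self.beta = beta                      # penalty factor in (0, 1)

    def predict(self, tokens):
        score = sum(w * m(tokens) for w, m in zip(self.weights, self.members))
        return 1 if score >= 0 else -1

    def update(self, tokens, label):
        for i, m in enumerate(self.members):
            if m(tokens) != label:
                self.weights[i] *= self.beta  # shrink the weight of wrong members
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

Because weights shrink multiplicatively rather than being reset, a member specialized in a change that is currently absent loses influence gradually and can regain it later, which loosely matches the goal of smooth adaptation of the final model.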
We want to test OSCAR on real data, mainly from Twitter: we want to study how the vocabulary changes and how topics emerge and fade in streams of tweets on specific subject areas, and how these changes influence the learned model.