TRENDMINER: LARGE-SCALE CROSS-LINGUAL TREND MINING AND SUMMARIZATION OF REAL-TIME MEDIA STREAMS


The recent massive growth in online media and the rise of user-authored content (e.g., weblogs, Twitter, Facebook) have led to the challenge of accessing and interpreting these highly multilingual data in a timely, efficient, and affordable manner. Scientifically, streaming online media pose new challenges because they are shorter, noisier, and more colloquial than traditional text. Moreover, they form a temporal stream strongly grounded in events and context. Consequently, existing language technologies fall short on accuracy, scalability, and portability.

This project is co-funded by the EU under FP7 (the Seventh Framework Programme), within research objective ICT-2011.4.2 Language Technologies, target outcome b) Information access and mining.

TrendMiner

Objective

The goal of the TrendMiner project is to deliver innovative, portable, open-source, real-time methods for cross-lingual mining and summarization of large-scale stream media.

TrendMiner will achieve this through an interdisciplinary approach that combines linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human-annotated data will be required, because time-series data (e.g., financial markets, political polls) will serve as a proxy. A key novelty will be weakly supervised machine learning algorithms for the automatic discovery of new trends and correlations. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media.

Results will be validated in two high-profile case studies:

  • Financial decision support (with analysts, traders, regulators, and economists)
  • Political analysis and monitoring (with politicians, economists, and political journalists)

The techniques will be generic, with many business applications such as business intelligence, customer relationship management, and community support. The project will also benefit society and ordinary citizens by enabling enhanced access to government data archives, summarization of online health information, and tracking of pressing societal issues.


Participants: Ontotext (BG), Eurokleis (IT), Internet Memory Research SAS (FR), Sora (AT), DFKI (DE), the University of Sheffield (GB), the University of Southampton (GB), the Research Institute for Linguistics of the Hungarian Academy of Sciences (HU), the Institute of Computer Science of the Polish Academy of Sciences (PL), Universidad Carlos III de Madrid (ES), Sngular (ES)
Funding organization: European Union, FP7
Web site: www.trendminer-project.eu
Period: 2013-2014
