Semantic Publishing: a Case Study for the Media Industry

Semantic Publishing at Unidad Editorial: a Client Case Study in the Media Industry 

Last year, the Spanish media group Unidad Editorial deployed a new CMS, developed in-house, for its integrated newsroom. Unidad Editorial is a subsidiary of the Italian RCS MediaGroup and publishes some of the highest-circulation newspapers and magazines in Spain. It also owns nationwide radio stations and holds a digital terrestrial television (DTTV) license covering four TV channels.

Newsroom El Mundo

When a journalist adds a news item to the system, its content has to be tagged. Tagging is one of the first steps in a workflow that ends with the delivery of the item in different formats, through different channels (print, web, tablet and mobile apps) and for different mastheads. After evaluating solutions from several providers over the preceding months, the company decided that semantic tagging would be done with Daedalus’ text analytics technology. In this case, semantic publishing includes the identification (with disambiguation) of named entities (people, places, organizations, etc.), time and money expressions, and concepts; classification according to the IPTC scheme (an international standard for the media industry, with around 1400 classes organized in three levels); and sentiment analysis, among other tasks.
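To make the three-level classification concrete, here is a minimal sketch of how a CMS might expand a single class code into its chain of ancestors, for example to file an item under both a broad and a narrow topic. It assumes the eight-digit layout of IPTC Subject Codes (two digits for the first level, three for the second, three for the third). The article does not say which encoding the tagging service actually returns, and the function name `iptc_levels` is hypothetical.

```python
from __future__ import annotations


def iptc_levels(code: str) -> list[str]:
    """Return the ancestor chain of a subject code, broadest level first.

    Assumes an 8-digit code laid out as SS MMM DDD (subject, subject
    matter, subject detail), with unused lower levels filled with zeros.
    """
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected 8 digits, got {code!r}")

    subject, matter, detail = code[:2], code[2:5], code[5:]
    if matter == "000" and detail != "000":
        # A third-level detail cannot exist without a second-level parent.
        raise ValueError(f"malformed code {code!r}")

    chain = [subject + "000000"]                 # first level
    if matter != "000":
        chain.append(subject + matter + "000")   # second level
        if detail != "000":
            chain.append(code)                   # third level
    return chain
```

Calling `iptc_levels("15054000")` yields the first-level code `15000000` followed by the second-level code itself, so an archive search on the broad class also finds items tagged with the narrower one.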

Metadata resulting from tagging are used in many different ways: for archiving and searching purposes, to show related content, for personalization, for contextual advertising, for SEO, etc., depending on the final usage or rendering of information.

Integration was carried out in two steps. First, during the development, integration and testing of the new CMS, semantic tagging was done by calling the APIs of our cloud-based platform, MeaningCloud. Once everything was running properly, our APIs were installed on a server at the client’s premises to ensure independence, moving from a SaaS model to a typical on-premises license. The only change the migration required in the client code was a constant holding the Internet address of the (now in-house) service.
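The following sketch illustrates why such a migration can come down to a single constant: if the client builds every request from one base address, switching from the cloud service to the on-premises server touches nothing else. It is not the client's actual code. Both URLs, the environment variable name, and the `key` and `txt` parameter names are placeholders chosen for illustration.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical addresses; neither comes from the article.
CLOUD_BASE_URL = "https://api.example.com/semantic-publishing"
ONSITE_BASE_URL = "http://tagging.intranet.example/semantic-publishing"

# The single constant that changes between the SaaS and on-premises setups.
# Reading it from the environment avoids editing code at migration time.
TAGGING_BASE_URL = os.environ.get("TAGGING_BASE_URL", CLOUD_BASE_URL)


def build_tagging_request(text: str, api_key: str,
                          base_url: str = TAGGING_BASE_URL) -> Request:
    """Build (but do not send) a POST request asking the service to tag text.

    The form field names are assumptions, not a documented API.
    """
    body = urlencode({"key": api_key, "txt": text}).encode("utf-8")
    return Request(base_url, data=body, method="POST")
```

The request can then be sent with `urllib.request.urlopen(build_tagging_request(article_text, api_key))`. Pointing `TAGGING_BASE_URL` at the in-house server is the only step needed to redirect all traffic.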

Newsroom El País

Media companies represent a mature market for MeaningCloud. We delivered our first text processing solution for newspapers in 2001. At that time, we started talks with El País to integrate our spelling, grammar and style checking solution into their newsroom content management system (Hermes, from Unisys). Despite the technical difficulties (the solution had to work both offline and online, and had to be distributed and installed according to Hermes’ update procedures) and the computational linguistics work needed to automate the criteria set by the corporate stylebook, the project was completed in a few months. A short time later, our proofreading solution was incorporated into a different platform (Linux-based instead of Windows, with a CMS developed in-house) for the digital edition of the newspaper El Mundo.

Semantic publishing is the target of the latest API provided by MeaningCloud. This application programming interface addresses the vertical market of the publishing and content industries. Its results include:

  • Unambiguous identification of entities (persons, companies, brands, products…) and of their attributes.
  • Identification of keywords and relevant concepts.
  • Thematic classification based on standard taxonomies (e.g. IPTC for news).
  • Identification of key information: dates, addresses (physical and virtual), money amounts, etc.
  • Content enhancement with related information.
  • Possibility of incorporating customized dictionaries and other linguistic resources.
  • Spelling, grammar and style checking before publishing.

Would you like to try it? Do it for free. Register and enrich your publications, up to half a million words/month.

And, if you plan to attend the Sentiment Analysis Symposium 2014, in New York City, do not forget to pay us a visit at our booth. Good luck!

Jose C. Gonzalez (@jc_gonzalez)
