Author Profiling and Text Forensics Research
Since 2009, the PAN Lab has organized shared tasks on digital text forensics in general, and on author profiling in particular. PAN is a lab at CLEF (Conference and Labs of the Evaluation Forum), the European evaluation forum for information retrieval. CLEF consists of an independent peer-reviewed conference on a broad range of issues in multilingual and multimodal information access evaluation, and a set of labs and workshops designed to test different aspects of monolingual and cross-language information retrieval systems. CLEF 2018 will be hosted by the University of Avignon, France, on 10-14 September 2018.
MeaningCloud has sponsored the award for the best-performing team in the author profiling task at CLEF since 2015.
Author profiling is the task of inferring the traits of a document's author from the document itself.
In 2017, the task focused on gender and language variety identification on Twitter, covering four languages and several of their varieties: English (Australia, Canada, Great Britain, Ireland, New Zealand, United States), Spanish (Argentina, Chile, Colombia, Mexico, Peru, Spain, Venezuela), Portuguese (Brazil, Portugal), and Arabic (Egypt, Gulf, Levantine, Maghrebi).
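To make the language variety part of the task concrete, here is a minimal sketch of one simple way to approach it: each variety is represented by a profile of character trigram counts, and a new text is assigned to the variety whose profile it most resembles (cosine similarity). This is an illustration only, not the method of any participating team. The function names (`char_ngrams`, `cosine`, `train`, `predict`) are hypothetical, and the two training sentences are invented toy data, not taken from the PAN corpus.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams in a lowercased, space-padded text."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Build one n-gram profile per label from (label, text) pairs."""
    profiles = {}
    for label, text in labeled_docs:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Return the label whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(profiles[label], grams))


# Invented toy data: one sentence per English variety (assumption, not PAN data).
toy_training = [
    ("GB", "my favourite colour is grey"),
    ("US", "my favorite color is gray"),
]
profiles = train(toy_training)
print(predict(profiles, "colour"))  # GB
print(predict(profiles, "color"))   # US
```

A real system would be trained on many tweets per author and would typically combine several feature types with a learned classifier; the sketch only shows how spelling differences between varieties surface in character n-grams.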
Twenty-two teams from all over the world participated in 2017, and the best results were obtained by Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel Haagsma, and Malvina Nissim, from the University of Groningen, The Netherlands.
This year the task will go multimodal: in addition to the text of tweets, images linked in them will be used as information sources to infer the author's gender. Three languages will be addressed: English, Spanish, and Arabic [http://pan.webis.de/clef18/pan18-web/author-profiling.html].
Universitat Politècnica de València, Spain
Co-organizer of the author profiling task at PAN
Rangel F., Rosso P., Potthast M., Stein B. (2017). Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In: Cappellato L., Ferro N., Goeuriot L., Mandl T. (Eds.) CLEF 2017 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings, CEUR-WS.org, vol. 1866. [http://ceur-ws.org/Vol-1866/invited_paper_11.pdf]
Potthast M., Rangel F., Tschuggnall M., Stamatatos E., Rosso P., Stein B. (2017). Overview of PAN’17: Author Identification, Author Profiling, and Author Obfuscation. In: 8th Int. Conf. of CLEF on Experimental IR Meets Multilinguality, Multimodality, and Visualization, CLEF 2017. Springer-Verlag, LNCS, vol. 10456, pp. 275–290. [http://www.uni-weimar.de/medien/webis/publications/papers/stein_2017k.pdf]