
TASS 2014

This is the third evaluation workshop for sentiment analysis focused on Spanish. TASS 2014 will be held as part of the XXX SEPLN Conference in Girona, Spain, on September 16-19, 2014. We invite you to take part in any of the proposed tasks and attend the workshop.


Welcome to TASS 2014!

TASS is an experimental evaluation workshop for sentiment analysis and online reputation analysis focused on the Spanish language, organized as a satellite event of the annual SEPLN Conference. After two successful editions in 2012 and 2013, TASS 2014 will be held on September 16, 2014 at the Universitat de Girona, Spain.

According to the Merriam-Webster dictionary, reputation is the overall quality or character of a given person or organization as seen or judged by people in general; in other words, it is the general recognition by other people of some characteristics or abilities of a given entity. In business specifically, reputation comprises the actions of a company and its internal stakeholders along with consumers' perception of the business. Reputation affects attitudes such as satisfaction, commitment and trust, and drives behavior such as loyalty and support. Reputation analysis, in turn, is the process of tracking, investigating and reporting an entity's actions and other entities' opinions about those actions; it takes many factors into account in order to estimate the market value of a reputation. Reputation analysis has come into wide use as a major factor of competitiveness in the increasingly complex marketplace of personal and business relationships among people and companies.

Market research is currently performed mostly through user surveys. However, the rise of social media such as blogs and social networks, and the increasing amount of user-generated content in the form of reviews, recommendations, ratings and other kinds of opinion, has given rise to an emerging trend towards online reputation analysis. This analysis has two technological aspects: sentiment analysis and text classification (or categorization).

First, sentiment analysis, i.e., the application of natural language processing and text analytics to identify and extract subjective information from texts, is the first step towards online reputation analysis. It is becoming a promising topic in the fields of marketing and customer relationship management, as social media and their associated word-of-mouth effect are turning out to be the most important source of information for companies about their customers' sentiments towards their brands and products.

Second, automatic text classification is used to identify the topic of the text from a predefined set of categories or classes, so that the company's reputation level can be broken down into different facets, axes or points of view of analysis.

Sentiment analysis is a major technological challenge. The task is so hard that even humans often disagree on the sentiment of a given text. What one individual finds acceptable or relevant may not be so for others, and this, along with multilingual aspects, cultural factors and different contexts, makes it very hard to classify a text written in natural language as positive or negative. The shorter the text, as with Twitter messages or short Facebook comments, the harder the task becomes.

Text classification techniques, on the other hand, although studied for longer, still need more research to build complex models with many categories with less effort, and to increase the precision and recall of the results. In addition, these models should work well with short texts and cope with the specific features of social media messages (such as spelling mistakes, abbreviations, SMS language, etc.).

Within this context, the aim of TASS is to provide a forum for discussion and communication where scientific and business communities can present and discuss the latest research work and developments in sentiment analysis in social media, specifically focused on the Spanish language. The main objective is to promote the application of existing state-of-the-art algorithms and techniques, and the design of new ones, for the implementation of complex systems able to perform sentiment analysis and text classification on short opinion texts extracted from social media messages (specifically Twitter) published by a series of representative personalities.

The challenge task is intended to provide a benchmark forum for comparing the latest approaches in these fields. In addition, with the creation and release of the fully tagged corpus, we aim to provide a benchmark dataset that enables researchers to compare their algorithms and systems.

Tasks

This edition of TASS has two objectives. First, we are interested in evaluating how the different approaches to sentiment analysis and text classification in Spanish have evolved over these years. To this end, two legacy tasks are repeated, reusing the same corpus, so that results can be compared. Second, we want to foster research on fine-grained polarity analysis, i.e., polarity more specific than the global polarity of the text. Therefore, two new tasks are proposed on polarity detection and analysis at aspect level (aspect-based sentiment analysis), one of the new market requirements for natural language processing in these areas.

Thus, the following four tasks are proposed this year.

Participants are expected to submit up to 3 results of different experiments for one or several of these tasks, in the appropriate format described below.

(legacy) Task 1: Sentiment Analysis at global level

This task consists of automatically determining the global polarity of each message in the test set of the General corpus (see below). It is a re-edition of the task from previous years. Participants will be provided with the training set of the General corpus so that they can train and validate their models.

There will be two different evaluations: one based on 6 different polarity labels (P+, P, NEU, N, N+, NONE) and another based on just 4 labels (P, N, NEU, NONE).
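The page does not spell out how the 6 labels relate to the 4 labels. A natural reading, assumed here and not confirmed by the task description, is that the strong labels are merged into their base polarity (P+ into P, N+ into N). A minimal sketch of that mapping, with hypothetical function names:

```python
# ASSUMPTION: strong labels are merged into their base polarity;
# the remaining labels are kept unchanged.
SIX_TO_FOUR = {
    "P+": "P",
    "P": "P",
    "NEU": "NEU",
    "N": "N",
    "N+": "N",
    "NONE": "NONE",
}


def collapse_label(label: str) -> str:
    """Map one 6-label polarity value onto the (assumed) 4-label scheme."""
    try:
        return SIX_TO_FOUR[label]
    except KeyError:
        raise ValueError(f"unknown polarity label: {label!r}") from None


def collapse_predictions(predictions: dict[str, str]) -> dict[str, str]:
    """Collapse a {tweetid: label} mapping from 6 labels to 4 labels."""
    return {tweet_id: collapse_label(label) for tweet_id, label in predictions.items()}
```

Under this assumption, a system trained on 6 labels can produce a 4-label run without retraining, although participants may of course train dedicated 4-label models instead.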

Participants are expected to submit (up to 3) experiments for the 6-labels evaluation, but are also allowed to submit (up to 3) specific experiments for the 4-labels scenario.

Accuracy (the proportion of tweets whose predicted label matches the gold standard) will be used to rank the systems. Precision, recall and F1-measure will be used to evaluate each individual category.
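As a rough illustration of these measures, the following Python sketch computes accuracy and per-category precision, recall and F1 from two `{tweetid: label}` dictionaries. It is not the official evaluation script (eval-scripts.tgz); the function name and the convention that a gold tweet with no prediction counts as an error are assumptions made for this example.

```python
from collections import Counter


def evaluate_single_label(gold: dict[str, str], predicted: dict[str, str]):
    """Return (accuracy, per_class) for single-label polarity predictions.

    gold and predicted map tweetid -> label. A gold tweet without a prediction
    counts as an error (hypothetical convention). per_class maps each label to
    a (precision, recall, f1) tuple.
    """
    correct = sum(1 for tid, label in gold.items() if predicted.get(tid) == label)
    accuracy = correct / len(gold) if gold else 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for tid, gold_label in gold.items():
        pred_label = predicted.get(tid)
        if pred_label == gold_label:
            tp[gold_label] += 1
        else:
            # The gold category missed this tweet...
            fn[gold_label] += 1
            # ...and the predicted category (if any) wrongly claimed it.
            if pred_label is not None:
                fp[pred_label] += 1

    per_class = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class[label] = (p, r, f1)
    return accuracy, per_class
```

Because every tweet receives exactly one label, accuracy here equals micro-averaged recall; the per-category scores are what reveal how a system behaves on rare labels such as N+.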

Results must be submitted in a plain text file with the following format:

tweetid \t polarity

where polarity can be:

P+, P, NEU, N, N+ and NONE for the 6-labels case
P, NEU, N and NONE for the 4-labels case.
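A small helper can catch format mistakes before a run is submitted. The sketch below writes predictions in the `tweetid \t polarity` format, rejecting labels that are not allowed for the chosen scenario, and reads such a file back. The function names are hypothetical and this is not part of the official tooling.

```python
VALID_LABELS = {
    6: {"P+", "P", "NEU", "N", "N+", "NONE"},
    4: {"P", "NEU", "N", "NONE"},
}


def format_submission(predictions: dict[str, str], n_labels: int = 6) -> str:
    """Render predictions as 'tweetid<TAB>polarity' lines, validating labels."""
    allowed = VALID_LABELS[n_labels]
    lines = []
    for tweet_id, label in predictions.items():
        if label not in allowed:
            raise ValueError(f"{label!r} is not a valid {n_labels}-label polarity")
        lines.append(f"{tweet_id}\t{label}")
    return "".join(line + "\n" for line in lines)


def parse_submission(text: str) -> dict[str, str]:
    """Parse 'tweetid<TAB>polarity' lines back into a {tweetid: label} dict."""
    predictions = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        tweet_id, label = parts
        predictions[tweet_id] = label
    return predictions
```

For example, `format_submission({"123": "P+"}, n_labels=4)` raises a `ValueError`, since P+ is only valid in the 6-label scenario.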

The same test corpus as in previous years will be used for the evaluation, to allow comparison among systems. Obviously, participants are not allowed to use any test data to train their systems. However, to address the problem, reported in previous years, of the imbalanced label distribution between the training and test sets, a new test subset of 1000 tweets, selected to have a label distribution similar to that of the training corpus, will be extracted and used for an alternative evaluation of system performance.
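The page does not describe how the 1000-tweet subset was drawn. One common way to select a subset whose label distribution mirrors another set is stratified sampling with proportional allocation. The sketch below illustrates that technique under this assumption; it is not presented as the organizers' procedure, and the function name is hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_matching_distribution(test_labels, train_labels, size, seed=0):
    """Pick up to `size` test tweet IDs whose label mix approximates training.

    test_labels: {tweetid: label} for the candidate test pool.
    train_labels: iterable of training labels defining the target distribution.
    Uses proportional allocation with largest-remainder rounding; a label's
    quota is capped by the number of test tweets available with that label.
    """
    train_counts = Counter(train_labels)
    total = sum(train_counts.values())
    # Integer arithmetic avoids floating-point rounding surprises.
    alloc = {label: size * c // total for label, c in train_counts.items()}
    remainders = {label: size * c % total for label, c in train_counts.items()}
    leftover = size - sum(alloc.values())
    # Give the remaining slots to the labels with the largest remainders.
    for label in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[label] += 1

    by_label = defaultdict(list)
    for tid, label in sorted(test_labels.items()):
        by_label[label].append(tid)

    rng = random.Random(seed)  # fixed seed for a reproducible subset
    chosen = []
    for label, k in alloc.items():
        pool = by_label.get(label, [])
        chosen.extend(rng.sample(pool, min(k, len(pool))))
    return chosen
```

Capping each quota by the available pool means the result can be smaller than `size` when the test set lacks enough tweets of some label; a real selection procedure would need a policy for redistributing those slots.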

(legacy) Task 2: Topic classification

The challenge is to build a classifier that automatically identifies the topic of each message in the test set of the General corpus. Again, this task is a re-edition of the one from previous years. Participants may use the training set of the General corpus to train and validate their models.

This is a multi-label classification task: tweets can have more than one label.

Participants are expected to submit up to 3 experiments, each one in a plain text file with the following format:

tweetid \t topic

A given tweet ID can be repeated in different lines if it is assigned more than one topic.

Microaveraged precision, recall and F1-measure calculated over the full test set will be used to evaluate the systems. Systems will be ranked by F1.
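Because a tweet with several topics appears on several lines, a convenient way to compute micro-averaged scores is to treat each `(tweetid, topic)` line as one decision. Summing true positives, false positives and false negatives over all topics is then the same as comparing two sets of pairs. A minimal sketch with hypothetical function names, not the official evaluation script:

```python
def read_pairs(text: str) -> set[tuple[str, str]]:
    """Read 'tweetid<TAB>label' lines into a set of (tweetid, label) pairs.

    Each line contributes one pair; duplicate lines collapse in the set.
    """
    pairs = set()
    for line in text.splitlines():
        if line.strip():
            tweet_id, label = line.split("\t")
            pairs.add((tweet_id.strip(), label.strip()))
    return pairs


def micro_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (tweetid, label) pairs."""
    tp = len(gold & predicted)  # pairs present in both gold and run
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, if the gold standard assigns tweet 1 two topics and the run finds only one of them plus a wrong extra topic for another tweet, both precision and recall drop, and F1 balances the two.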

To allow for comparison with previous years, the same test corpus will be used for the evaluation. Again, participants are not allowed to use any test data to train their systems.

(new) Task 3: Aspect detection

The objective is to automatically identify, from a predefined list, the different aspects that users mention in their Twitter opinions about a given topic. A new Social-TV corpus will be used for the training and evaluation of the systems (see description below).

This is a multi-label classification task: tweets can have more than one aspect.

Participants are expected to submit up to 3 experiments, each in a plain text file with the following format:

tweetid \t aspect

Again, a given tweet ID can be repeated on different lines if it is assigned more than one aspect. Only the aspect must be returned, not the detected terms, the text fragment, or its offsets.
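A system that detects aspects at mention level (for example, the two "Madrid" mentions in one tweet) has to reduce its output to one line per distinct tweet-aspect pair, with no offsets. A minimal sketch, assuming a hypothetical internal detection format of `(tweetid, aspect, start_offset, end_offset)` tuples:

```python
def detections_to_submission(detections) -> str:
    """Turn mention-level detections into 'tweetid<TAB>aspect' lines.

    detections: iterable of (tweetid, aspect, start_offset, end_offset) tuples
    (hypothetical format). Offsets are dropped and repeated aspects within
    the same tweet are written only once, sorted for reproducible output.
    """
    pairs = {(tweet_id, aspect) for tweet_id, aspect, _start, _end in detections}
    return "".join(f"{tweet_id}\t{aspect}\n" for tweet_id, aspect in sorted(pairs))
```

Sorting is not required by the format; it simply makes runs easier to compare with `diff`.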

Microaveraged precision, recall and F1-measure calculated over the full test set will be used to evaluate the systems. Systems will be ranked by F1.

(new) Task 4: Aspect-based sentiment analysis

Systems in this task must identify the polarity of each aspect detected in the previous task. Again, participants will be provided with the Social-TV corpus to train and evaluate their models. This task is equivalent to Task 1, but focused on fine-grained polarity detection.

Participants are expected to submit up to 3 experiments, each in a plain text file with the following format:

tweetid \t aspect \t polarity

Allowed polarity values are P, NEU and N.

Microaveraged precision, recall and F1-measure will be used to evaluate the systems, treating each aspect-polarity combination as a single label. Systems will be ranked by F1.
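Treating aspect and polarity as one combined label means that a prediction only counts as correct when both parts match; a correct aspect with the wrong polarity counts as both a false positive and a false negative. The sketch below illustrates this with `(tweetid, aspect, polarity)` triples. The function names are hypothetical and this is not the official evaluation script.

```python
def read_triples(text: str) -> set[tuple[str, str, str]]:
    """Read 'tweetid<TAB>aspect<TAB>polarity' lines into a set of triples."""
    triples = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        tweet_id, aspect, polarity = (field.strip() for field in line.split("\t"))
        if polarity not in {"P", "NEU", "N"}:
            raise ValueError(f"line {line_no}: invalid polarity {polarity!r}")
        triples.add((tweet_id, aspect, polarity))
    return triples


def micro_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Micro-averaged P/R/F1 where (aspect, polarity) acts as a single label."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With gold `Partido NEU` and `Equipo-Real_Madrid P` for one tweet, a run that outputs `Partido P` and `Equipo-Real_Madrid P` scores 0.5 in precision, recall and F1: the aspect `Partido` was found, but its polarity was wrong.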

Corpus

General Corpus

The General corpus contains over 68 000 Twitter messages, written in Spanish by about 150 well-known personalities and celebrities from the worlds of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the extraction context has a Spain-focused bias, the diverse nationalities of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, the USA and many other countries, give the corpus global coverage of the Spanish-speaking world.

Each Twitter message includes its ID (tweetid), the creation date (date) and the user ID (user). Due to restrictions in the Twitter API Terms of Service, it is forbidden to redistribute a corpus that includes text contents or information about users. However, redistribution is allowed if those fields are removed and only IDs (tweet IDs and user IDs) are provided. The actual message content can easily be obtained by querying the Twitter API with the tweetid.

The general corpus has been divided into two sets: training (about 10%) and test (90%). The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results provided by the different systems. Obviously, it is not allowed to use the test data from previous years to train the systems.

Each message in both the training and test set is tagged with its global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).

In addition, the level of agreement or disagreement of the sentiment expressed within the content is also indicated, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to tell whether a neutral sentiment comes from neutral keywords or from a text that contains positive and negative sentiments at the same time.

Moreover, the polarity at entity level, i.e., the polarity values related to the entities mentioned in the text, is also included where applicable. These values are tagged with the same 6 possible values and include the level of agreement for each entity.

In addition, a set of topics has been selected based on the thematic areas covered by the corpus, such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") or "entretenimiento" ("entertainment"). Each message in both the training and test set has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).

All tagging has been done semi-automatically: a baseline machine learning model is run first, and then all tags are manually checked by human experts. For the polarity at entity level, due to the high volume of data to check, this tagging has only been done for the training set.

The following example shows the information for two sample tweets. The first tweet is tagged only with the global polarity, as the text mentions no entity; the second one is tagged with both the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro Asturias).

        <tweet>
          <tweetid>0000000000</tweetid>
          <user>usuario0</user>
          <content><![CDATA['Conozco a alguien q es adicto al drama! Ja ja ja te suena d algo!]]></content>
          <date>2011-12-02T02:59:03</date>
          <lang>es</lang>
          <sentiments>
            <polarity><value>P+</value><type>AGREEMENT</type></polarity>
          </sentiments>
          <topics>
            <topic>entretenimiento</topic>
          </topics>
        </tweet>
        <tweet>
          <tweetid>0000000001</tweetid>
          <user>usuario1</user>
          <content><![CDATA['UPyD contará casi seguro con grupo gracias al Foro Asturias.]]></content>
          <date>2011-12-02T00:21:01</date>
          <lang>es</lang>
          <sentiments>
            <polarity><value>P</value><type>AGREEMENT</type></polarity>
            <polarity><entity>UPyD</entity><value>P</value><type>AGREEMENT</type></polarity>
            <polarity><entity>Foro_Asturias</entity><value>P</value><type>AGREEMENT</type></polarity>
          </sentiments>
          <topics>
            <topic>política</topic>
          </topics>
        </tweet>
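The tagged XML can be read with Python's standard `xml.etree.ElementTree`. The sketch below extracts, for each tweet, its ID, user, content (when present), global polarity, entity-level polarities and topics. It assumes that the `<tweet>` elements are wrapped in a single root element, which the sample above does not show, and the function name and record layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_general_corpus(xml_text: str) -> list[dict]:
    """Extract polarity and topic annotations from General corpus XML.

    ASSUMPTION: all <tweet> elements sit under one root element.
    A <polarity> without an <entity> child is treated as the global polarity.
    """
    root = ET.fromstring(xml_text)
    records = []
    for tweet in root.iter("tweet"):
        global_polarity = None
        entities = {}
        for pol in tweet.findall("sentiments/polarity"):
            value_and_agreement = (pol.findtext("value"), pol.findtext("type"))
            entity = pol.findtext("entity")  # None when there is no <entity>
            if entity is None:
                global_polarity = value_and_agreement
            else:
                entities[entity] = value_and_agreement
        records.append({
            "tweetid": tweet.findtext("tweetid"),
            "user": tweet.findtext("user"),
            "content": tweet.findtext("content"),  # None if text was removed
            "polarity": global_polarity,
            "entities": entities,
            "topics": [t.text for t in tweet.findall("topics/topic")],
        })
    return records
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`; the rest of the loop stays the same.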

Social-TV Corpus

This corpus was collected during the 2014 Copa del Rey final in Spain between Real Madrid and F.C. Barcelona, played on 16 April 2014 at Mestalla Stadium in Valencia.

Over 1 million tweets were collected from 15 minutes before to 15 minutes after the match. After filtering out useless information and tweets in languages other than Spanish, a subset of 2 773 tweets was selected.

All tweets have been manually tagged with the aspects mentioned in the message and the sentiment polarity expressed towards each of them. Tweets may cover more than one aspect.

The list of aspects that have been defined is:

Afición (fans)
Arbitro (referee)
Autoridades (authorities)
Entrenador (coach)
Equipo-Atlético_de_Madrid
Equipo-Barcelona
Equipo-Real_Madrid
Equipo (any other team)
Jugador-Alexis_Sánchez
Jugador-Alvaro_Arbeloa
Jugador-Andrés_Iniesta
Jugador-Angel_Di_María
Jugador-Asier_Ilarramendi
Jugador-Carles_Puyol
Jugador-Cesc_Fábregas
Jugador-Cristiano_Ronaldo
Jugador-Dani_Alves
Jugador-Dani_Carvajal
Jugador-Fábio_Coentrão
Jugador-Gareth_Bale
Jugador-Iker_Casillas
Jugador-Isco
Jugador-Javier_Mascherano
Jugador-Jesé_Rodríguez
Jugador-José_Manuel_Pinto
Jugador-Karim_Benzema
Jugador-Lionel_Messi
Jugador-Luka_Modric
Jugador-Marc_Bartra
Jugador-Neymar_Jr.
Jugador-Pedro_Rodríguez
Jugador-Pepe
Jugador-Sergio_Busquets
Jugador-Sergio_Ramos
Jugador-Xabi_Alonso
Jugador-Xavi_Hernández
Jugador (any other player)
Partido (match)
Retransmisión (broadcast)

Sentiment polarity has been tagged from the point of view of the person who writes the tweet, using 3 levels: P, NEU and N. No distinction is made between cases where the author does not express any sentiment and cases where he/she expresses a sentiment that is neither positive nor negative; both are tagged NEU.

The Social-TV corpus has been randomly divided into two sets: training (1 773 tweets) and test (1 000 tweets), with a similar distribution of both aspects and sentiments. The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results provided by the different systems.

The following example shows three sample tweets from the training set.

        <tweet id="456544898791907328"><sentiment aspect="Equipo-Real_Madrid" polarity="P">#HalaMadrid</sentiment> ganamos sin <sentiment aspect="Jugador-Cristiano_Ronaldo" polarity="NEU">Cristiano</sentiment>. .perdéis con <sentiment aspect="Jugador-Lionel_Messi" polarity="N">Messi</sentiment>. Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>! !!!!!</tweet>
        <tweet id="456544898942906369">@nevermind2192 <sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment> por siempre!!</tweet>
        <tweet id="456544898951282688"><sentiment aspect="Partido" polarity="NEU">#FinalCopa</sentiment> Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, campeón de la <sentiment aspect="Partido" polarity="P">copa del rey</sentiment></tweet>
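In this corpus the annotations are inline: each `<sentiment>` element wraps the text fragment that mentions an aspect. The sketch below uses `xml.etree.ElementTree` to recover the tweet ID, the plain text and the mention-level annotations, and then reduces them to distinct tweet-level `(aspect, polarity)` pairs. That last reduction is an assumption about how mention-level tags relate to the Task 3 and 4 labels, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_socialtv_tweet(xml_fragment: str):
    """Return (tweet_id, plain_text, mentions) for one Social-TV <tweet>.

    mentions is a list of (aspect, polarity, surface_text) tuples, one per
    <sentiment> element, in document order.
    """
    tweet = ET.fromstring(xml_fragment)
    mentions = [
        (s.get("aspect"), s.get("polarity"), "".join(s.itertext()))
        for s in tweet.iter("sentiment")
    ]
    plain_text = "".join(tweet.itertext())  # text with the tags stripped
    return tweet.get("id"), plain_text, mentions


def tweet_level_labels(mentions):
    """ASSUMPTION: collapse mentions into distinct (aspect, polarity) pairs."""
    return sorted({(aspect, polarity) for aspect, polarity, _text in mentions})
```

Applied to the third sample tweet, the two `Madrid` mentions collapse into a single `("Equipo-Real_Madrid", "P")` pair, while `Partido` keeps two pairs because it was tagged both NEU and P.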

Downloads

general.txt [9.2KB, 2014-05-08]: statistics for General corpus
general-tweets-train-tagged.xml [2.5MB, 2014-06-17]: General corpus training set [password protected]
general-tweets-test.xml [8.1MB, 2014-06-17]: General corpus test set (for tasks 1 and 2) [password protected]
general-tweets-test-tagged.xml [15.5MB, 2014-06-17]: General corpus test set, tagged (gold standard for tasks 1 and 2, built with pooling) [password protected]
socialtv-tweets-train-tagged.xml [429KB, 2014-06-27]: Social-TV corpus training set [password protected]
socialtv-tweets-test.xml [120.7KB, 2014-06-17]: Social-TV corpus test set (for tasks 3 and 4) [password protected]
socialtv-tweets-test-tagged.xml [239.5KB, 2014-06-27]: Social-TV corpus test set, tagged (gold standard for tasks 3 and 4, manually tagged) [password protected]
Task 1 QREL: task1-5l.qrel, task1-5l-1k.qrel (new 1k corpus), task1-3l.qrel, task1-3l-1k.qrel (new 1k corpus)
Task 2 QREL: task2.qrel
Tasks 3 and 4 QREL: task34.qrel
Evaluation script: eval-scripts.tgz

Old subsets (see previous editions of TASS):

general-users-tagged.xml [26.2KB, 2014-05-08]: General corpus user information, manually tagged with political orientation (TASS 2013 task 3) [password protected]
politics-tweets-test-tagged.xml [1MB, 2014-06-17]: Politics corpus test set, manually tagged (TASS 2013 task 4) [password protected]

Important Dates

May 5, 2014	Release of tasks and General corpus.
June 17, 2014	Release of Social-TV corpus.
July 27, 2014	Experiment submissions by participants.
July 31, 2014	Evaluation results.
August 17, 2014	Submission of papers.
September 16, 2014	Workshop.

Corpus download

Please send an email to tass AT daedalus.es with the TASS Corpus License agreement filled in with your email address and affiliation (institution, company or any other kind of organization). You will be given a password to download the files from the password-protected area.

All corpora will be made freely available to the community after the workshop.

If you use the corpus in your research (papers, articles, presentations for conferences or educational purposes), please include a citation to one of the following publications:

Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., & González-Cristóbal, J.C. (2013). TASS - Workshop on Sentiment Analysis at SEPLN. Procesamiento del Lenguaje Natural, 50. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4657.
Villena-Román, J., García-Morera, J., Lana-Serrano, S., & González-Cristóbal, J.C. (2014). TASS 2013 - A Second Step in Reputation Analysis in Spanish. Procesamiento del Lenguaje Natural, 52. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4901.
TASS (Taller de Análisis de Sentimientos en la SEPLN) website. http://www.daedalus.es/TASS.

Reports from workshop participants

Along with the submission of experiments, participants were invited to submit a paper to the workshop describing their experiments and to discuss the results with the audience in a regular workshop session.

Papers should follow the usual SEPLN template given in the author guidelines page. Reports could be written in Spanish or English, and there was no length limit, as they would be included in the electronic working notes of the conference.

All reports are included in the Proceedings of the TASS workshop at SEPLN 2014:

Contents
TASS 2014 - Workshop on Sentiment Analysis at SEPLN: Overview. Julio Villena Román, Janine García Morera, César de Pablo Sánchez, Miguel Ángel García Cumbreras, Eugenio Martínez Cámara, Alfonso Ureña López, María Teresa Martín Valdivia.
LyS at TASS 2014: A Prototype for Extracting and Analysing Aspects from Spanish tweets. David Vilares, Yerai Doval, Miguel A. Alonso and Carlos Gómez-Rodríguez.
Experiments on feature replacements for polarity classification of Spanish tweets. José M. Perea-Ortega, Alexandra Balahur.
Análisis de sentimiento sobre textos en Español basado en aproximaciones semánticas con reglas lingüísticas. Roberto Hernández Petlachi, Xiaoou Li.
Participación de SINAI Word2Vec en TASS 2014. A. Montejo-Ráez, M.A. García-Cumbreras, M.C. Díaz-Galiano.
ELiRF-UPV en TASS 2014: Análisis de Sentimientos, Detección de Tópicos y Análisis de Sentimientos de Aspectos en Twitter. Lluís-F. Hurtado, Ferran Pla.
SINAI-ESMA: An unsupervised approach for Sentiment Analysis in Twitter. Salud María Jiménez Zafra, Eugenio Martínez Cámara, M. Teresa Martín Valdivia, L. Alfonso Ureña López.
Looking for Features for Supervised Tweet Polarity Classification. Iñaki San Vicente Roncal, Xabier Saralegi Urizar.

Organization

Organizing Committee

Julio Villena-Román - Daedalus, Spain
Janine García-Morera - Daedalus, Spain
José Carlos González-Cristóbal - Technical University of Madrid, Spain (GSI-UPM)
L. Alfonso Ureña-López - University of Jaen, Spain (SINAI-UJAEN)
Miguel Ángel García-Cumbreras - University of Jaen, Spain (SINAI-UJAEN)
María-Teresa Martín-Valdivia - University of Jaen, Spain (SINAI-UJAEN)
Eugenio Martínez-Cámara - University of Jaen, Spain (SINAI-UJAEN)

Programme Committee

Alexandra Balahur - EC-Joint Research Centre, Italy
José Carlos Cortizo - European University of Madrid, Spain
Ana García-Serrano - UNED, Spain
José María Gómez-Hidalgo - Optenet, Spain
Julio Gonzalo-Arroyo - UNED, Spain
Lluís F. Hurtado - Universitat Politècnica de València, Spain
Carlos A. Iglesias-Fernández - Technical University of Madrid, Spain
Zornitsa Kozareva - Information Sciences Institute, USA
Sara Lana-Serrano - Technical University of Madrid, Spain
Paloma Martínez-Fernandez - Carlos III University of Madrid, Spain
Ruslan Mitkov - University of Wolverhampton, U.K.
Andrés Montoyo - University of Alicante, Spain
Rafael Muñoz - University of Alicante, Spain
Constantin Orasan - University of Wolverhampton, U.K.
Ferran Pla Santamaria - Universitat Politècnica de València, Spain
Paolo Rosso - Technical University of Valencia, Spain
Mike Thelwall - University of Wolverhampton, U.K.
José Antonio Troyano - University of Seville, Spain
David Vilares - University of Coruña, Spain