Sentiment Analysis at Tweet level

About

This task focuses on the evaluation of polarity classification systems for tweets written in Spanish. The submitted systems will have to cope with the following challenges:

  1. Lack of context: Remember, tweets are short (up to 240 characters).
  2. Informal language: Misspellings, emojis and onomatopoeias are common.
  3. (Local) multilinguality: The training, development and test corpora contain tweets written in Spanish from Spain, Peru and Costa Rica.
  4. Generalization: The systems will be assessed on several corpora. One is the test set of the training data, so it follows a similar distribution; the second is the test set of the General Corpus of TASS (see previous editions), which was compiled some years ago, so it may be lexically and semantically different from the training data. Furthermore, the systems will be evaluated on test sets of tweets written in the varieties of Spanish spoken in different American countries.

The participants will be provided with a training corpus, a development corpus and several test corpora (see important dates). All the corpora are annotated with 4 levels of opinion intensity (P, N, NEU, NONE).

If participants submit a supervised or semi-supervised system, it must be trained only with the provided training data; the use of any other training set is forbidden. However, linguistic resources such as lexicons, word embeddings or knowledge bases may be used. We want a fair competition that furthers creativity, so we want to assess the originality of the systems given the same set of training data.
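For instance, a permitted lexicon-based system could be as simple as the following sketch (the tiny lexicon below is purely illustrative, not a real resource):

```python
# A hypothetical minimal lexicon-based baseline. The word lists are
# illustrative only; a real system would use a published Spanish lexicon.
POSITIVE = {"bueno", "feliz", "genial", "suerte"}
NEGATIVE = {"malo", "triste", "penuria", "horrible"}

def lexicon_polarity(text):
    """Assign P/N/NEU/NONE by counting lexicon hits in a lowercased tweet."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos == neg == 0:
        return "NONE"   # no sentiment words at all
    if pos > neg:
        return "P"
    if neg > pos:
        return "N"
    return "NEU"        # mixed evidence
```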

Tasks

In 2018, the InterTASS corpus has been expanded by two more subsets, from Costa Rica and Peru, thanks to the collaboration of our colleagues Edgar Casasola, Gabriela Marin, Marco Antonio Sobrevilla, and their respective collaborators. We have three parallel datasets: ES (Spain), PE (Peru) and CR (Costa Rica).

Four subtasks are proposed, working with the datasets of the different countries:

Subtask-1: Monolingual ES

Training and test using the InterTASS ES datasets.

Subtask-2: Monolingual PE

Training and test using the InterTASS PE datasets.

Subtask-3: Monolingual CR

Training and test using the InterTASS CR datasets.

Subtask-4: Cross-lingual

Train on any of the datasets and test on a different one, in order to measure the dependency of the systems on the specific language variety.

Evaluation

Accuracy and the macro-averaged versions of Precision, Recall and F1 will be used as evaluation measures. Systems will be ranked by the Macro-F1 and Accuracy measures.
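As an illustration of how these measures can be computed, here is a minimal sketch in plain Python (the official scoring script may differ in details):

```python
from collections import Counter

LABELS = ["P", "N", "NEU", "NONE"]

def macro_scores(gold, pred):
    """Accuracy and macro-averaged precision, recall and F1 over the 4 labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the true label differed
            fn[g] += 1  # missed the true label g
    precisions, recalls, f1s = [], [], []
    for label in LABELS:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(LABELS)
    accuracy = sum(tp.values()) / len(gold)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Note that macro-averaging gives each of the four classes equal weight, so a system that ignores a minority class like NEU is penalized even if its accuracy is high.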

Datasets

The submitted systems can use any set of data as training data, i.e. the training set of InterTASS, the training sets from previous editions, or other sets of tweets. However, the use of the InterTASS test set or the test sets of previous editions as training data is forbidden.

InterTASS corpus

International TASS Corpus (InterTASS) is a corpus released in 2017. It is composed of tweets written in different varieties of Spanish (Spain, Peru and Costa Rica). As an international language, Spanish exhibits a large number of lexical and even structural differences in its different varieties. Such differences need to be taken into account in the development of language understanding systems in general, and in polarity classification systems in particular. The main purpose of compiling and using an inter-varietal corpus of Spanish for the evaluation tasks is to challenge participating systems to cope with the many faces of this language worldwide.

The sentiment of each tweet in the corpus is annotated on a scale of 4 levels of polarity: P, NEU, N and NONE. The corpus has three datasets:

  • Training: it is composed of 1008 tweets.
  • Development: it is composed of 506 tweets.
  • Test: it is composed of 1899 tweets.

The three datasets of the corpus are three XML files, and an example of a tweet of InterTASS is the following one:

<tweet>
  <tweetid>768224728049999872</tweetid>
  <user>caval100</user>
  <content>Se ha terminado #Rio2016 Lamentablemente no arriendo las ganancias al pueblo brasileño por la penuria que les espera Suerte y solidaridad</content>
  <date>2016-08-23 23:13:42</date>
  <lang>es</lang>
  <sentiment>
    <polarity><value>N</value></polarity>
  </sentiment>
</tweet>
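This format can be read with Python's standard library; here is a minimal sketch, assuming the corpus files wrap the <tweet> elements in a single root element (the file path passed in is whatever the downloaded dataset uses):

```python
import xml.etree.ElementTree as ET

def load_intertass(path):
    """Parse an InterTASS XML file into (tweet_id, content, polarity) tuples."""
    root = ET.parse(path).getroot()
    examples = []
    for tweet in root.iter("tweet"):
        tweet_id = tweet.findtext("tweetid")
        content = tweet.findtext("content")
        # Polarity value nested under <sentiment><polarity><value>
        polarity = tweet.findtext("sentiment/polarity/value")
        examples.append((tweet_id, content, polarity))
    return examples
```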

General corpus

This general corpus contains over 68 000 Twitter messages, written in Spanish by about 150 well-known personalities and celebrities of the world of politics, economy, communication, mass media and culture, between November 2011 and March 2012. Although the context of extraction has a Spain-focused bias, the diverse nationality of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, makes the corpus reach a global coverage in the Spanish-speaking world.

The general corpus has been divided into two sets: training (about 10%) and test (90%). The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results provided by the different systems. Obviously, it is not allowed to use the test data from previous years to train the systems.

Each message in both the training and test set is tagged with its global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).

In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to determine whether a neutral sentiment comes from neutral keywords or whether the text contains positive and negative sentiments at the same time.

Moreover, the polarity at entity level, i.e., the polarity values related to the entities that are mentioned in the text, is also included for those cases when applicable. These values are similarly tagged with 6 possible values and include the level of agreement as related to each entity.

Finally, a set of topics has been selected based on the thematic areas covered by the corpus, such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") and "entretenimiento" ("entertainment"). Each message in both the training and test sets has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).

All tagging has been done semi-automatically: a baseline machine learning model is run first and then all tags are manually checked by human experts. In the case of polarity at the entity level, due to the high volume of data to check, this tagging has only been done for the training set.

The following example shows two sample tweets. The first tweet is tagged only with the global polarity, as the text mentions no entity, while the second one is tagged with both the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro Asturias).

        <tweet>
          <tweetid>0000000000</tweetid>
          <user>usuario0</user>
          <content><![CDATA['Conozco a alguien q es adicto al drama! Ja ja ja te suena d algo!]]></content>
          <date>2011-12-02T02:59:03</date>
          <lang>es</lang>
          <sentiments>
            <polarity><value>P+</value><type>AGREEMENT</type></polarity>
          </sentiments>
          <topics>
            <topic>entretenimiento</topic>
          </topics>
        </tweet>
        <tweet>
          <tweetid>0000000001</tweetid>
          <user>usuario1</user>
          <content><![CDATA['UPyD contará casi seguro con grupo gracias al Foro Asturias.]]></content>
          <date>2011-12-02T00:21:01</date>
          <lang>es</lang>
          <sentiments>
            <polarity><value>P</value><type>AGREEMENT</type></polarity>
            <polarity><entity>UPyD</entity><value>P</value><type>AGREEMENT</type></polarity>
            <polarity><entity>Foro_Asturias</entity><value>P</value><type>AGREEMENT</type></polarity>
          </sentiments>
          <topics>
            <topic>política</topic>
          </topics>
        </tweet>        
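The annotations above can be extracted with a sketch like the following, assuming (as in the samples) that a <polarity> element without an <entity> child carries the global polarity:

```python
import xml.etree.ElementTree as ET

def parse_general_tweet(tweet_elem):
    """Extract global polarity, entity-level polarities and topics from one <tweet>."""
    global_polarity = None
    entity_polarities = {}
    for pol in tweet_elem.find("sentiments").iter("polarity"):
        entity = pol.findtext("entity")
        value = pol.findtext("value")
        if entity is None:
            global_polarity = value          # no <entity>: the global polarity
        else:
            entity_polarities[entity] = value
    topics = [t.text for t in tweet_elem.iter("topic")]
    return global_polarity, entity_polarities, topics
```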

Shared Task

Evaluation

The evaluation web page is available at: http://tass.sepln.org/2018/task-1/private/evaluation/evaluate.php

Results must be submitted in a plain text file with the following format:

tweet_id \t polarity
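A minimal sketch of producing such a file from a list of predictions (the file name and the pair format are illustrative):

```python
def write_submission(predictions, path):
    """Write (tweet_id, polarity) pairs as tab-separated lines, one per tweet."""
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id, polarity in predictions:
            f.write(f"{tweet_id}\t{polarity}\n")
```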

Award

The best system of the first task, using the InterTASS corpus, will receive the best system award: a cash prize of 100€, sponsored by MeaningCloud.

Datasets downloads

The use of the InterTASS corpus and the General corpus requires agreeing to the terms of use of the data by signing the TASS Data License.

Proceedings

See main webpage of TASS-2018.

Program

To be announced.

Presentation instructions

To be announced.

Important dates

Release of training and development corpora

May 2, 2018

Release of test corpora

June 15, 2018

Deadline for evaluation

July 4, 2018 (extended from June 27, 2018)

Paper submission

July 16, 2018

Review notification

July 27, 2018

Camera ready submission

September 5, 2018

Publication

September 17, 2018

Workshop

September 18, 2018

Organization

Program Committee

  • Edgar Casasola Murillo University of Costa Rica, Costa Rica
  • Fermín Cruz Mata University of Sevilla, Spain
  • Yoan Gutiérrez Vázquez University of Alicante, Spain
  • Lluís F. Hurtado Polytechnic University of Valencia, Spain
  • Salud María Jiménez Zafra University of Jaén, Spain
  • Mª. Teresa Martín Valdivia University of Jaén, Spain
  • Manuel Montes Gómez National Institute of Astrophysics, Optics and Electronics, Mexico
  • Antonio Moreno Ortíz University of Málaga, Spain
  • Preslav Nakov Qatar Computing Research Institute, Qatar
  • José Manuel Perea Ortega University of Extremadura, Spain
  • Ferrán Pla Universidad Politécnica de Valencia, Spain
  • Sara Rosenthal IBM Research, U.S.A.
  • Maite Taboada Simon Fraser University, Canada
  • L. Alfonso Ureña López University of Jaén, Spain
