This task focuses on the evaluation of polarity classification systems for tweets written in Spanish. The submitted systems will have to face the following challenges:
- Lack of context: Remember, tweets are short (up to 240 characters).
- Informal language: Misspellings, emojis and onomatopoeias are common.
- (Local) multilinguality: The training, development and test corpora contain tweets written in Spanish from Spain, Peru and Costa Rica.
- Generalization: The systems will be assessed with several corpora. The first is the test set of the training data, so it follows a similar distribution; the second is the test set of the General Corpus of TASS (see previous editions), which was compiled some years ago, so it may be lexically and semantically different from the training data. Furthermore, the systems will be evaluated with test sets of tweets written in the varieties of Spanish spoken in different American countries.
The participants will be provided with a training, a development and several test corpora (see important dates).
All the corpora are annotated with 4 different levels of opinion intensity.
If participants submit a supervised or semi-supervised system, it must be trained only with the provided training data; the use of any other training set is strictly forbidden. However, linguistic resources like lexicons, word embeddings or knowledge bases can be used. We want a fair competition that encourages creativity, so we want to assess the originality of the systems given the same set of training data.
In 2018, the InterTASS corpus has been expanded with two more subsets, from Costa Rica and Peru, thanks to the collaboration of our colleagues Edgar Casasola, Gabriela Marin, Marco Antonio Sobrevilla, and their respective collaborators. We have three parallel datasets: ES (Spain), PE (Peru) and CR (Costa Rica).
Four subtasks are proposed, working with the datasets of the different countries:
Feel free to select any dataset for training and a different one for testing, in order to test the dependency of systems on a particular language variety.
Accuracy and the macro-averaged versions of Precision, Recall and F1 will be used as evaluation measures. Systems will be ranked by the Macro-F1 and Accuracy measures.
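As a reference, the ranking measures above can be sketched in plain Python (a minimal implementation for illustration, not the official evaluation script):

```python
def evaluate(gold, pred, labels=("P", "N", "NEU", "NONE")):
    """Compute accuracy and macro-averaged precision, recall and F1."""
    assert len(gold) == len(pred) and gold
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy,
            "macro_precision": sum(precisions) / n,
            "macro_recall": sum(recalls) / n,
            "macro_f1": sum(f1s) / n}
```

Note that the macro average gives each class the same weight regardless of its frequency, so rare classes such as NEU matter as much as the majority ones.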
The submitted systems can use any set of data for training, i.e. the training set of InterTASS, training sets from previous editions or other sets of tweets. However, the use of the test set of InterTASS or the test sets of previous editions as training data is forbidden.
International TASS Corpus (InterTASS) is a corpus released in 2017. It is composed of tweets written in different varieties of Spanish (Spain, Peru and Costa Rica). As an international language, Spanish exhibits a large number of lexical and even structural differences in its different varieties. Such differences need to be taken into account in the development of language understanding systems in general, and in polarity classification systems in particular. The main purpose of compiling and using an inter-varietal corpus of Spanish for the evaluation tasks is to challenge participating systems to cope with the many faces of this language worldwide.
The sentiment of the tweets of the corpus is annotated on a scale of 4 levels of polarity: P, N, NEU and NONE. The corpus has three datasets:
- Training: it is composed of 1008 tweets.
- Development: it is composed of 506 tweets.
- Test: it is composed of 1899 tweets.
The three datasets of the corpus are three XML files. An example of a tweet from InterTASS is the following one:
<tweet>
  <tweetid>768224728049999872</tweetid>
  <user>caval100</user>
  <content>Se ha terminado #Rio2016 Lamentablemente no arriendo las ganancias al pueblo brasileño por la penuria que les espera Suerte y solidaridad</content>
  <date>2016-08-23 23:13:42</date>
  <lang>es</lang>
  <sentiment>
    <polarity><value>N</value></polarity>
  </sentiment>
</tweet>
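A minimal sketch of reading such a file with Python's standard library (assuming the tweets are wrapped in a single root element; the exact root tag of the released files may differ):

```python
import xml.etree.ElementTree as ET

# A small inline sample mirroring the structure shown above.
SAMPLE = """<tweets>
  <tweet>
    <tweetid>768224728049999872</tweetid>
    <user>caval100</user>
    <content>Se ha terminado #Rio2016 ...</content>
    <date>2016-08-23 23:13:42</date>
    <lang>es</lang>
    <sentiment>
      <polarity><value>N</value></polarity>
    </sentiment>
  </tweet>
</tweets>"""

def load_tweets(xml_text):
    """Yield one dict per <tweet> element with id, text and polarity."""
    root = ET.fromstring(xml_text)
    for tweet in root.iter("tweet"):
        yield {
            "id": tweet.findtext("tweetid"),
            "content": tweet.findtext("content"),
            "polarity": tweet.findtext("sentiment/polarity/value"),
        }
```

For a file on disk, `ET.parse(path).getroot()` can be used instead of `ET.fromstring`.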
This general corpus contains over 68,000 Twitter messages, written in Spanish between November 2011 and March 2012 by about 150 well-known personalities and celebrities from the worlds of politics, economy, communication, mass media and culture. Although the extraction context has a Spain-focused bias, the diverse nationalities of the authors, including people from Spain, Mexico, Colombia, Puerto Rico, USA and many other countries, give the corpus global coverage of the Spanish-speaking world.
The general corpus has been divided into two sets: training (about 10%) and test (90%). The training set will be released so that participants may train and validate their models. The test corpus will be provided without any tagging and will be used to evaluate the results provided by the different systems. Obviously, it is not allowed to use the test data from previous years to train the systems.
Each message in both the training and test set is tagged with its global polarity, indicating whether the text expresses a positive, negative or neutral sentiment, or no sentiment at all. A set of 6 labels has been defined: strong positive (P+), positive (P), neutral (NEU), negative (N), strong negative (N+) and one additional no sentiment tag (NONE).
In addition, there is also an indication of the level of agreement or disagreement of the expressed sentiment within the content, with two possible values: AGREEMENT and DISAGREEMENT. This is especially useful to determine whether a neutral sentiment comes from neutral keywords or whether the text contains positive and negative sentiments at the same time.
Moreover, the polarity at entity level, i.e., the polarity values related to the entities that are mentioned in the text, is also included for those cases when applicable. These values are similarly tagged with 6 possible values and include the level of agreement as related to each entity.
On the other hand, a set of topics has been selected based on the thematic areas covered by the corpus, such as "política" ("politics"), "fútbol" ("soccer"), "literatura" ("literature") or "entretenimiento" ("entertainment"). Each message in both the training and test sets has been assigned to one or several of these topics (most messages are associated with just one topic, due to the short length of the text).
All tagging has been done semiautomatically: a baseline machine learning model is first run and then all tags are manually checked by human experts. In the case of the polarity at entity level, due to the high volume of data to check, this tagging has just been done for the training set.
The following example shows the information of two sample tweets. The first tweet is only tagged with the global polarity, as the text contains no mentions of any entity, but the second one is tagged with both the global polarity of the message and the polarity associated with each of the entities that appear in the text (UPyD and Foro_Asturias):
<tweet>
  <tweetid>0000000000</tweetid>
  <user>usuario0</user>
  <content><![CDATA['Conozco a alguien q es adicto al drama! Ja ja ja te suena d algo!]]></content>
  <date>2011-12-02T02:59:03</date>
  <lang>es</lang>
  <sentiments>
    <polarity><value>P+</value><type>AGREEMENT</type></polarity>
  </sentiments>
  <topics>
    <topic>entretenimiento</topic>
  </topics>
</tweet>
<tweet>
  <tweetid>0000000001</tweetid>
  <user>usuario1</user>
  <content><![CDATA['UPyD contará casi seguro con grupo gracias al Foro Asturias.]]></content>
  <date>2011-12-02T00:21:01</date>
  <lang>es</lang>
  <sentiments>
    <polarity><value>P</value><type>AGREEMENT</type></polarity>
    <polarity><entity>UPyD</entity><value>P</value><type>AGREEMENT</type></polarity>
    <polarity><entity>Foro_Asturias</entity><value>P</value><type>AGREEMENT</type></polarity>
  </sentiments>
  <topics>
    <topic>política</topic>
  </topics>
</tweet>
The evaluation web page is available at: http://tass.sepln.org/2018/task-1/private/evaluation/evaluate.php
Results must be submitted in a plain text file with the following format:
tweet_id \t polarity
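A minimal helper to produce a file in this tab-separated format (the function name is illustrative):

```python
def write_submission(predictions, path):
    """Write (tweet_id, polarity) pairs as one tab-separated line each."""
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id, polarity in predictions:
            f.write(f"{tweet_id}\t{polarity}\n")
```

Each line thus contains the tweet identifier, a tab character, and one of the polarity labels.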
The best system of the first task, using the InterTASS corpus, will receive the best system award: a cash prize of 100€, sponsored by MeaningCloud.
See the main webpage of TASS-2018.
To be announced.
To be announced.
- Release of training and development corpora: May 2, 2018
- Release of test corpora: June 15, 2018
- Deadline for evaluation: June 27, 2018
- July 4, 2018
- July 16, 2018
- July 27, 2018
- Camera ready submission: September 5, 2018
- September 17, 2018
- September 18, 2018
Edgar Casasola Murillo University of Costa Rica, Costa Rica
Fermín Cruz Mata University of Sevilla, Spain
Yoan Gutiérrez Vázquez University of Alicante, Spain
Lluís F. Hurtado Polytechnic University of Valencia, Spain
Salud María Jiménez Zafra University of Jaén, Spain
Mª. Teresa Martín Valdivia University of Jaén, Spain
Manuel Montes Gómez National Institute of Astrophysics, Optics and Electronics, Mexico
Antonio Moreno Ortíz University of Málaga, Spain
Preslav Nakov Qatar Computing Research Institute, Qatar
José Manuel Perea Ortega University of Extremadura, Spain
Ferrán Pla Polytechnic University of Valencia, Spain
Sara Rosenthal IBM Research, U.S.A.
Maite Taboada Simon Fraser University, Canada
L. Alfonso Ureña López University of Jaén, Spain