TASS-2018 Task2 @SEPLN

About

The second task proposes the development of aspect-based polarity classification systems. Two datasets are provided to evaluate the systems: Social-TV and STOMPOL. Both datasets are annotated with the aspect, the main category of the aspect, and the polarity of the opinion about the aspect. Systems must classify the opinion about each given aspect on a three-level polarity scale: Positive, Neutral and Negative.

Participants are expected to submit up to 3 experiments for each corpus, each in a plain text file with the following format:

tweetid \t aspect \t polarity

Allowed polarity values are P, NEU and N.
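As a minimal sketch of how a submission file in the format above could be produced, the following Python function joins each prediction with tab characters and rejects polarity values outside P, NEU and N. The function name and the example row are illustrative assumptions, not part of the official tooling.

```python
# Sketch only: format_submission is a hypothetical helper, not an official TASS tool.
ALLOWED_POLARITIES = {"P", "NEU", "N"}


def format_submission(rows):
    """Return submission text for an iterable of (tweet_id, aspect, polarity) triples."""
    lines = []
    for tweet_id, aspect, polarity in rows:
        if polarity not in ALLOWED_POLARITIES:
            raise ValueError(f"unexpected polarity {polarity!r} for tweet {tweet_id}")
        # One prediction per line: tweetid <TAB> aspect <TAB> polarity
        lines.append("\t".join([str(tweet_id), aspect, polarity]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_submission([("456544898942906369", "Equipo-Barcelona", "P")])
    print(text, end="")
```

The returned string can be written to a plain text file with UTF-8 encoding, since several aspect names contain accented characters.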

For evaluation, a single label combining aspect and polarity ("aspect-polarity") will be considered. As in Task 1, the evaluation measures are Accuracy and the macro-averaged versions of Precision, Recall and F1. Macro-F1 will be used to rank the systems.
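The measures above can be illustrated with a short Python sketch. It assumes that gold and predicted labels are already aligned one to one and that each label is the combined "aspect-polarity" string. It also assumes Macro-F1 is the unweighted mean of the per-label F1 scores. The official evaluation script may align opinions or average differently, so this is only an approximation for local validation.

```python
from collections import Counter


def macro_scores(gold, predicted):
    """Macro-averaged precision, recall, F1 and accuracy over aligned label lists."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        total = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / total if total else 0.0)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": sum(tp.values()) / len(gold),
    }


if __name__ == "__main__":
    gold = ["Partido-P", "Partido-N", "Equipo-P", "Equipo-P"]
    pred = ["Partido-P", "Partido-P", "Equipo-P", "Equipo-N"]
    print(macro_scores(gold, pred))
```

Because every label counts equally in the macro average, rare aspect-polarity combinations weigh as much as frequent ones.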

Datasets

Social-TV Corpus

This corpus was collected during the 2014 Copa del Rey final in Spain between Real Madrid and F.C. Barcelona, played on 16 April 2014 at Mestalla Stadium in Valencia. Over 1 million tweets were collected from 15 minutes before to 15 minutes after the match. Irrelevant tweets were filtered out and a subset of 2,773 tweets was selected.

All tweets were manually annotated at aspect level, and a tweet may contain more than one aspect. The list of aspects is:

Afición
Árbitro
Autoridades
Entrenador
Teams: Equipo-Atlético_de_Madrid, Equipo-Barcelona, Equipo-Real_Madrid, Equipo (any other team)
Players: Jugador-Alexis_Sánchez, Jugador-Alvaro_Arbeloa, Jugador-Andrés_Iniesta, Jugador-Angel_Di_María, Jugador-Asier_Ilarramendi, Jugador-Carles_Puyol, Jugador-Cesc_Fábregas, Jugador-Cristiano_Ronaldo, Jugador-Dani_Alves, Jugador-Dani_Carvajal, Jugador-Fábio_Coentrão, Jugador-Gareth_Bale, Jugador-Iker_Casillas, Jugador-Isco, Jugador-Javier_Mascherano, Jugador-Jesé_Rodríguez, Jugador-José_Manuel_Pinto, Jugador-Karim_Benzema, Jugador-Lionel_Messi, Jugador-Luka_Modric, Jugador-Marc_Bartra, Jugador-Neymar_Jr., Jugador-Pedro_Rodríguez, Jugador-Pepe, Jugador-Sergio_Busquets, Jugador-Sergio_Ramos, Jugador-Xabi_Alonso, Jugador-Xavi_Hernández, Jugador (any other player)
Partido
Retransmisión

Sentiment polarity was annotated from the point of view of the Twitter user, using 3 tags: P, NEU and N. No distinction is made between cases where the author expresses no sentiment and cases where the sentiment is neither positive nor negative.

The Social-TV corpus was randomly divided into two sets: training (1,773 tweets) and test (1,000 tweets), with a similar distribution of both aspects and sentiments. The training set will be released so that participants may train and validate their models. The test corpus will be provided without any annotation and will be used to evaluate the results provided by the different systems.

Three sample tweets from the training set are shown here:

<tweet id="456544898791907328">
<sentiment aspect="Equipo-Real_Madrid" polarity="P">#HalaMadrid</sentiment> ganamos sin <sentiment aspect="Jugador-Cristiano_Ronaldo" polarity="NEU">Cristiano</sentiment>. .perdéis con <sentiment aspect="Jugador-Lionel_Messi" polarity="N">Messi</sentiment>. Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>! !!!!!
</tweet>
<tweet id="456544898942906369">
@nevermind2192 <sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment> por siempre!!
</tweet>
<tweet id="456544898951282688">
<sentiment aspect="Partido" polarity="NEU">#FinalCopa</sentiment> Hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, hala <sentiment aspect="Equipo-Real_Madrid" polarity="P">Madrid</sentiment>, campeón de la <sentiment aspect="Partido" polarity="P">copa del rey</sentiment>
</tweet>
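The annotation layout of the samples above can be read with Python's standard XML parser. The sketch below parses a single tweet element and returns its id, its plain text, and the annotated (aspect, polarity, span) triples. It assumes each tweet is well-formed XML as shown. The function name is hypothetical, and the layout of the full corpus file (for example its root element) is not described here, so it is not assumed.

```python
import xml.etree.ElementTree as ET


def extract_opinions(tweet_xml):
    """Parse one <tweet> element; return (tweet_id, plain_text, opinions)."""
    tweet = ET.fromstring(tweet_xml)
    # itertext() walks the mixed content, so the annotated spans stay in the text.
    plain_text = "".join(tweet.itertext()).strip()
    opinions = [
        (s.get("aspect"), s.get("polarity"), s.text or "")
        for s in tweet.iter("sentiment")
    ]
    return tweet.get("id"), plain_text, opinions


SAMPLE = '''<tweet id="456544898942906369">
@nevermind2192 <sentiment aspect="Equipo-Barcelona" polarity="P">Barça</sentiment> por siempre!!
</tweet>'''

if __name__ == "__main__":
    print(extract_opinions(SAMPLE))
```

A tweet with several annotated spans, such as the first sample, yields one triple per span, in document order.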

STOMPOL

STOMPOL (corpus of Spanish Tweets for Opinion Mining at aspect level about POLitics) is a corpus of tweets written in Spanish and annotated at aspect level. The topic of the tweets is the political campaign of the 2015 regional and local elections in Spain. The tweets were gathered on April 23-24, and each is related to one of the following political aspects:

Economía (Economy): taxes, infrastructure, markets, labor policy...
Sanidad (Health System): hospitals, public/private health system, drugs, doctors...
Educación (Education): state school, private school, scholarships...
Propio_partido (Political party): anything good (speeches, electoral programme...) or bad (corruption, criticism) related to the entity
Otros_aspectos (Other aspects): electoral system, environmental policy...

Each aspect is related to one or several entities (separated by the pipe symbol |) that correspond to one of the main political parties in Spain:

Partido_Popular (PP)
Partido_Socialista_Obrero_Español (PSOE)
Izquierda_Unida (IU)
Podemos
Ciudadanos (Cs)
Unión_Progreso_y_Democracia (UPyD)

Each tweet in the corpus was manually annotated by two different annotators, plus a third one in case of disagreement, with the sentiment polarity at aspect level. Sentiment polarity was annotated from the point of view of the Twitter user, using 3 levels: P, NEU and N. No difference is made between no sentiment and neutral sentiment (neither positive nor negative).

Each political aspect is linked to its corresponding political party and its polarity.

Two examples are shown below:

<tweet id="591267548311769088">
@ahorapodemos @Pablo_Iglesias_ @SextaNocheTV Que alguien pregunte si habrá cambios en las <sentiment aspect="Educacion" entity="Podemos" polarity="NEU">becas</sentiment> MEC para universitarios, por favor.
</tweet>

<tweet id="591192167944736769">
#Arroyomolinos lo que le interesa al ciudadano son Políticos cercanos que se interesen y preocupen por sus problemas <sentiment aspect="Propio_partido" entity="Union_Progreso_y_Democracia" polarity="P">@UPyD</sentiment> VECINOS COMO TU
</tweet>
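STOMPOL annotations add an entity attribute, which may hold several parties separated by the pipe symbol. The following Python sketch reads one tweet and splits that attribute into a list. The function name and the dictionary keys are illustrative assumptions, and the sketch assumes each tweet is well-formed XML as in the examples above.

```python
import xml.etree.ElementTree as ET


def extract_political_opinions(tweet_xml):
    """Return one dict per <sentiment> span, with the entity attribute split on '|'."""
    tweet = ET.fromstring(tweet_xml)
    opinions = []
    for s in tweet.iter("sentiment"):
        # An aspect may refer to several parties, e.g. "A|B"; drop empty pieces.
        entities = [e for e in s.get("entity", "").split("|") if e]
        opinions.append({
            "aspect": s.get("aspect"),
            "entities": entities,
            "polarity": s.get("polarity"),
            "text": s.text or "",
        })
    return opinions


SAMPLE = '''<tweet id="591192167944736769">
#Arroyomolinos lo que le interesa al ciudadano son Políticos cercanos que se interesen y preocupen por sus problemas <sentiment aspect="Propio_partido" entity="Union_Progreso_y_Democracia" polarity="P">@UPyD</sentiment> VECINOS COMO TU
</tweet>'''

if __name__ == "__main__":
    for opinion in extract_political_opinions(SAMPLE):
        print(opinion)
```

Note that attribute values in the samples are written without accents (for instance Educacion), so labels should be compared against the values found in the data rather than the accented names in the aspect list.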

The corpus is made up of 1,284 tweets and has been divided into a training set (784 tweets), which is provided for building and validating the systems, and a test set (500 tweets), which will be used for evaluation.

Licence

Downloading any of these datasets requires signing the TASS Corpus Licence Agreement, which can be done by filling in the form at this link. After submitting the form you will receive an email with the link to download the data.

If you use the corpus for your research (papers, articles, presentations for conferences or educational purposes), please cite one of the following publications:

Díaz-Galiano, M.C., Martínez-Cámara, E., García-Cumbreras, M.A., García Vega, M., & Villena-Román, J. (2018). TASS 2017 - The democratization of deep learning in TASS 2017. Procesamiento del Lenguaje Natural, 60.
Martínez-Cámara, E., García-Cumbreras, M.A., Villena-Román, J., & García-Morera, J. (2016). TASS 2015 - The Evolution of the Spanish Opinion Mining Systems. Procesamiento del Lenguaje Natural, 56.
Villena-Román, J., Martínez-Cámara, E., García-Morera, J., & Jiménez-Zafra, S. (2015). TASS 2014 - The Challenge of Aspect-based Sentiment Analysis. Procesamiento del Lenguaje Natural, 54.
Villena-Román, J., García-Morera, J., Lana-Serrano, S., & González-Cristóbal, J.C. (2014). TASS 2013 - A Second Step in Reputation Analysis in Spanish. Procesamiento del Lenguaje Natural, 52.
Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., & González-Cristóbal, J.C. (2013). TASS - Workshop on Sentiment Analysis at SEPLN. Procesamiento del Lenguaje Natural, 50.

Shared Task

Evaluation

The evaluation web page is available at: http://tass.sepln.org/2018/task-2/private/evaluation/evaluate.php

Results must be submitted in a plain text file with the following format:

tweetid \t aspect \t polarity

Datasets downloads

Use of the Social-TV and STOMPOL corpora requires agreeing to the terms of use of the data by signing the TASS Data License.

Proceedings

See main webpage of TASS-2018.

Program

To be announced.

Presentation instructions

To be announced.

Important dates

Release of training and development corpora: May 2, 2018
Release of test corpora: June 15, 2018
Deadline for evaluation: ~~June 27, 2018~~ July 4, 2018
Paper submission: July 16, 2018
Review notification: July 27, 2018
Camera ready submission: September 5, 2018
Publication: September 17, 2018
Workshop: September 18, 2018

Organizing Committee

Julio Villena-Román, MeaningCloud, Spain
Miguel Ángel García Cumbreras, University of Jaén, Spain
Eugenio Martínez Cámara, Universidad de Granada, Spain
Manuel Carlos Díaz Galiano, University of Jaén, Spain
Manuel García Vega, University of Jaén, Spain

Program Committee

Edgar Casasola Murillo, University of Costa Rica, Costa Rica
Fermín Cruz Mata, University of Sevilla, Spain
Yoan Gutiérrez Vázquez, University of Alicante, Spain
Lluís F. Hurtado, Polytechnic University of Valencia, Spain
Salud María Jiménez Zafra, University of Jaén, Spain
Mª. Teresa Martín Valdivia, University of Jaén, Spain
Manuel Montes Gómez, National Institute of Astrophysics, Optics and Electronics, Mexico
Antonio Moreno Ortíz, University of Málaga, Spain
Preslav Nakov, Qatar Computing Research Institute, Qatar
José Manuel Perea Ortega, University of Extremadura, Spain
Ferrán Pla, Universidad Politécnica de Valencia, Spain
Sara Rosenthal, IBM Research, U.S.A.
Maite Taboada, Simon Fraser University, Canada
L. Alfonso Ureña López, University of Jaén, Spain

