TASS-2018: Workshop on Semantic Analysis at SEPLN

About

The workshop and shared task "Sentiment Analysis at SEPLN (TASS)" has been held since 2012, under the umbrella of the International Conference of the Spanish Society for Natural Language Processing (SEPLN). TASS was the first shared task on sentiment analysis in Twitter in Spanish. Spanish is the second most used language on Facebook and Twitter [1], which calls for the development and availability of language-specific methods and resources for sentiment analysis. The initial aim of TASS was to further research on sentiment analysis in Spanish, with a special interest in the language used on Twitter.

Although sentiment analysis is still an open problem, the Organization Committee would like to foster research on other tasks related to the processing of the semantics of texts written in Spanish. Consequently, the name of the workshop/shared task has been changed to "Workshop on Semantic Analysis at SEPLN (TASS)".

The Organization Committee appeals to the research community to propose and organize evaluation tasks related to other semantic tasks in the Spanish language. New tasks provide an opportunity to create linguistic resources, evaluate their usefulness, and promote the consolidation of a community of researchers interested in the addressed topics. Thus, we encourage the semantic processing community to propose and submit evaluation tasks (see Proposal of Tasks).

As in previous editions, TASS-2018 proposes two evaluation tasks related to polarity classification at tweet level (task-1) and at aspect level (task-2). In addition, the 2018 edition brings several novelties to the community:

  1. A new version of the InterTASS corpus, which includes tweets written in the Spanish spoken in Spain and in several American countries (see InterTASS v1, InterTASS v2).
  2. An evaluation task concerned with the classification of semantic relations in the health domain (see task-3).
  3. An evaluation task focused on the classification of news according to whether the reader perceives them as good or bad (see task-4).

TASS-2018 will be the 7th event of the series and will be held in conjunction with the 34th International Conference of the Spanish Society for Natural Language Processing (SEPLN), in Seville, Spain, on September 18th, 2018.

A Google Group has been set up for this year's TASS Shared Task, where announcements will be made. Please send your questions and feedback to tass-tasks@googlegroups.com.

Proposal of Tasks

Semantic analysis has given rise to new tasks that attempt to further improve natural language understanding systems. In the context of sentiment analysis, some such tasks are cross- and multi-domain sentiment analysis, as well as aspect-based sentiment analysis. Outside the sentiment analysis arena, other tasks attracting the interest of the research community are stance classification, negation handling, rumour identification, fake news identification, open information extraction, argumentation mining, classification of semantic relations, and question answering of non-factoid questions, to name a few. We encourage the research community to propose evaluation tasks related to such semantic analysis processes in Spanish. The above list is by no means closed, so feel free to submit any evaluation task proposal that you consider interesting for the research community.

Proposals must include the following:

  • Title of the task
  • Description of the evaluation task
  • Linguistic resources available or resources to be created
  • Important dates
  • Organization committee
  • Contact person (name and email)

Proposals must be sent to tass-sepln@googlegroups.com by January 16th, 2018. Notification of acceptance: January 30th, 2018.

Tasks

After evaluating the task proposals, the Organizing Committee has accepted two additional tasks. Therefore, TASS-2018 proposes four tasks in the context of semantic analysis: two are related to sentiment analysis (task-1 and task-2), the third is concerned with the sequential semantic tagging of health documents (task-3), and the fourth focuses on the emotions aroused by reading objective news (task-4).

Task 1: Sentiment Analysis at Tweet level

This task focuses on the evaluation of polarity classification systems for tweets written in Spanish. The submitted systems will have to face the following challenges:

  1. Lack of context: Tweets are short (up to 240 characters).
  2. Informal language: Misspellings, emojis and onomatopoeias are common.
  3. (Local) multilinguality: The training, development and test corpora contain tweets written in the Spanish spoken in Spain, Peru and Costa Rica.
  4. Generalization: The systems will be assessed on several corpora. The first is the test set that accompanies the training data, so it follows a similar distribution. The second is the test set of the General Corpus of TASS (see previous editions), which was compiled some years ago, so it may be lexically and semantically different from the training data. Furthermore, the systems will be evaluated on test sets of tweets written in the Spanish spoken in different American countries.

The participants will be provided with a training, a development and several test corpora (see important dates). All the corpora are annotated with 4 different levels of opinion intensity (P, N, NEU, NONE).
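As an illustration of the four-label scheme, the following minimal Python sketch validates labels and computes their relative frequencies in a corpus. The glosses attached to each label follow the usual reading in TASS and are stated here as an assumption; the names Polarity and label_distribution are hypothetical and are not part of any official TASS tooling.

```python
from collections import Counter
from enum import Enum


class Polarity(Enum):
    """The four tweet-level labels named in the task description.

    The glosses in the comments are an assumption based on the usual
    reading of these labels, not a definition taken from this page.
    """
    P = "P"        # positive
    N = "N"        # negative
    NEU = "NEU"    # neutral
    NONE = "NONE"  # no sentiment expressed


def label_distribution(labels):
    """Return the relative frequency of each label that occurs in `labels`.

    Raises ValueError if a label is not one of P, N, NEU, NONE.
    """
    counts = Counter(Polarity(label) for label in labels)
    total = sum(counts.values())
    return {p.value: counts[p] / total for p in Polarity if counts[p]}
```

Comparing the output of label_distribution on the training corpus and on each test corpus is one simple way to see how far their label distributions differ, which is relevant to the generalization challenge listed above.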

If participants submit a supervised or semi-supervised system, it must be trained only on the provided training data; the use of any other training set is strictly forbidden. However, linguistic resources such as lexicons, word embedding vectors or knowledge bases can be used. We want a fair competition that fosters creativity, so we want to assess the originality of the systems given the same set of training data.

You can read all the details about it in the following link: http://tass.sepln.org/2018/task-1/

Task 2: Aspect-based Sentiment Analysis

The second task proposes the development of aspect-based polarity classification systems. In the datasets, the sequence of tokens that forms each aspect is annotated, as well as the category of the aspect and the polarity of the opinion about the aspect. The opinion is annotated at three levels of intensity: P, NEU and N.
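The annotation described above (aspect token span, aspect category, polarity) can be pictured with a small Python sketch. This is only one possible in-memory representation under stated assumptions: the class name AspectAnnotation, the start/end token offsets and the example category are hypothetical, and the actual corpus format is described on the task page.

```python
from dataclasses import dataclass
from typing import List

# The three polarity levels named in the task description.
ASPECT_POLARITIES = {"P", "NEU", "N"}


@dataclass(frozen=True)
class AspectAnnotation:
    """One annotated aspect inside a tokenized text (hypothetical layout)."""
    tokens: List[str]   # all tokens of the text
    start: int          # index of the first aspect token
    end: int            # index one past the last aspect token
    category: str       # aspect category (values depend on the corpus)
    polarity: str       # one of P, NEU, N

    def __post_init__(self):
        # Reject labels outside the three-level scheme.
        if self.polarity not in ASPECT_POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        # Reject empty or out-of-range spans.
        if not 0 <= self.start < self.end <= len(self.tokens):
            raise ValueError("aspect span is outside the token sequence")

    @property
    def aspect_tokens(self) -> List[str]:
        """The sequence of tokens that forms the aspect."""
        return self.tokens[self.start:self.end]
```

For example, AspectAnnotation(tokens=["La", "batería", "dura", "poco"], start=1, end=2, category="battery", polarity="N") marks the single token "batería" as a negatively rated aspect; the category name is made up for illustration.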

If participants submit a supervised or semi-supervised system, it must be trained only on the provided training data; the use of any other training set is strictly forbidden. However, linguistic resources such as lexicons, word embedding vectors or knowledge bases can be used. We want a fair competition that fosters creativity, so we want to assess the originality of the systems given the same set of training data.

You can read all the details about it in the following link: http://tass.sepln.org/2018/task-2/

Task 3: eHealth Knowledge Discovery

This task encourages the development of semantic sequential tagging systems in the domain of health documents. The University of Alicante (Spain) and the University of Havana (Cuba) propose this interesting and challenging task, and you can read all the details about it at the following link: http://tass.sepln.org/2018/task-3/

Task 4: Good Or Bad News? Emotional categorization of news articles

When you read news about natural disasters, you usually feel negative emotions, and when you read news about the latest championship won by your favourite football team, you usually feel positive emotions. If you want to promote your brand, you want its ads to appear next to news that arouses positive emotions. Therefore, identifying the emotions that news can arouse is very important for the reputation of brands. Task-4 encourages the development of systems that can classify whether a news article is positive (it can arouse positive emotions) or negative (it can arouse negative emotions). MeaningCloud and the Universidad de Granada propose this interesting task, and you can read all the details about it at the following link: http://tass.sepln.org/2018/task-4/

Shared Task

Evaluation

The evaluation details for task-1, task-2, task-3 and task-4 are published on their webpages.

Proceedings

The Organization Committee of TASS encourages participants to submit a description paper of their systems. Submitted papers will be reviewed by a scientific committee, and only accepted papers will be published at CEUR, as in previous years (2015, 2016 and 2017).

The manuscripts must satisfy the following rules:

  • Up to 6 pages plus references formatted according to the SEPLN template.
  • Articles can be written in English or Spanish. The title, abstract and keywords must be written in both languages.
  • The document format must be Word or LaTeX, but the submission must be in PDF format.
  • Instead of describing the task and/or the corpus, you should focus on the description of your experiments and the analysis of your results, and include a citation to the Overview paper.

The proceedings will be structured in sections, one per task of TASS-2018.

Depending on the final number of participants and the time allocated for the workshop, all or a selected group of papers will be presented and discussed in the Workshop session.

The proceedings are published in CEUR and you can read them here.

Program

TASS 2018 will be held on Tuesday, September 18th, in Seville (Spain). The workshop will start at 14:30 and finish at about 20:00.
You can read the official program of the Conference.

The program of TASS is the following:

14:30-15:10 Welcome / Opening Remarks / Overview task 1 & 2
15:10-15:20 Atalaya at TASS 2018: Sentiment Analysis with Tweet Embeddings and Data Augmentation
Franco M. Luque, Juan Manuel Pérez
15:20-15:30 ELiRF-UPV en TASS 2018: Análisis de Sentimientos en Twitter basado en Aprendizaje Profundo (ELiRF-UPV at TASS 2018: Sentiment Analysis in Twitter based on Deep Learning)
José-Ángel González, Lluís-F. Hurtado, Ferran Pla
15:30-15:40 INGEOTEC solution for Task 1 in TASS'18 competition
Daniela Moctezuma, José Ortiz-Bejar, Eric S. Tellez, Sabino Miranda-Jiménez, Mario Graff
15:40-15:50 Aplicación de un modelo híbrido de aprendizaje profundo para el Análisis de Sentimiento en Twitter (Application of a hybrid deep learning model for Sentiment Analysis in Twitter)
Rosa Montañés, Rocío Aznar, Rafael del Hoyo
15:50-16:00 RETUYT-InCo at TASS 2018: Sentiment Analysis in Spanish Variants using Neural Networks and SVM
Luis Chiruzzo, Aiala Rosá
16:00-16:20 Task 3: Overview
16:20-16:30 A Hybrid Bi-LSTM-CRF model for Knowledge Recognition from eHealth documents
Renzo M. Rivera Zavala, Paloma Martínez, Isabel Segura-Bedmar
16:30-16:40 SINAI en TASS 2018 Task 3. Clasificando acciones y conceptos con UMLS en MedLine (SINAI in TASS 2018 Task 3. Classifying actions and concepts with UMLS on MedLine)
Pilar López-Úbeda, Manuel C. Díaz-Galiano, María Teresa Martín-Valdivia, L. Alfonso Ureña-López
16:40-16:50 TASS2018: Medical knowledge discovery by combining terminology extraction techniques with machine learning classification
Jorge Vivaldi Palatresi, Horacio Rodríguez Hontoria
16:50-17:00 Clasificación conjunta de frases clave y sus relaciones en documentos electrónicos de salud en español (Joint classification of Key-Phrases and Relations in Electronic Health Documents)
Salvador Medina, Jordi Turmo
Coffee-break
17:30-17:40 LABDA at TASS-2018 Task 3: Convolutional Neural Networks for Relation Classification in Spanish eHealth documents
Víctor Suárez-Paniagua, Isabel Segura-Bedmar, Paloma Martínez
17:40-18:00 Task 4: overview
18:00-18:10 ELiRF-UPV en TASS 2018: Categorización Emocional de Noticias (ELiRF-UPV at TASS 2018: Emotional Categorization of News Articles)
José-Ángel González, Ferran Pla, Lluís-F. Hurtado
18:10-18:20 INGEOTEC solution for Task 4 in TASS’18 competition
Daniela Moctezuma, José Ortiz-Bejar, Eric S. Tellez, Sabino Miranda-Jiménez, Mario Graff
18:20-18:30 SCI2S at TASS 2018: Emotion Classification with Recurrent Neural Networks
Nuria Rodríguez Barroso, Eugenio Martínez-Cámara, Francisco Herrera
18:30-18:40 SINAI en TASS 2018: Inserción de Conocimiento Emocional Externo a un Clasificador Lineal de Emociones (SINAI at TASS 2018: Lineal Classification System with Emotional External Knowledge)
Flor Miriam-Plaza-del-Arco, Eugenio Martínez-Cámara, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López
18:40-20:00 Discussion

Presentation instructions

  1. All the papers are going to be orally presented.
  2. The language of the presentation can be Spanish or English.
  3. The duration of the presentation is 7 minutes. This is a strict requirement.
  4. There will be some time for questions after each presentation.

Important dates

The dates of the tasks are on their webpages: task-1, task-2, task-3 and task-4.

Organizing Committee

Program Committee

  • Erik Cambria, Nanyang Technological University, Singapore
  • Edgar Casasola Murillo, University of Costa Rica, Costa Rica
  • Fermín Cruz Mata, University of Sevilla, Spain
  • Luis Espinosa Anke, Cardiff University, United Kingdom
  • Yoan Gutiérrez Vázquez, University of Alicante, Spain
  • Lluís F. Hurtado, Polytechnic University of Valencia, Spain
  • Salud María Jiménez Zafra, University of Jaén, Spain
  • María Victoria Luzón García, University of Granada, Spain
  • Mª. Teresa Martín Valdivia, University of Jaén, Spain
  • Manuel Montes Gómez, National Institute of Astrophysics, Optics and Electronics, Mexico
  • Antonio Moreno Ortíz, University of Málaga, Spain
  • José Manuel Perea Ortega, University of Extremadura, Spain
  • Ferrán Pla, Polytechnic University of Valencia, Spain
  • Sara Rosenthal, IBM Research, U.S.A.
  • Maite Taboada, Simon Fraser University, Canada
  • L. Alfonso Ureña López, University of Jaén, Spain

References

  1. Instituto Cervantes. 2017. El español: una lengua viva.