TASS-2018-Task 3. eHealth Knowledge Discovery

Subtask C: Setting semantic relations

Subtask C continues from the output of Subtask B by linking the entities detected and labelled in each document. Given an input file (i.e. input_<topic>.txt) and the outputs of both Subtasks A and B, the purpose of this Subtask is to recognize all relevant semantic relationships between the recognized entities. The semantic relationships found in the gold example of the previous sections are illustrated in the following figure and output file:

Example output illustration:

This illustration represents the following gold files:

Example gold output file:

In this subtask, six types of relationships are defined:

  1. Relationships between Concepts:
  • is-a: indicates that the first Concept is a subtype, or a more concrete expression, of the second Concept. For example, “asma” (asthma) is-a “enfermedad” (disease), and “vías respiratorias” (airways) is-a “vías” (tracts).
  • part-of: indicates that the first Concept is a constituent part or component of the second Concept, as in “pulmones” (lungs) part-of “cuerpo humano” (human body).
  • property-of: indicates that the first Concept defines a property or variable characteristic of the second Concept, as in “avanzada” (advanced) property-of “enfermedad”.
  • same-as: indicates that a Concept is unambiguously the same as another Concept. For example, in “El Instituto Nacional de Alergias y Enfermedades Infecciosas (NIH) apoya las investigaciones destinadas a desarrollar mejores formas de diagnosticar, tratar y prevenir las infecciones, enfermedades del sistema inmune y las alergias.” (“The National Institute of Allergy and Infectious Diseases (NIH) supports research aimed at developing better ways to diagnose, treat and prevent infections, immune system diseases and allergies.”), the Concept “Instituto Nacional de Alergias y Enfermedades Infecciosas” is the same-as “NIH”.

Notice that these relationships are hinted at by some of the words in the sentence, but these words do not need to be reported for the purpose of the evaluation. For example, the expression “es una” (“is a”) can hint at the is-a relation between “asma” and “enfermedad”, but this text fragment is not itself identified or labelled.

  2. Relationships between Actions and Concepts, or between Actions themselves:
  • subject: identifies the actor that performs the indicated Action. For example, in “el asma afecta las vías respiratorias” (“asthma affects the airways”), the Concept “asma” performs the Action “afecta”. In this relationship, the subject plays the producer (actor) role.
  • target: identifies the entity that receives the effect of the indicated Action. For example, in “el asma afecta las vías respiratorias”, the Concept “vías respiratorias” receives the effect of the Action “afecta”. In this relationship, the target plays the consumer role.
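The six labels and the entity types they connect can be summarised as a small lookup table. The following is a minimal Python sketch, assuming Concept and Action are the only entity types involved; `EntityType`, `ALLOWED` and `is_well_typed` are hypothetical names, not part of the task's tooling. Destinations of subject and target also admit an Action, to cover the complex concepts described further on.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class EntityType(Enum):
    CONCEPT = "Concept"
    ACTION = "Action"


C, A = EntityType.CONCEPT, EntityType.ACTION

# Hypothetical table: label -> (allowed source types, allowed destination types).
ALLOWED: Dict[str, Tuple[Set[EntityType], Set[EntityType]]] = {
    # Relationships between Concepts
    "is-a": ({C}, {C}),
    "part-of": ({C}, {C}),
    "property-of": ({C}, {C}),
    "same-as": ({C}, {C}),
    # Relationships from an Action to a Concept or (complex concept) Action
    "subject": ({A}, {C, A}),
    "target": ({A}, {C, A}),
}


def is_well_typed(label: str, source: EntityType, destination: EntityType) -> bool:
    """Return True if `label` may link an entity of type `source` to one of type `destination`."""
    if label not in ALLOWED:
        return False
    sources, destinations = ALLOWED[label]
    return source in sources and destination in destinations


# "asma" is-a "enfermedad": Concept -> Concept
print(is_well_typed("is-a", C, C))      # True
# the subject of "afecta" is "asma": Action -> Concept
print(is_well_typed("subject", A, C))   # True
# a Concept cannot be the source of a subject relation
print(is_well_typed("subject", C, C))   # False
```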

Notice that an Action can have both a subject and a target. However, the subject of an Action is sometimes hidden or non-existent, as with Actions expressed by infinitive verbs, for example in the sentence “Diagnosticar el cáncer es difícil” (“Diagnosing cancer is difficult”). Here the Action “Diagnosticar” has only a target, the Concept “cáncer”, since the sentence does not state who performs the Action.

In other cases, the target can be missing, or be the same as the subject, as with Actions expressed by reflexive verbs, for example in the phrase “...los pulmones se hinchan” (“...the lungs swell”). Here the Action “hinchan” has the Concept “pulmones” as both its subject and its target; that is, both roles refer to the same Concept.

As observed in the gold example, an Action can have more than one subject and/or target if the same text fragment denotes multiple occurrences of that Action.
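The cases above (a missing subject, a reflexive verb, several subjects or targets for one Action) all reduce to emitting one entry per role filler. A minimal Python sketch, where `action_entries` is a hypothetical helper; apart from the “afecta” example, whose IDs come from the Subtask A example output, the IDs are invented for illustration:

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, int, int]  # (LABEL, SOURCE, DESTINATION)


def action_entries(action_id: int,
                   subject_ids: Iterable[int] = (),
                   target_ids: Iterable[int] = ()) -> List[Entry]:
    """Emit one subject entry per subject ID and one target entry per target ID.

    An empty iterable models a hidden or non-existent role (e.g. an infinitive
    with no stated subject); a reflexive verb passes the same ID in both roles.
    """
    entries = [("subject", action_id, s) for s in subject_ids]
    entries += [("target", action_id, t) for t in target_ids]
    return entries


# "el asma afecta las vías respiratorias": IDs from the Subtask A example
print(action_entries(2, [1], [3]))
# [('subject', 2, 1), ('target', 2, 3)]

# "Diagnosticar el cáncer es difícil" (hypothetical IDs 10, 11): target only
print(action_entries(10, target_ids=[11]))
# [('target', 10, 11)]

# "...los pulmones se hinchan" (hypothetical IDs 20, 21): reflexive verb
print(action_entries(20, [21], [21]))
# [('subject', 20, 21), ('target', 20, 21)]
```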

Complex concepts can be represented by tuples, e.g. <Subject, Action, Target> (or any variant in which the subject or target is missing), whose core is the Action. Therefore, in Subtask C the subject or target of an Action can sometimes be another Action, which represents a complex concept. For example, in the sentence “Un ataque de asma se produce cuando los síntomas empeoran” (“An asthma attack occurs when the symptoms get worse”), “síntomas empeoran” is a complex concept, where “síntomas” is a Concept and “empeoran” is an Action; “síntomas empeoran” is the act of the symptoms getting worse. Therefore, “síntomas empeoran” can be linked to the Action “produce”. The Action “produce” in this sentence is performed by this complex concept: it is not the symptoms that cause the asthma attack, but rather the act of the symptoms getting worse.
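One way to handle such nested tuples is to refer to each complex concept by the ID of its core Action and to emit the nested tuple's own entries recursively. The sketch below assumes this reading, and also assumes “síntomas” is annotated as the subject of “empeoran”; the classes `Concept` and `ActionTuple`, the function `flatten` and all IDs are hypothetical, and only the links discussed in the example are modelled (any other roles of “produce” are left out).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Entry = Tuple[str, int, int]  # (LABEL, SOURCE, DESTINATION)


@dataclass
class Concept:
    concept_id: int
    text: str


@dataclass
class ActionTuple:
    """A complex concept <Subject, Action, Target>; either role may be None."""
    action_id: int
    text: str
    subject: Optional[Union[Concept, "ActionTuple"]] = None
    target: Optional[Union[Concept, "ActionTuple"]] = None


Node = Union[Concept, ActionTuple]


def core_id(node: Node) -> int:
    # A complex concept is referenced through the ID of its core Action.
    return node.action_id if isinstance(node, ActionTuple) else node.concept_id


def flatten(node: ActionTuple) -> List[Entry]:
    """Emit subject/target entries for `node`, then recurse into nested tuples."""
    entries: List[Entry] = []
    for label, role in (("subject", node.subject), ("target", node.target)):
        if role is None:
            continue
        entries.append((label, node.action_id, core_id(role)))
        if isinstance(role, ActionTuple):
            entries.extend(flatten(role))
    return entries


# "Un ataque de asma se produce cuando los síntomas empeoran" (hypothetical IDs)
sintomas = Concept(1, "síntomas")
empeoran = ActionTuple(2, "empeoran", subject=sintomas)
produce = ActionTuple(3, "produce", subject=empeoran)
print(flatten(produce))
# [('subject', 3, 2), ('subject', 2, 1)]
```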

The gold output for this Subtask is another file, output_C_<topic>.txt, in a format similar to the previous ones: one entry per line, each entry containing a LABEL, a SOURCE and a DESTINATION.

The LABEL is one of the following linking axioms: is-a, part-of, property-of, same-as, subject, target. The SOURCE and DESTINATION are the ID values of the key phrases that participate in each relationship, defined as follows:

  • If the LABEL is one of is-a, part-of, property-of or same-as, then SOURCE and DESTINATION (two numbers separated by a single whitespace or tab) are the IDs of the source and destination Concepts of the relationship, respectively. For example, in the relationship “asma” is-a “enfermedad”, the source ID is 5, which refers to the key phrase “asma”, and the destination ID is 7, which refers to the key phrase “enfermedad” in the Subtask A example output.
  • If the LABEL is subject or target, then SOURCE is the ID of the corresponding Action, while DESTINATION is the ID of the Concept or Action that plays the given role. For example, in the triplet [“asma”, “afecta”, “vías respiratorias”], the Action ID is 2, which points to the text span “afecta”; the subject ID is 1, which points to “asma”; and the target ID is 3, which points to “vías respiratorias” in the Subtask A example output. Hence this Action yields two entries: subject 2 1 and target 2 3.
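Reading such a file back amounts to splitting each line on whitespace. A minimal Python sketch, assuming all six relation types defined above may appear as labels and that IDs are integers; `parse_relations` is a hypothetical helper, not the official evaluation script.

```python
from typing import Set, Tuple

Entry = Tuple[str, int, int]  # (LABEL, SOURCE, DESTINATION)
LABELS = {"is-a", "part-of", "property-of", "same-as", "subject", "target"}


def parse_relations(text: str) -> Set[Entry]:
    """Parse 'LABEL SOURCE DESTINATION' lines separated by spaces or tabs."""
    entries: Set[Entry] = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # splits on any run of spaces and/or tabs
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        label, source, destination = fields
        if label not in LABELS:
            raise ValueError(f"line {lineno}: unknown label {label!r}")
        entries.add((label, int(source), int(destination)))
    return entries


example = "is-a 5 7\nsubject 2 1\ntarget\t2 3\n"
print(sorted(parse_relations(example)))
# [('is-a', 5, 7), ('subject', 2, 1), ('target', 2, 3)]
```

Returning a set makes the comparison against a gold file a plain set operation, but it also collapses duplicate lines; whether the official scorer does the same is not stated here.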

Development evaluation of Subtask C

An example of an incorrectly linked document could be as follows (dev file: dev/output_C_example.txt):

Running the evaluation script produces the following output (showing only the relevant part for this Subtask):

The script reports the correct, missing and spurious items, defined as follows:

  • Correct: relationships that exactly match an entry in the gold file, including the LABEL and the IDs of each participant.
  • Missing: relationships that are in the gold file but not in the dev file, either because the LABEL is wrong or because one of the IDs does not match.
  • Spurious: relationships that are in the dev file but not in the gold file, either because the LABEL is wrong or because one of the IDs does not match.
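Under these definitions, the three counts are ordinary set operations on exact (LABEL, SOURCE, DESTINATION) triples; an entry with a single wrong ID is therefore counted once as missing and once as spurious. A minimal Python sketch; `compare` is a hypothetical name and the mistaken prediction below is invented for illustration.

```python
from typing import Dict, Set, Tuple

Entry = Tuple[str, int, int]  # (LABEL, SOURCE, DESTINATION)


def compare(gold: Set[Entry], predicted: Set[Entry]) -> Dict[str, Set[Entry]]:
    """Exact matching: an entry is correct only if LABEL and both IDs agree."""
    return {
        "correct": gold & predicted,
        "missing": gold - predicted,    # in gold, not in the dev file
        "spurious": predicted - gold,   # in the dev file, not in gold
    }


gold = {("subject", 2, 1), ("target", 2, 3)}
predicted = {("subject", 2, 1), ("target", 2, 4)}  # wrong target ID
result = compare(gold, predicted)
print({name: len(items) for name, items in result.items()})
# {'correct': 1, 'missing': 1, 'spurious': 1}
```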

The script also reports the standard precision, recall and F1 metrics, calculated as follows:

$$ \text{precision} = \frac{\text{correct}}{\text{correct} + \text{spurious}} $$
$$ \text{recall} = \frac{\text{correct}}{\text{correct} + \text{missing}} $$
$$ F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} $$
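The same formulas in Python, as a minimal sketch. How the official script handles an empty denominator is not stated here; returning 0.0 in that case is an assumption, and `scores` is a hypothetical name.

```python
from typing import Tuple


def scores(correct: int, missing: int, spurious: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); an empty denominator yields 0.0 (assumption)."""
    predicted = correct + spurious   # everything in the dev file
    expected = correct + missing     # everything in the gold file
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# One correct, one missing and one spurious entry
print(scores(1, 1, 1))
# (0.5, 0.5, 0.5)
```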
NOTE: These metrics are only reported for convenience here, to be used by the participants when developing their solutions. The actual score used for ranking participants will be presented later.