What is the relationship between words and emotions? How can we extract emotionally charged information from a corpus? What parameters can we use to determine the relationship between emotions and what is said? Can we establish some kind of structure of emotions, would that structure be language-specific, and could it be used in Natural Language Processing (NLP) systems? To answer these and other questions relevant to Artificial Intelligence (AI), we have undertaken the task of constructing a taxonomy of emotion-related terms for Spanish. Similar proposals exist for English. One method in vogue is to take a structure that already exists for English and translate it directly into Spanish. However, English emotion terms do not correspond one to one with Spanish ones, because one language is not a mirror image of the other: the way in which each language constructs its categories to refer to reality can be very different and varies according to several factors, such as context, time and place. For example, the term gladness can be translated into Spanish as regocijo, agradecimiento or satisfacción. In the other direction, from Spanish to English, the term rendered here as fascination has several correspondences, such as thrill or enthrallment. These observations come from consultations with bilingual speakers, and the information extracted from Large Language Models (LLMs) points in the same direction. In our research we used ChatGPT 4.0, Gemini and Claude. For the term glumness, ChatGPT indicated the translation melancholy; Gemini produced sadness; and Claude proposed despondency. None of these translations is erroneous, but each depends on the context and on the semantic structure in play, which in our case is that of Spanish. For this reason we concluded that we should construct a taxonomy of emotion-related terms that responds to the characteristics of the Spanish language for this time and place.
“Me siento Pura Vida”
In the case of Costa Rica, for instance, a person can say “me siento pura vida”, a phrase that expresses emotion but is not immediately understandable throughout the Spanish-speaking world. Although a human could detect the emotion in that phrase, an AI application would not necessarily be able to do so, and in some cases certain expressions could carry different or even opposite connotations.
A taxonomy such as the one we propose is very useful in AI, since it allows phrases to be associated with emotions. Imagine a system in which it would be possible to search by emotions, rather than by specific terms. We could also think of applications that identify the degree of emotion in a speech or a newspaper article; moreover, it should be possible to establish the balance of emotions present in a document, something that would be extremely valuable for opinion analysis, a very active area of research with practical applications in marketing, politics, sports and entertainment.
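As a rough illustration of what "the balance of emotions present in a document" could mean in practice, the following minimal Python sketch counts lexicon terms in a text and reports the share of each emotion. The lexicon `TOY_LEXICON`, its entries and the function `emotion_balance` are hypothetical and invented for this example; they are not the taxonomy described in this article.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping a term to a first-level emotion.
# These entries are illustrative assumptions, not the article's taxonomy.
TOY_LEXICON = {
    "feliz": "joy",
    "contento": "joy",
    "triste": "sadness",
    "miedo": "fear",
    "enojado": "anger",
    "cariño": "love",
}


def emotion_balance(text, lexicon):
    """Return the share of each emotion among the lexicon terms found in text."""
    # \w matches accented letters and ñ in Python 3, so Spanish words stay whole.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(lexicon[token] for token in tokens if token in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}  # no emotion-related term was found
    return {emotion: n / total for emotion, n in counts.items()}


balance = emotion_balance("Estoy feliz y contento, pero a veces triste.", TOY_LEXICON)
print(balance)
```

This sketch matches single words only. Multi-word expressions such as “me siento pura vida” would need phrase-level matching, which is one reason a taxonomy tied to phrases is useful.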
Emotional charge of a text
To build a model that can evaluate the emotional charge of a text, it is necessary to identify the words associated with the so-called primary emotions. The first-level emotions in our taxonomy are love, joy, anger, sadness and fear. Interestingly, in English a sixth one, surprise, is included; in our research, however, it was not necessary to include it. Our taxonomy was constructed from two sources. The first is the work of Shaver (1987) and Parrott (2010), who propose a three-level hierarchical structure. The second is documentary and consists of terms associated with emotions extracted from a representative corpus of Costa Rican Spanish. We quickly realized, as already mentioned, that the semantic structure of Spanish emotion terms differs from that of English, so we had to propose a specific hierarchy. In the taxonomy we propose, as in English, second-level emotions derive from first-level emotions, and third-level emotions from second-level ones. However, the Spanish terms are grouped slightly differently, so the resulting lexical structure is specific to Spanish. In addition, we postulate a fourth level that results from the interaction, with different degrees of intensity, between the emotions of the other levels. For example, the term compassion is associated with both love and sadness, two first-level emotions. Our classification is therefore less a taxonomy than a network of interrelated words. This structure is crucial for AI applications, as it helps to reduce ambiguity in emotion interpretation and improves accuracy in identifying emotional nuances. Another important contribution is that the taxonomy is tied to phrases and lexical usage frequencies, so we have the number of occurrences of each term and the contexts in which it occurs.
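One way to picture such a network is as a mapping in which every term records its level and the terms it derives from. The Python sketch below is only an assumption about how this could be represented. The five first-level emotions and the compassion example come from the text, whereas `TOY_NETWORK`, the placeholder entries "affection" and "tenderness", and the function `primary_emotions` are hypothetical.

```python
# Hypothetical fragment of an emotion network. Only the five first-level
# emotions and the "compassion" example come from the article; "affection"
# and "tenderness" are placeholder labels added for illustration.
TOY_NETWORK = {
    "love": {"level": 1, "parents": []},
    "joy": {"level": 1, "parents": []},
    "anger": {"level": 1, "parents": []},
    "sadness": {"level": 1, "parents": []},
    "fear": {"level": 1, "parents": []},
    "affection": {"level": 2, "parents": ["love"]},
    "tenderness": {"level": 3, "parents": ["affection"]},
    # A fourth-level term may have more than one parent, which is what
    # turns the tree into a network.
    "compassion": {"level": 4, "parents": ["love", "sadness"]},
}


def primary_emotions(term, network):
    """Return the set of first-level emotions reachable from term via parent links."""
    node = network[term]
    if node["level"] == 1:
        return {term}
    found = set()
    for parent in node["parents"]:
        found |= primary_emotions(parent, network)
    return found


print(primary_emotions("compassion", TOY_NETWORK))
print(primary_emotions("tenderness", TOY_NETWORK))
```

Allowing several parents per term is the design choice that matters here: a strict tree could not express that compassion belongs to love and sadness at the same time.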
This is data-driven research that provides scientific evidence of how words associated with emotion are used, and from which we infer a structure. One of the results is a set of statistics that allow us to determine prototypicality, which we define in our research as the representativeness of a term with respect to the emotion it evokes. To construct the taxonomy we used a dataset of text written in Spanish (http://www.earthlings.io/dowonload_cglu.html) consisting of 284 megabytes of text, mostly written in Costa Rican Spanish (the geographical origin of the documents cannot be fully guaranteed).
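The text does not specify how prototypicality is computed. Purely as an assumption, one simple proxy is the relative frequency of each term among all the terms assigned to the same emotion. The Python sketch below illustrates that proxy. The function `prototypicality` is hypothetical and the counts in `toy_counts` are invented placeholders, not figures from the corpus.

```python
def prototypicality(term_counts):
    """Relative frequency of each term within one emotion (an assumed proxy)."""
    total = sum(term_counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in term_counts.items()}


# Invented placeholder counts for terms assumed to belong to "joy".
# They are NOT results from the corpus described in the article.
toy_counts = {"alegría": 60, "felicidad": 30, "regocijo": 10}

scores = prototypicality(toy_counts)
most_prototypical = max(scores, key=scores.get)
print(scores, most_prototypical)
```

Under this assumed proxy, the term with the highest share would be treated as the most representative of its emotion. Other definitions, for example ones that weight the contexts of occurrence, are equally plausible.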
For AI, this linguistic adaptation helps to improve emotion recognition in Spanish text processing applications, from chatbots to sentiment analysis systems. To achieve this, we employ word embeddings, a technique that converts words into numerical vectors so that the proximity of terms and their evolution over time can be analyzed. This is especially useful for reflecting how the emotional associations of certain terms change depending on the context. In terms of applications, a robust and reliable model such as the one we propose, which adheres to the specificities of the Spanish language, has an impact on the quality of the software tools that will eventually be implemented.
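Proximity between embedded terms is commonly measured with cosine similarity. The Python sketch below shows the idea on three hand-made vectors. The vectors in `TOY_VECTORS` are invented for illustration and have only three dimensions, whereas real embeddings are learned from a corpus and have many more. The helper names `cosine_similarity` and `nearest` are likewise hypothetical, and nothing here is meant to describe the model actually used in this research.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented three-dimensional vectors, used only to illustrate the computation.
TOY_VECTORS = {
    "alegría": [0.9, 0.1, 0.0],
    "felicidad": [0.8, 0.2, 0.1],
    "miedo": [0.0, 0.2, 0.9],
}


def nearest(term, vectors):
    """Return the other term whose vector is most similar to the given term's."""
    others = (other for other in vectors if other != term)
    return max(others, key=lambda other: cosine_similarity(vectors[term], vectors[other]))


print(nearest("alegría", TOY_VECTORS))
```

To follow how associations change over time, one option would be to train separate embeddings on time-sliced portions of a corpus and compare the neighbours of the same term across slices.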
In addition to improving the analysis and processing of Spanish text, a taxonomy such as ours provides more accurate data on the emotional content expressed in different types of documents (short texts, newspaper articles, opinions on social networks or advertising). For example, a company that has launched a new product may have to decide whether or not to withdraw it from the market; to that end, information on the emotional content of customers' opinions would be valuable. Hence, our research provides a valuable resource for the development of scientific, governmental and commercial applications related to the improvement of services and the estimation of people’s degree of satisfaction.