The analysis of texts becomes ever more fine-tuned thanks to the constant evolution of computational linguistics.
Documents and texts first go through a preliminary phase of tokenization and stemming, which is available in 11 languages.
The sections of the text that are relevant to the overall meaning of a specific document are set apart from those that are not.
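The preliminary phase described above can be illustrated with a minimal sketch. It is not THRON's implementation: the stop-word list, the suffix list, and the function names `tokenize`, `stem` and `preprocess` are all assumptions made for illustration, and the toy suffix stripper stands in for a real per-language stemmer. Dropping stop words is used here only as a simple stand-in for setting aside the parts of the text that carry little meaning.

```python
import re

# Hypothetical, tiny stop-word list; a real system would use one list per language.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}

# Hypothetical suffixes for a toy English stemmer (not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The cats are jumping in the garden"))
```

With these assumed lists, the sentence is reduced to the stems `cat`, `jump` and `garden`, which is the kind of normalized input a classifier can work with.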
The relevant textual elements are then passed to a set of classifiers. At this stage, the system also extracts notions that cannot be derived directly from the words used to express them, which resolves ambiguities as well. These notions are not tied to a specific language: the same concepts are extracted from a document regardless of the language it has been translated into.
THRON therefore:
- recognizes the context and the concept expressed in a text
- resolves ambiguities
- aggregates concepts expressed in different languages under the same notions
- learns and assimilates new concepts that, from now on, it will be able to recognize.
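The idea of language-independent concepts and context-based disambiguation can be sketched as follows. This is an illustrative assumption, not a description of how THRON works internally: the `LEXICON` and `RELATED` tables, the concept identifiers, and the function `extract_concepts` are all hypothetical, and a simple count of related concepts stands in for whatever the real classifiers do.

```python
# Hypothetical lexicon: surface word (in any language) -> candidate concept IDs.
LEXICON = {
    "bank": ["FINANCIAL_INSTITUTION", "RIVER_SIDE"],  # ambiguous in English
    "banca": ["FINANCIAL_INSTITUTION"],               # Italian
    "loan": ["LOAN"],
    "prestito": ["LOAN"],                             # Italian
    "river": ["RIVER"],
}

# Hypothetical context cues: concept -> concepts that tend to appear alongside it.
RELATED = {
    "FINANCIAL_INSTITUTION": {"LOAN"},
    "RIVER_SIDE": {"RIVER"},
}


def extract_concepts(tokens):
    """Map tokens to concept IDs, using unambiguous tokens as context."""
    # Concepts from words that have exactly one candidate form the context.
    context = {
        LEXICON[t][0] for t in tokens if t in LEXICON and len(LEXICON[t]) == 1
    }
    concepts = set(context)
    for t in tokens:
        candidates = LEXICON.get(t, [])
        if len(candidates) > 1:
            # Pick the candidate with the most related concepts in the context;
            # on a tie, the first candidate listed in the lexicon wins.
            best = max(candidates, key=lambda c: len(RELATED.get(c, set()) & context))
            concepts.add(best)
    return concepts


english = extract_concepts(["bank", "loan"])
italian = extract_concepts(["banca", "prestito"])
print(english == italian)
```

In this sketch the English and Italian token lists yield the same set of concepts, while `["bank", "river"]` resolves "bank" to the river-side sense instead.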
Using this information, the semantic engine consolidates different notions that are related to the same conceptual sphere, identifying a topic. Subsequently, it automatically assigns relevant tags to each document, while taking into account your company’s dictionary tags.
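As a rough illustration of this last step, the sketch below groups concepts into a topic and then picks a tag, preferring an entry from a company dictionary when one exists. The mapping tables, the tag names, and the functions `identify_topic` and `assign_tags` are hypothetical and chosen only for the example; the actual engine may weigh concepts and dictionary entries quite differently.

```python
from collections import Counter

# Hypothetical mapping from concept IDs to the conceptual sphere they belong to.
CONCEPT_TOPICS = {
    "LOAN": "finance",
    "FINANCIAL_INSTITUTION": "finance",
    "INTEREST_RATE": "finance",
    "RIVER": "nature",
}

# Hypothetical company dictionary: topic -> the tag the company prefers to use.
COMPANY_TAGS = {"finance": "Banking & Credit"}


def identify_topic(concepts):
    """Return the topic shared by most of the given concepts, or None."""
    counts = Counter(CONCEPT_TOPICS[c] for c in concepts if c in CONCEPT_TOPICS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def assign_tags(concepts):
    """Tag a document with its topic, using the company's tag when defined."""
    topic = identify_topic(concepts)
    if topic is None:
        return []
    return [COMPANY_TAGS.get(topic, topic)]


print(assign_tags(["LOAN", "FINANCIAL_INSTITUTION", "RIVER"]))
```

Here two of the three concepts belong to the assumed "finance" sphere, so the document receives the company's own tag for that topic rather than the generic topic name.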