The SentiDeepFusion Project

Deep learning approach to automation of multilingual information extraction from colloquial texts for conversational interfaces and customer experience analytics

Project description

Multinational companies run customer service departments in multiple languages in parallel. In Germany, the U.K., and the U.S. alone, $100 billion is spent annually on the salaries of employees in these departments. This creates a lot of room for potential savings by developing tools to analyze and automate conversations held in remote customer service channels, comprehensively and in multiple languages simultaneously.

The SentiDeepFusion project aims to create such tools through research and development of natural language understanding technology. During the project, we created multilingual datasets for machine learning tasks, semantic models based on deep learning, and heuristic methods. We built both classifiers and unsupervised learning models. The resulting solutions support multiple languages simultaneously and have been applied to SentiOne Listen and SentiOne Automate products.

The results of the project are a worldwide innovation. Exemplary applications include the automatic evaluation of customer service employees in England, Germany, and Spain, the detection of criteria for successful conversations, and the collection of data from customer interviews in these countries into a single database. Additionally, the project includes the automatic detection of typical conversation scenarios and the automation of selected parts of them, aimed at relieving agents from answering the most common repetitive questions. The recipients of the project are multinational corporations supplying products and services to the consumer market.


Project POIR.01.01.01-00-0923/20, “SentiDeepFusion: deep learning approach to automation of multilingual information extraction from colloquial texts for conversational interfaces and customer experience analytics”


Start: 1st of January 2021
End: 31st of December 2023
Total project budget: 24 357 995,15 zł (~5,3M €)
Subsidy: 18 836 158,78 zł (~4,1M €)
Smart Growth Operational Programme 2014-2020
Measure: 1.1 R&D projects of enterprises.
Sub-measure. 1.1.1. Industrial research and development work implemented by enterprises

Project promotion

As part of promotional activities related to the grant, SentiOne regularly publishes information about new research achievements and collaborations in the press and at scientific conferences, as well as at promotional talks and events.

Since 2021, we have been participating in international and national conferences where we discuss our research work and new opportunities for SentiOne, thanks to our cooperation with the National Centre for Research and Development, Wroclaw University of Technology, Gdansk University of Technology, or the University of Gdansk.

Information and promotional activities included regular updates on participation in scientific and technical conferences, as well as scientific publications and press reports on a dedicated SentiOne sub-site describing the SentiDeepFusion project.

An informational poster, prepared in accordance with promotion rules, is still displayed at the reception desk of the office, visible to both employees and visitors.

Documents related to the project, including project documentation (e.g., milestone reports), have been marked with European Funds logos and the European Union flag. Scientific publications have been provided with an appropriate formula (in English or Polish), indicating that the research funds came from the specified European Funds.

SentiOne and the project’s partners communicate the progress of project work and the results of project implementation through scientific publications, press releases, and events and lectures.

Conferences and appearances

Language, Data and Knowledge (LDK), 2021

From September 1st to 4th, 2021, the Language, Data, and Knowledge (LDK 2021) conference took place in Saragossa, Spain, featuring SentiOne’s speaker Michal Lew. Michal presented work created with Aleksander Obuchowski and Monika Kutyła, titled “Improving Intent Detection Accuracy Through Token Level Labeling”. Their proposed method aims to enhance the quality of an intention recognition classifier by utilizing fine-grained token-level labeling. This method of annotating the data has been demonstrated to improve the efficiency of the model and also offers better explanatory power for the model. The method for dataset annotation was employed in the SentiDeepFusion project during the development of the conversational corpus.

20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021

On December 13-15, 2021, the 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021) took place. During the event, Michal Lew presented the work of the SentiOne team (Michal Lew, Aleksander Obuchowski, Emilia Kacprzak, Agnieszka Pluwak) titled “Conversation Clustering Adaptation for Intent Recognition”. The presentation addressed the possibility of using unlabeled data to support training intent recognition classifiers. The proposed approach, Conversation Clustering Adaptation (CCA), reduces the need to produce a large (and thus costly) dataset for training a classifier while improving the accuracy of intent recognition by up to 12.4 percentage points. The method outperformed current state-of-the-art algorithms that use additional training data.

International Conference on Computational Science (ICCS), 2023

At a conference in Prague, Czech Republic, from August 3 to 5, 2023, the project’s partner – Wroclaw University of Technology – presented two talks on various aspects of natural language processing in emotion detection. 

In the first talk, the team consisting of: Bartlomiej Koptyra, Anh Ngo, Lukasz Radlinski, and Jan Kocon presented their talk titled “CLARIN-Emo: Training Emotion Recognition Models Using Human Annotation and ChatGPT”. In their work, they used ChatGPT both to produce new training data for emotion classification and to annotate the resulting dataset. The authors analyzed the possibilities and limitations of producing such training data for natural language processing models

In the second lecture, the research team comprising Jan Kocoń, Joanna Baran, Kamil Kanclerz, and Michał Kajstura gave a presentation titled “Differential Dataset Cartography: Explainable Artificial Intelligence in Comparative Personalized Sentiment Analysis”. Their method allows for the personalization of the sentiment analysis model for a specific use case. In the same presentation, the authors demonstrated Differential Data Maps, a method that enables the visualization of data in the context of the model, thus improving its explainability.

Infoshare 2023

On May 25, 2023, Bartosz Bazinski and Wojciech Luszczynski spoke at the Infoshare conference held in Gdansk. During their speech titled “Experiment: can generative and conversational AI be combined?” they talked about how to approach product change in the face of the emergence of new solutions on the market, and how to quickly schedule beta tests and attract the first interested customers. They also referred to potential applications of generative artificial intelligence in technology companies, especially the likes of SentiOne, which are developing their own artificial intelligence solutions.

During the Infoshare 2023 conference, it was also possible to visit SentiOne’s stand, where – among other things – the SentiDeepFusion grant was being promoted.