The SentiCognitiveServices Project

Product and brand image analysis automation

Project SentiCognitiveServices - EU RESEARCH PROJECT

Project POIR.01.01.01-00-0806/16, “SentiCognitiveServices – the next generation of marketing automation and social media listening services based on artificial intelligence”

Project description

The social media analytics software market is projected to grow from $1.6 billion in 2015 to $5.4 billion in 2020. Currently available social listening solutions offer users only simple analytics built on techniques such as word counting and rudimentary sentiment detection based on dictionaries and AI algorithms. As a result, answering the common question, “what does the internet think of our product?”, requires intensive manual labour by analysts for each individual case. The situation is made harder by the fact that each analysis has to take millions of statements into account.

The project is dedicated to inventing technologies necessary for the full automation of preparing such analyses. To that end:

  1. Natural / colloquial language processing research will be conducted.
  2. A data extraction layer will be developed to retrieve information from online statements; this will utilise methodologies based on machine learning and a heuristic approach.
  3. A deduction layer based upon various data extraction models will be developed; these models will include the aspect model (aspects, objects and their features, product comparisons), PR-crisis content detection, text summaries, named entity recognition and detection of emotions and user interests.

Once these steps are complete, the following application modules will be developed:

  1. Automatic report creation
  2. Crisis warning
  3. Suggesting answers to client inquiries

The results of all activities performed as part of the project will be used in new products, as well as in the existing SentiOne solution. As a result, the solution’s developer will be ready to offer a product that meets client requirements confirmed by letters of intent.

Qualified expenses (project value): 11,699,568.70 PLN
Funding amount: 8,682,464.57 PLN

  • Funding request number: POIR.01.01.01-00-0806/16
  • Implementation period: 07/2017 – 06/2020
  • Activity: 1.1 Company R&D projects
  • Sub-activity: 1.1.1 Industrial research conducted by companies

Project promotion

As part of its promotional activities, SentiOne regularly publishes information about new research developments and collaborations in the press and at scientific conferences, as well as during promotional talks and events.

In the years 2017 – 2019, we participated in several international conferences, during which we talked about our research developments and new opportunities made available to SentiOne through our collaborations with NCBiR, the Wrocław University of Science and Technology, and the AGH University of Science and Technology.

Conferences and appearances

OVHCloud Summit, Paris, 2019

On October 10th, 2019, the OVHCloud Summit 2019 conference took place. During the event, Katarzyna Bultrowicz, a Research Engineer at SentiOne, presented a case study describing an upgrade of a production database cluster that was required in order to use the Elasticsearch Percolator mechanism. This tool filters a data stream in real time, which could make it possible to apply different semantic models to mentions depending on their content. She also shared observations and tips on migrating similarly large datasets, based on her own and the team’s experience.
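
The idea behind percolation, as described above, is the reverse of ordinary search: the queries are stored first, and each incoming mention is then checked against all of them. The sketch below illustrates this in plain Python under simplifying assumptions. It does not use the Elasticsearch API; the `ToyPercolator` class, the query names and the sample mentions are all hypothetical, and a stored "query" is reduced to a set of terms that must all appear in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyPercolator:
    """Hypothetical stand-in: stores queries up front, then matches documents."""

    # Maps a query name to the set of terms that must all appear in a document.
    queries: dict[str, frozenset[str]] = field(default_factory=dict)

    def register(self, name: str, required_terms: list[str]) -> None:
        """Store a query before any documents arrive."""
        self.queries[name] = frozenset(term.lower() for term in required_terms)

    def percolate(self, text: str) -> list[str]:
        """Return the sorted names of all stored queries the document satisfies."""
        tokens = {token.strip(".,!?").lower() for token in text.split()}
        return sorted(
            name for name, terms in self.queries.items() if terms <= tokens
        )


percolator = ToyPercolator()
percolator.register("battery_complaints", ["battery", "drains"])
percolator.register("delivery", ["delivery"])

# Each mention in the stream is routed to the queries (and, in a real system,
# to whatever analysis model is attached to them) that it matches.
stream = [
    "The battery drains in two hours!",
    "Fast delivery, great service.",
    "Nothing to report.",
]
routed = [percolator.percolate(mention) for mention in stream]
```

In this toy version, a mention that matches no stored query simply yields an empty list, so it can be skipped without running any further analysis on it.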

VII Cosmetics Industry Forum, Warsaw, 2019

The VII Cosmetics Industry Forum took place in Warsaw on October 3rd, 2019. It was dedicated to the latest trends and the most important developments within the industry, and brought together suppliers and providers of cosmetic services. During the event, Jagoda Prętnicka, SentiOne’s head of PR and marketing insights, spoke about technological innovations driving customer choices, such as AI-based mention analysis models and chatbots that suggest products. She also talked about making effective use of the potential of new technologies and took part in a debate about the future of the cosmetics industry, during which the most important trends shaping the market were discussed.

RANLP, Varna, 2019

The Recent Advances in Natural Language Processing conference took place between August 31st and September 6th, 2019 in Varna, Bulgaria. During the event, Wiktor Walentynowicz (BEng), Maciej Piasecki (PhD, BEng) and Marcin Oleksy (PhD) from the Wrocław University of Science and Technology presented the results of their research into building a morphosyntactic tagger for parsing online statements in colloquial Polish. Their solution achieves a high accuracy of 90.14%. The RANLP conference is a recurring event featuring workshops and talks by leading natural language processing experts.

The tagger was trained on the SentiOne dataset, the first Polish dataset containing everyday, colloquial language. It was built from statements in the SentiOne database, morphosyntactically tagged as part of the SentiCognitiveServices project, and is available under an open license in the CLARIN repository.

SEP ScaleUp Summit London and SaaStr Europa Paris, 2019

Bartosz Baziński, COO of SentiOne, appeared at two conferences in June 2019. At the SEP ScaleUp Summit, held at the London Stock Exchange building, investors and entrepreneurs discussed current trends in technology and business. As part of the ongoing discussions at the event, Bartosz presented the SentiCognitiveServices project and its applications in various industries, such as banking and services.

At the SaaStr Europa 2019 conference in Paris, the SentiCognitiveServices project was presented during a debate discussing the commercial applications of R&D projects.

University of Gdańsk workshops, April 2019

On April 25th, 2019, at the invitation of Joanna Redzimska, PhD, Jakub Klimek, an engineer in the R&D department at SentiOne, conducted workshops at the University of Gdańsk on the theory of sentiment analysis for English Literature students specialising in Natural Language Processing.

The workshop covered topics such as text, sentence and phrase analysis and the concept of polarity. The students familiarised themselves with two text tagging tools, BRAT and Inforex, and attempted to analyse example statements. These exercises served as an introduction to a discussion about the importance of guidelines established by annotators, which helped the students achieve better accuracy in further attempts.

The workshops proved popular among participants, who expressed an interest in internships offered as part of the SentiCognitiveServices project, as well as in other topics related to natural language processing.

The SentiCognitiveServices project uses CLARIN network tools, especially the Inforex tagging application, and this use contributes to the tool’s development.

A longer publication by Michał Marcińczuk, the author of Inforex, is scheduled to appear in the near future; it will detail newly introduced features, such as the morphosyntactic tagger used by SentiCognitiveServices for tagging user-generated content.

beIT Gdańsk, 2019

On April 21st, 2019, Olga Springer, SentiOne’s Head of Product, conducted a workshop entitled “What should a product manager know about AI?” at the beIT Software Engineering Conference hosted at the Gdańsk University of Technology.

During her preceding talk, participants learned about the wide range of AI applications in fields such as processing online opinions, sentiment analysis and providing answers through chatbots. Olga also shared her experience and conclusions about the practical aspects of R&D work in startups.

The workshop exercise aimed to calculate the business value of implementing cutting-edge NLP technology for an example client. The workshop also stressed the importance of cooperation between companies and universities, and of conducting grant projects in small and medium-sized enterprises.

AI & Big Data Congress, Warsaw, 2019

As part of the AI & Big Data Congress held in Warsaw on March 12-13, 2019, the “Commerce of the Future” debate took place. During the debate, Bartosz Baziński, SentiOne’s COO, presented the practical applications of the results of the SentiCognitiveServices project. Describing the developments made, he focused mostly on the changes to the retail industry and its ongoing move towards digital operations, from historical data collection to AI deployment and machine learning. The debate also raised the topic of utilising a client’s full potential through omnichannel and AI. The participants included Krisztián Brenkus, CRM & Data Director, Auchan Retail; Mariusz Cholewa, PhD, chairman of the board, BIK; and Jarosław Góra, Co-Founder, Deep.Bi.

INFOSHARE Gdańsk, 2018

InfoShare is the largest tech conference in Central and Eastern Europe, with more than 6,000 participants in 2018. SentiOne was one of the partners of the conference. During the event, two SentiOne representatives gave two talks:

  1. Customer Service is the New Marketing
  2. AI, NLP, and Machine Learning – the lingua franca of the twenty-first century

Both talks were directly related to the SentiCognitiveServices project. In addition, on the event’s trade floor, the representatives talked about the cooperation conducted as part of the project.

AI4U Munich, 2018

As part of the project’s promotional activities, Michał Brzezicki of SentiOne gave a talk at the AI4U conference held in Munich in June 2018. The talk, entitled “Usage of Artificial Intelligence for Improving Customer Support and Brand Recognition”, described the effects and developments of the SentiCognitiveServices project.

AI & Big Data Congress, Warsaw, 2018

During the AI & Big Data Congress, held on April 18-19, 2018, Bartosz Baziński, SentiOne’s COO and co-founder, spoke about automating customer service on text channels using AI. Two days of engaging panels attracted over 620 participants, all of whom were exposed to the SentiCognitiveServices project.

AAAI Conference, New York City, 2020

In January 2020, Aleksander Obuchowski, a member of SentiOne’s research and development team, will participate in the AAAI conference in New York. He will speak about a publication describing an innovative neural network architecture devoted to user sentiment detection. This architecture combines the transformer architecture, currently used in the best language models, with recently introduced capsule neural networks. A model based on the described architecture achieves state-of-the-art results on three public datasets.

Publications

In November 2019, an article by Michał Brzezicki, SentiOne’s CTO, appeared on the money.pl website. This article concerned SentiOne’s cooperation with the National Research and Development Centre, as well as research into natural language processing and deep learning. Michał pointed out the shift in the company’s strategy and a pivot into operational development and work conducted on second-generation NLU bots.

Two publications related to SentiCognitiveServices appeared in September 2019. “Inforex — a Collaborative System for Text Corpora Annotation and Analysis Goes Open” concerned the changes introduced to Inforex, a text corpus management system. The second, entitled “Tagger for Polish Computer Mediated Communication Texts”, described the development process of a tagger dedicated to handling user-generated text. Both publications can be found in RANLP 2019 – Natural Language Processing in a Deep Learning World.

SentiCognitiveServices in local media

The livepomerania.com website published a commentary by Olga Springer, SentiOne’s Head of Product, explaining how the research and development work conducted as part of the SentiCognitiveServices grant ties into the SentiOne family of AI-powered tools.

SentiCognitiveServices in Grow with Tech

SentiOne appeared in the fourth edition of the Grow with Tech magazine, listed in an article detailing companies working with AI technology in the Tricity area. The article mentioned the advanced NLU engines created as part of the SentiCognitiveServices project, which utilise natural language processing and machine learning.

“New Technologies” in Rzeczpospolita, October 2017

The October 17th, 2017 edition of the national daily “Rzeczpospolita” featured an article describing SentiOne and the technologies used by the company. It also covered the brand’s cooperation with the National Research and Development Centre, the beginnings of the SentiCognitiveServices project, and the opportunities that automatic data processing can open up.

Results of the first stage of the project available in the CLARIN repository

The first stage of the SentiCognitiveServices project resulted in the creation of the largest corpus of colloquial Polish, morphosyntactically tagged in accordance with the National Corpus of the Polish Language guidelines.

This task was completed based on detailed guidelines established by a team of linguists at the Wrocław University of Science and Technology.

The corpus consists of 7,561 documents (around 400,000 segments) collected from various sources. It is available under a Creative Commons license in the CLARIN network database at https://clarin-pl.eu.

The statements are authentic and present characteristics common to statements made by internet users, such as specific spelling patterns or errors, all of which were normalised and segmented by a team of linguists.

This corpus was used to train the SentiOne morphosyntactic tagger for colloquial language. The tagger analyses online statements and identifies the parts of speech they contain. It is an important NLP tool, helpful in building data extraction systems. It is also the first module dedicated to processing colloquial Polish.

The tagger was also made available in the CLARIN.eu repository under an open license.

First UGC corpus for the Polish language


As part of the second stage of the SentiCognitiveServices project, the first corpus of user-generated data in Polish was assembled. It is described in a publication by Agnieszka Pluwak (PhD) from the SentiOne research and development team, together with specialists from the Wrocław University of Science and Technology: Arkadiusz Janz (MSc, BEng), Łukasz Kopociński, and Maciej Piasecki (PhD, BEng).

The publication details the process of assembling the dataset of user-generated statements, manually annotated by a team of linguists. This Colloquial Polish Language Corpus is one of the largest sets of its kind, not only for Polish but for any language: it contains 7,561 texts, or 402,840 tokens. Among other applications, the dataset can be used for morphosyntactic tagging or in tools for lemmatization of non-standard Polish (that is to say, Polish marred with typos, punctuation errors, idioms, et cetera).

This corpus was successfully used to develop the CMC Tagger, a morphosyntactic tagger dedicated to online statements in Polish.

Product-brand relation classifier


In another publication authored by the SentiOne R&D team in collaboration with the Wrocław University of Science and Technology, a product-brand relation classifier was presented. The publication, written by Marcin Oleksy (PhD), Agnieszka Pluwak (PhD), Wiktor Walentynowicz and Maciej Piasecki (PhD, BEng), describes a method for retrieving information about relations between a brand and given products, as well as a corpus annotation methodology.

The extraction model presented in the article is able to identify entities representing brand and product names and to decide whether or not they are bound by a brand-product relation (or, in simpler terms, whether a product was manufactured by a certain company).

A relation classifier is a tool with practical applications in business: it allows the commercial monitoring of social media and the internet at large by finding opinions not only about a brand, but also about its products.

The project is financed through the European Regional Development Fund

Logos: European Funds Smart Growth programme, Republic of Poland, European Union (European Regional Development Fund)