A Streamlit application that automates a User Support task for a P.O., with an NLP KeyBERT Text Analyzer

You are overloaded with support questions… Meanwhile, corporate hacking just shows you that people are reluctant to change, and the user feedback loop looks more like beating a dead horse than a true and constructive dialog… Then maybe this post is for you, even though it starts like clickbait! Indeed, this is my personal answer to automating as much of the support as possible for an audience of fixed-mindset users. Originally, this post is the result of bundling 2 concerns into one: how to provide efficient support (less is more), and how to leverage KeyBERT to build a Text Analyzer that automatically reads the mails coming from support.

Two preliminary remarks that clarify the application’s scope and purpose:

  1. The application does not send any email, because the added value of this POC lies in connecting the 2 projects, not in handling security, network, environment, or deployment issues, which are complexities far beyond this POC. Feel free to modify it and add this capability if necessary.
  2. Again, for security reasons, I cannot release the real templates that I have gathered for my own usage. Instead, I have populated generic templates; feel free to modify and adapt them to your own use.

The project leverages these libraries: Streamlit, SQLAlchemy, and KeyBERT.

1. Question_1: How to provide efficient support?

My first concern was: “How can I optimize the support that I currently provide as a PO?”. Doing support can be really exhausting! So, it is better to rely on a brainless, turnkey solution so you spend as little time as possible on this task.

I know that support is a key concern, but it can turn into harassment that will surreptitiously change you into a customer-support hater! I’m kidding, but only halfway :). You must admit that sometimes doing support amounts to providing a spare brain for people who are too lazy to tackle their own difficulties… and you must stay benevolent anyway!

That is the application’s purpose: bring me an asset for continuous learning while delegating the worst part to the machine so I can act brainlessly. Answering customers should be dead simple, done in the blink of an eye. Doing support like a robot, but with all the smoothness and politeness possible.

2. Question_2: How to perform Text Analysis with KeyBERT?

My second concern was more of a data-science-oriented issue: on users’ support feedback, how can I leverage NLP to perform text analysis in order to sort the users’ content by topic, and therefore by keywords? For that purpose, I have selected KeyBERT to explore the texts.

KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.

Source: https://maartengr.github.io/KeyBERT/index.html

3. Answer_1 to Question_1: The project “AUTOMATE P.O. JOB SUPPORT’S DEMO”

This project enables you to simply “CRUD” a set of mail templates that can be used to answer any question regarding the application I am working on. I am currently a PO for a CMS, but it can be applied to any application as long as you know your application’s scope.

You can find the working files in “AUTOMATE P.O. JOB SUPPORT’S DEMO” (004_automate_po_job_streamlit_sqlalchemy_example_database).

For that purpose, I am mostly using these 2 libraries: Streamlit and SQLAlchemy.

3.1 Objective

The objective was to create a quick-and-dirty user support mailing template selector, using a database for knowledge management (KM).

When you “meta-describe” a mail as an object, you get the SQLite database description for the fields and the table! I added some tags to the table to enable searching the database’s content. So, below are the notes that I used to design my database and the SQLite commands to create it.

--dbName:  km_user_support
--tableName: user_support_mail_templates
 
--fieldsTableName:
--### id
--### filename
--### recipients
--### mail_object
--### mail_body
--### mail_search_tags
 
--SQLite command_1
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE user_support_mail_templates (
id_filename INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
filename VARCHAR,
recipients VARCHAR,
mail_object TEXT UNIQUE,
mail_body TEXT,
mail_search_tags VARCHAR
);
COMMIT;
 
--SQLite command_2
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE "user_support_mail_templates" (
"id_filename" INTEGER NOT NULL,
"filename" VARCHAR,
"recipients" VARCHAR,
"mail_object" TEXT UNIQUE,
"mail_body" TEXT,
"mail_search_tags" VARCHAR,
PRIMARY KEY("id_filename" AUTOINCREMENT)
);
COMMIT;
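For illustration, here is a minimal sketch of how the same table could be declared with SQLAlchemy (which the project uses alongside Streamlit). It is not the project’s exact code: the class name, database file name, and the SQLAlchemy 1.4+ import style are my assumptions.

# Minimal SQLAlchemy sketch of the table above (assumed names, not the project's exact code)
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class UserSupportMailTemplate(Base):
    __tablename__ = "user_support_mail_templates"

    id_filename = Column(Integer, primary_key=True, autoincrement=True)
    filename = Column(String)
    recipients = Column(String)
    mail_object = Column(Text, unique=True)
    mail_body = Column(Text)
    mail_search_tags = Column(String)

# Creates km_user_support.db with the table above if it does not exist yet
engine = create_engine("sqlite:///km_user_support.db")
Base.metadata.create_all(engine)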

3.2 Note on mailing template building rules

Here are the notes that I wrote before coding the application. They mostly cover the naming convention, then define the content that will be stored in the DB.

3.2.1 NOMENCLATURE
Decomposing a typical mail, that is, describing the email as an object through an abstract class, gave me the fields for the database.

  • TITLE (filename)
  • DEST (recipients)
  • OBJECT (mail_object)
  • BODY (mail_body)
  • TAGS (mail_search_tags)

3.2.2 PATTERN
The idea is to create a pattern for the email template’s object in order to normalize it (a minimal helper is sketched after the examples below).

# to create the mail object template
[nb_increment]_[user_area]_[to_recipient]_[content_mail_type]
 
# EXAMPLES
### 001_support_filename
### 002_support_filename
### 003_support_filename
### 004_support_filename
### 005_support_filename
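As a small illustration of this naming pattern, here is a hypothetical helper; the function name and argument values are mine, not part of the project.

# Hypothetical helper illustrating the naming pattern above
def build_mail_object(nb_increment: int, user_area: str, to_recipient: str, content_mail_type: str) -> str:
    return f"{nb_increment:03d}_{user_area}_{to_recipient}_{content_mail_type}"

# Example: build_mail_object(1, "support", "enduser", "howto") -> "001_support_enduser_howto"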

3.2.3 CONTENT
I have put some fake text inside the email template’s body. Check config_values/values_conf.py to get a glimpse of the values that I am using.

3.2.4 OTHER DIRECTORIES
The other directories in the project are just traces of my exploratory work:

  1. 002_pythonspot
  2. 003_automate_po_job

4. Answer_2 to Question_2: The project “KeyBERT Rough Text Analyzer”

I am not even a real data scientist nor a real developer, but I know how to capture the essence of other projects and tinker with them to fit my own purposes.
So, I have designed the application with KeyBERT based on these 2 projects. I have sampled some existing application structures, in particular 2 remarkable applications: one from charlywargnier and the other from ahmedbesbes.

The “skeleton” for an application with KeyBERT should expose those parameters and their Streamlit equivalents (a minimal sketch follows below).

  1. st.radio :: choose your model, e.g. DistilBERT (Default) or Flair
  2. st.slider :: choose number of keywords/keyphrases
  3. st.number_input :: Choose Minimum Ngram
  4. st.number_input :: Choose Maximum Ngram
  5. st.checkbox :: Select Remove stop words
  6. st.checkbox :: Select Use MMR
  7. st.slider :: Keyword diversity (MMR only)
  8. st.text_area :: Paste your text below (max 500 words)
  9. st.button :: submit the text and run the keyword extraction

I also intend to use these 2 objects, which often appear in my Streamlit applications; both are included in the sketch below.

st.expander
st.help(obj)
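Below is a minimal sketch of such a skeleton, assuming widget labels and default values derived from the list above; they are my guesses, not the exact code of the referenced applications.

# Minimal Streamlit skeleton for a KeyBERT analyzer (labels and defaults are assumptions)
import streamlit as st

st.title("KeyBERT Rough Text Analyzer")

with st.expander("About this app"):
    st.write("Keyword extraction on user support mails with KeyBERT.")

model_choice = st.radio("Choose your model", ["DistilBERT (Default)", "Flair"])
top_n = st.slider("Number of keywords/keyphrases", min_value=1, max_value=30, value=10)
min_ngram = st.number_input("Minimum Ngram", min_value=1, max_value=4, value=1)
max_ngram = st.number_input("Maximum Ngram", min_value=1, max_value=4, value=2)
remove_stop_words = st.checkbox("Remove stop words", value=True)
use_mmr = st.checkbox("Use MMR")
diversity = st.slider("Keyword diversity (MMR only)", min_value=0.0, max_value=1.0, value=0.5)
text = st.text_area("Paste your text below (max 500 words)", height=200)

if st.button("Extract keywords"):
    st.write(f"Would run KeyBERT with top_n={top_n}, ngram range=({min_ngram}, {max_ngram})")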

5. Focus on KeyBERT: an open sesame to text meaning!

Let’s be simplistic! The basic idea is to leverage automatic keyword generation. This principle is used everywhere, for many purposes, whenever you deal with texts: from chat solutions to any text analysis solution. So, in my case, it will be used to analyze my heavy user-support burden.

BERT is one of the heavyweight champions of NLP! KeyBERT, as a library, carries many models, among them my favorite, spaCy. So KeyBERT is great for tinkering: there is a tremendous number of parameters, and it supports embedding models such as DistilBERT or Flair. The main drawback is knowing exactly what the parameters are about when you are not a specialist.

Personally, I like to get results every time I do something! So, I use these 2 models, which work for different languages (‘en’, ‘it’, ‘fr’, ‘es’, ‘ru’) and give me back good-enough results in terms of keywords.

Load pretrained SentenceTransformer: distilbert-base-nli-mean-tokens
Load pretrained SentenceTransformer: distiluse-base-multilingual-cased-v1
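As an illustration, here is a minimal sketch of how these two models can be loaded into KeyBERT; the variable names are mine, and I assume the keybert and sentence-transformers packages are installed.

# Minimal sketch: loading the two SentenceTransformer models into KeyBERT
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer

# English-only embeddings
kw_model_en = KeyBERT(model="distilbert-base-nli-mean-tokens")

# Multilingual embeddings ('en', 'it', 'fr', 'es', 'ru', ...)
multilingual_model = SentenceTransformer("distiluse-base-multilingual-cased-v1")
kw_model_multi = KeyBERT(model=multilingual_model)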

Top N results
You can choose the number of results to be displayed: between 1 and 30; the default is 10.

Min/Max Ngrams
You can choose the minimum and maximum values for the ngram range.
This sets the length of the resulting keywords/keyphrases.
To extract a set of single keywords only, set the ngram range to (1, 1).
To extract keyphrases, set the minimum ngram value to 2. The maximum ngram value can be set to 2 or higher, depending on the number of words you would like to see in each keyphrase.
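Here is a minimal sketch of how these settings map to the keyphrase_ngram_range and top_n parameters of extract_keywords; the sample document is my own invention.

# Minimal sketch: ngram range and top N with KeyBERT
from keybert import KeyBERT

kw_model = KeyBERT(model="distilbert-base-nli-mean-tokens")
doc = "I cannot publish my article in the CMS, the save button stays greyed out."

# Single keywords only: ngram range (1, 1)
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 1), top_n=10)

# Keyphrases of 2 to 3 words: ngram range (2, 3)
keyphrases = kw_model.extract_keywords(doc, keyphrase_ngram_range=(2, 3), top_n=10)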

In this post I am going to talk about N-grams, a concept found in Natural Language Processing (aka NLP). First of all, let’s see what the term ‘N-gram’ means. Turns out that is the simplest bit: an N-gram is simply a sequence of N words. For instance, let us take a look at the following examples.

San Francisco (is a 2-gram)
The Three Musketeers (is a 3-gram)
She stood up slowly (is a 4-gram)

Source: https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/

Check Stop Words
The most straightforward definition for stopwords is all the words you want to exclude from your NLP analysis.

I have extended the system to handle stop words not only in English but also in other languages: Italian, French, Spanish, and Russian.

More precisely, in any language, you have redundant words that do not bring much insight for text understanding. For instance, in English, there are many words like “I”, “the” and “you” that appear very frequently in the text but do not add any valuable information for NLP operations and modeling. These words are called stopwords, and it is almost always advised to remove them as part of text preprocessing.

You can also add or create your own stop-word list, for instance with offensive words, which is very useful when you investigate consumer feedback in chats or forums!

In my case, I have set up a directory called “stop words” with stop-word files found on the web for each language except English: stopwords_es.txt, stopwords_fr.txt, stopwords_it.txt, stopwords_ru.txt.

For English, as the model called is “distilbert-base-nli-mean-tokens”, the stop-word list is set directly as a parameter when the model is called.
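For illustration, here is a minimal sketch of both cases: loading a custom stop-word file for French and using the built-in English list. The file path and sample texts are my assumptions, not the project’s exact code.

# Minimal sketch: custom stop words vs. the built-in English list
from keybert import KeyBERT

kw_model = KeyBERT(model="distiluse-base-multilingual-cased-v1")

# Hypothetical path to one of the stop-word files mentioned above
with open("stop words/stopwords_fr.txt", encoding="utf-8") as f:
    french_stop_words = [line.strip() for line in f if line.strip()]

doc_fr = "Bonjour, je n'arrive pas à publier mon article dans le CMS."
keywords_fr = kw_model.extract_keywords(doc_fr, stop_words=french_stop_words, top_n=5)

# For English, the built-in list can be passed directly as a string
doc_en = "Hello, I cannot publish my article in the CMS."
keywords_en = kw_model.extract_keywords(doc_en, stop_words="english", top_n=5)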

There are other parameters you should know about, but the best is to check the website directly. You can find much more information on the official KeyBERT website at https://maartengr.github.io/KeyBERT/

Use MMR (Maximal Marginal Relevance)
You can use Maximal Marginal Relevance (MMR) to diversify the results. It creates keywords/keyphrases based on cosine similarity.

Diversity
The higher the setting, the more diverse the keywords. Note that the *Keyword diversity* slider only works if the *MMR* checkbox is ticked.
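Here is a minimal sketch of how the two widgets map to the use_mmr and diversity parameters of extract_keywords; the sample text and values are my assumptions.

# Minimal sketch: diversifying results with MMR
from keybert import KeyBERT

kw_model = KeyBERT(model="distilbert-base-nli-mean-tokens")
doc = "The CMS save button stays greyed out when I try to publish my article."

# use_mmr=True enables Maximal Marginal Relevance; diversity ranges from 0 (similar) to 1 (diverse)
keywords = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 2),
    stop_words="english",
    use_mmr=True,
    diversity=0.7,
    top_n=10,
)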

6. Final Thought: Wrap-it-up

At the end, I wanted an all-in-one application. So, I decided to merge the 2 projects: “KeyBERT Rough Text Analyzer” and “AUTOMATE P.O. JOB SUPPORT’S DEMO”. Thanks to Streamlit, it was not so difficult. You can check the result at the address below:

  • The merge of the 2 projects: “KeyBERT Rough Text Analyzer” and “AUTOMATE P.O. JOB SUPPORT’S DEMO”
    (all_in_one_automate_po_job_demo_support_keybert)
    automate_po_job_demo_support

Conclusion: As always, I have the feeling that it is maybe overkill to forge my own tools as a P.O. (Product Owner). But I have the persistent impression that the Product Owner’s job scope may change radically, or even purely and simply disappear on some aspects, due to AI. In any case, as far as user support is concerned, I wish, with all my heart, to see it disappear! Last thing: thanks again to Streamlit because, by using it, I am forced to think in terms of GUI, so it spreads good coding practices, especially refactoring the code when possible (DRY).

7. Videos

3 additional videos that go with this post:

  • Part 1 – The project “AUTOMATE P.O. JOB SUPPORT’S DEMO” (automate_po_job_demo_support)
  • Part 2 – The project “KeyBERT Rough Text Analyzer” (discovering_bert_and_keybert)
  • Part 3 – The merge of the 2 projects: “KeyBERT Rough Text Analyzer” and “AUTOMATE P.O. JOB SUPPORT’S DEMO” (all_in_one_automate_po_job_demo_support_keybert)

More info