A quick overview of using NLP in a CMS, either for customer support (turning FAQs into a chatbot) or for editorial features for journalists (keyword extraction), with spaCy, RAKE, TensorFlow and PyTorch

After facial recognition, I am tackling language issues with Python. Indeed, after images, the other main ingredient of a post is text! As a CMS “manufacturer” or PO, I was wondering what benefits I could draw from NLP. Concretely, it means exploring and learning Python to improve user support (turning FAQs into a chatbot, analyzing user feedback…) but also thinking about some editorial features, especially with the help of Natural Language Processing (NLP).

I have been wrestling with the subject for too long because there are tons of libraries and tutorials introducing Python and NLP! Searching for Python is a heavy trend. Apparently, Google users in America have searched for Python more often than for Kim Kardashian. So, “creating a chatbot in Python” has become the typical example, like “creating a blog” or “hello world” in other languages!

Source: https://www.economist.com/science-and-technology/2018/07/19/python-has-brought-computer-programming-to-a-vast-new-audience

As I said at the beginning of this post, the starting idea was simple: how can I alleviate real-world tasks such as:

  1. Improving the user feedback loop (monitoring user feedback or converting static FAQs into a modest user support conversational agent).
  2. Understanding text to enable meaningful keyword extraction or text summarization, for instance.

As always, I found some very contrived examples and some more advanced ones. Even though these examples are oversimplified, there are still pitfalls for less technical readers, including me, especially when it comes to linguistic or machine-learning concepts, e.g. stemming, tokenization, tokenizer, bag of words or Convolutional Neural Network.

I invite you to watch these very intuitive videos, which are a good introduction to NLP.

These videos shed some light on NLP concepts such as stemming, tokenization, tokenizer or bag of words, and even explain different types of AI networks, such as the Convolutional Neural Network, and how to use them.
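To make these three concepts a little more tangible, here is a minimal sketch in plain Python. It is not taken from the videos: the `toy_stem` function is a deliberately naive stand-in that I made up for illustration, and real stemmers such as the Porter stemmer shipped with NLTK apply many more rules.

```python
import re
from collections import Counter


def tokenize(text):
    """Tokenization: split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def toy_stem(word):
    """Stemming (toy version): strip a few common English suffixes.

    Hypothetical helper for illustration only; it is far cruder than
    a real stemmer.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(tokens, vocabulary):
    """Bag of words: count how often each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


sentence = "The editor publishes posts and the journalists edited posts"
tokens = tokenize(sentence)
stems = [toy_stem(token) for token in tokens]
vocabulary = sorted(set(stems))
vector = bag_of_words(stems, vocabulary)

print(tokens)
print(stems)
print(dict(zip(vocabulary, vector)))
```

The bag of words ignores word order entirely: the sentence is reduced to a vector of counts, which is the kind of input a simple classifier or neural network can then be trained on.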

Here is a digest of posts to get started with NLP, oriented around 2 basic usages that can be implemented in a CMS: practical use cases for CMS support, and simple techniques to extract keywords or even “slice” a post.

Chronologically, the very first library I explored was the famous NLP library NLTK. Then I discovered PyTorch, made by Facebook, and then spaCy. There will certainly be a more specific article on spaCy because I really like it for its accessibility, both in its tutorials and in its core values. After all this reading, I selected a few articles that illustrate at least some of my personal interests in NLP. This post is a quick overview of that exploration.

The source code is available on my GitHub account (https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction). I am using my own Mac, and all the required libraries have been installed with the help of Anaconda.

1. Keyword Extraction

A beginner’s guide to keyword extraction with natural language processing (article_1_keyword_extraction_nlp)

A good use case for support, where you parse a single user feedback file and retrieve the core information with NLP. This use case relies on a bunch of libraries such as pandas, SciPy, Seaborn, scikit-learn and, of course, NLTK. It parses a huge document in .tsv format (tab-separated values).

Source: https://www.andyfitzgeraldconsulting.com/writing/keyword-extraction-nlp/

Source: https://github.com/andybywire/nlp-text-analysis

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_1_keyword_extraction_nlp
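To give a rough idea of the principle without installing anything, here is a minimal sketch that uses only the standard library. It is not the code from the article: the sample feedback rows, the column names and the tiny stop-word list are all invented for illustration, and the article itself relies on pandas, NLTK and scikit-learn for the same steps.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical stop-word list; NLTK ships a much longer one.
STOP_WORDS = {"the", "a", "is", "to", "i", "it", "and", "in", "my", "when", "of"}

# Stand-in for a real .tsv file; the "id" and "feedback" columns are assumptions.
SAMPLE_TSV = (
    "id\tfeedback\n"
    "1\tThe image upload is slow\n"
    "2\tImage upload fails when I crop the image\n"
    "3\tI like the new editor but image upload is slow\n"
)


def clean(text):
    """Lowercase the text, keep alphabetic words and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def top_ngrams(rows, n=2, k=2):
    """Return the k most frequent n-grams over all feedback rows.

    Stop words are removed first, so an n-gram may join words that were
    separated by a stop word in the original sentence.
    """
    counts = Counter()
    for row in rows:
        words = clean(row["feedback"])
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
top_bigrams = top_ngrams(rows, n=2, k=2)
print(top_bigrams)
```

Even on three fake rows, the most frequent bigrams point at the recurring complaint, which is exactly what you want from a feedback file.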

Requirement to run the scripts

# look in https://anaconda.org/conda-forge/ to find the command.
# pandas for data analysis and manipulation 
conda install -c anaconda pandas
# nltk, statistical natural language processing 
conda install -c anaconda nltk
# matplotlib is a python 2D plotting library
conda install -c anaconda matplotlib
# seaborn visualization library based on matplotlib
conda install -c anaconda seaborn
# pillow imaging Library, image processing capabilities to your Python interpreter.
conda install -c anaconda pillow
# an imaging library to create word cloud visualizations
conda install -c conda-forge wordcloud
# install sklearn or scikit-learn
conda install -c anaconda scikit-learn


NLP keyword extraction tutorial with RAKE and Maui (article_2_keyword_extraction_nlp_rake)

For me, only the first part was interesting: it shows how to use RAKE, which stands for Rapid Automatic Keyword Extraction. RAKE extracts keywords that should describe the main topics expressed in a document.

Source: https://www.airpair.com/nlp/keyword-extraction-tutorial

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_2_keyword_extraction_nlp_rake
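For the curious, the core idea of RAKE fits in a few lines: cut the text into candidate phrases at stop words and punctuation, score each word by its degree divided by its frequency, then score each phrase as the sum of its word scores. The sketch below is my own simplified reading of that idea, not the RAKE package used in the tutorial; the stop-word list and the sample text are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, very short stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
              "of", "that", "the", "to", "with"}


def candidate_phrases(text):
    """Split the text into phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9\-]+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, top_n=3):
    """Score phrases with the degree/frequency ratio of their words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, the word itself included
    word_score = {word: degree[word] / freq[word] for word in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top_n]


text = (
    "Keyword extraction is the automatic identification of terms that "
    "describe a document. Rapid automatic keyword extraction scores "
    "phrases from word co-occurrence."
)
keywords = rake(text)
for phrase, score in keywords:
    print(score, phrase)
```

Note that this scoring favours long phrases, since every extra word adds to the sum. That is worth keeping in mind before feeding the raw output into a tag field.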

Extract Keywords Using spaCy in Python (article_3_keyword_extraction_nlp_spacy)

This article by Ng Wai Foong, along with some other examples from the great official spaCy documentation, shows how to quickly get to grips with spaCy.

The script that extracts keywords with spaCy is straightforward, like the other article by Ng Wai Foong.

Source: https://medium.com/better-programming/extract-keywords-using-spacy-in-python-4a8415478fbf

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_3_keyword_extraction_nlp_spacy
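The general approach can be sketched as follows: keep only the tokens whose part of speech looks like a content word, drop stop words and punctuation, and count what is left. This is my own minimal version, not the article's script. It assumes spaCy and the `en_core_web_sm` model are installed, and the function names are hypothetical.

```python
from collections import Counter

# Parts of speech kept as keyword candidates (an assumption of this sketch).
KEEP_POS = {"PROPN", "ADJ", "NOUN"}


def extract_keywords(doc, top_n=5):
    """Return the most frequent content words of a spaCy Doc.

    Works on any iterable of tokens exposing the spaCy token attributes
    text, pos_, is_stop and is_punct.
    """
    words = [
        token.text.lower()
        for token in doc
        if token.pos_ in KEEP_POS and not token.is_stop and not token.is_punct
    ]
    return [word for word, _ in Counter(words).most_common(top_n)]


def keywords_from_text(text, model="en_core_web_sm", top_n=5):
    """Run spaCy on raw text; needs spaCy and the model to be installed."""
    import spacy  # imported lazily so extract_keywords has no dependency

    nlp = spacy.load(model)
    return extract_keywords(nlp(text), top_n=top_n)
```

With spaCy installed, calling `keywords_from_text("some long post...")` would return a short list of candidate tags.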

Miscellaneous examples with spaCy (article_4_miscellaneous_examples_nlp_spacy)

Some miscellaneous linguistic scripts using spaCy. There is much more on their GitHub account and the documentation is terrific.

Source: https://github.com/explosion/spaCy

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_4_miscellaneous_examples_nlp_spacy

Scraping Post

Newspaper: Article scraping & curation (article_5_playing_with_newspaper_post_scraping_curation)

A simple attempt with the newspaper library. This Python library gives you the ability to slice up any online post. In the script, as an example, I am using one of my own blog posts.

Source: https://newspaper.readthedocs.io/en/latest/

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_5_playing_with_newspaper_post_scraping_curation

If you need to check that newspaper imports correctly:

$ python
>>> import newspaper
>>> newspaper.__version__
>>> exit()
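Once the import works, slicing a post looks roughly like the sketch below. It assumes the newspaper3k package is installed and that NLTK's punkt data is available for the `nlp()` step. The two helper names are mine, and the URL you pass in is up to you.

```python
def article_to_dict(article):
    """Collect the interesting slices of a parsed newspaper Article."""
    return {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "keywords": article.keywords,
        "summary": article.summary,
    }


def slice_post(url):
    """Download, parse and analyze a post; needs network access."""
    from newspaper import Article  # lazy import: requires newspaper3k

    article = Article(url)
    article.download()
    article.parse()
    article.nlp()  # fills in article.keywords and article.summary
    return article_to_dict(article)
```

Calling `slice_post()` with the URL of any post should then give you the title, authors, main image, keywords and a short summary in one dictionary.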


ChatBot With PyTorch – NLP And Deep Learning (article_6_chatbot_with_pytorch)

We now leave keyword extraction for chatbots: turning my FAQ into a chatbot with the help of PyTorch and NLTK. It is a very intuitive tutorial and the videos do the rest.
Admittedly, I was not in raptures over the chatbot’s abilities, but chatbots hold a lot of promise: they are supposed to handle fairly complex conversations with humans, and so they use a lot of Natural Language Processing techniques in order to understand human requests.

Source: https://www.python-engineer.com/videos/chatbot-pytorch/

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_6_chatbot_with_pytorch
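To show what “turning a FAQ into a chatbot” means in practice, here is a deliberately simplified sketch. Each FAQ entry becomes an “intent” with example questions and canned answers. Where the tutorial trains a small neural network on bag-of-words vectors, I swap in a plain word-overlap (Jaccard) score so the example runs without PyTorch. The intents, the threshold and the fallback sentence are all invented.

```python
import random
import re

# Hypothetical FAQ entries rewritten as intents.
INTENTS = [
    {
        "tag": "password",
        "patterns": ["How do I reset my password", "I forgot my password"],
        "responses": ["Use the 'Forgot password' link on the login page."],
    },
    {
        "tag": "upload",
        "patterns": ["How can I upload an image", "Image upload does not work"],
        "responses": ["Go to the media library and drop your image there."],
    },
]

FALLBACK = "Sorry, I do not understand. Could you rephrase?"


def tokenize(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reply(message, threshold=0.3):
    """Answer with the best-matching intent, or fall back if unsure."""
    words = tokenize(message)
    best_score, best_intent = 0.0, None
    for intent in INTENTS:
        for pattern in intent["patterns"]:
            score = jaccard(words, tokenize(pattern))
            if score > best_score:
                best_score, best_intent = score, intent
    if best_intent is None or best_score < threshold:
        return FALLBACK
    return random.choice(best_intent["responses"])


print(reply("I forgot my password"))
print(reply("What is the weather like"))
```

The threshold plays the same role as the confidence check in a trained model: below it, the bot admits that it does not know rather than guessing.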

Build Your First Chatbot in Python (article_7_chatbot_with_tensorflow)

A different chatbot, built on TensorFlow from a .txt file.

Source: https://medium.com/x8-the-ai-community/build-your-first-chatbot-in-python-334247814900

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_7_chatbot_with_tensorflow

Chatbot tutorial by Matthew Inkawhich (article_8_chatbot_tutorial_pytorch)

I found a more advanced chatbot tutorial with PyTorch. Be careful with the n_iteration value because training requires a lot of disk space! I was forced to downsize the training, but then the chatbot sucks a little bit. Anyway, the example is great.


My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_8_chatbot_tutorial_pytorch
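A back-of-the-envelope calculation helps to pick a safer value. The sketch below assumes that most of the disk usage comes from model checkpoints written at a regular interval during training. The `save_every` parameter name, the checkpoint size and all the numbers are assumptions made for the example, so check them against your own run.

```python
def checkpoint_disk_usage(n_iteration, save_every, checkpoint_mb):
    """Estimate how many checkpoints are written and their total size in MB."""
    n_checkpoints = n_iteration // save_every
    return n_checkpoints, n_checkpoints * checkpoint_mb


# Hypothetical figures: one checkpoint every 500 iterations, 300 MB each.
full = checkpoint_disk_usage(n_iteration=4000, save_every=500, checkpoint_mb=300)
reduced = checkpoint_disk_usage(n_iteration=1000, save_every=500, checkpoint_mb=300)

print("full training: %d checkpoints, about %d MB" % full)
print("reduced training: %d checkpoints, about %d MB" % reduced)
```

Under these assumptions, another option is to keep the number of iterations and save less often instead, which preserves the training quality while cutting the disk footprint.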

Text Summarization Using spaCy in Python (article_9_text_summarization_using_spacy)

A second article from Ng Wai Foong. It is about text summarization with TF-IDF (Term Frequency-Inverse Document Frequency). It relies on spaCy and the result is immediate.

Source: https://medium.com/better-programming/extractive-text-summarization-using-spacy-in-python-88ab96d1fd97

My files: https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction/article_9_text_summarization_using_spacy
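To illustrate the TF-IDF idea behind extractive summarization, here is a minimal sketch in plain Python. It is not the article's code and it does not use spaCy: each sentence is treated as a small “document”, each word is weighted by its frequency in the sentence times the log of its rarity across sentences, and the best-scoring sentences are kept in their original order. The naive sentence splitter and the sample text are assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: cut after ., ! or ? followed by a space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, n_sentences=1):
    """Keep the n sentences with the highest summed TF-IDF score."""
    sentences = split_sentences(text)
    tokenized = [tokenize(sentence) for sentence in sentences]
    total = len(sentences)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(total / df[word])
            for word, count in tf.items()
        )
        scores.append(score)
    best = sorted(range(total), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(best))


text = (
    "Python is popular. Python is used for NLP. "
    "spaCy extracts keywords and summarizes long posts quickly."
)
summary = summarize(text, n_sentences=1)
print(summary)
```

Words that appear in every sentence get a weight of zero, so a sentence made of rarer words wins. On a real post you would also remove stop words first and probably lemmatize, which is where spaCy comes in.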

Some criticism of AI

Let’s step back a little and think for a minute about the consequences of AI. These AI tools exert an undeniable fascination. Why? These are new tools that actually begin to think and act on their own. The idea that these tools will make decisions and take actions on their own is fascinating and scary at the same time. I was wondering whether any critical thinking existed against the deafening consensus on AI.
I found some opponents, from a philosophical point of view. Even though AI fanatics claim that the promise of AI is to “Humanize the machine, not mechanize the User”, the main criticism remains AI’s “injunctive power”. Combined with consent, it makes an unstoppable combination to turn us, mankind, into passive and obedient sheep! By the way, the GAFA, which promote AI, never really admitted to being the bad guys.

Indeed, AI can be seen as the ultimate achievement of the market where, reduced to consumers, we only make decisions with utilitarian goals, “obeying” AI.

Regarding NLP, the disturbing thing is the familiar form that this injunction takes. The chatbot speaks to you, the NLP writes and advises you with your own words… This is a step towards a very persuasive soft power. So, does the very idea of rebelling even still exist, when it sounds ludicrous to fight with a friend?

What I mostly remember from this reading:
– AI is a threat to humanity, especially to our free will.
– AI is the ultimate version of the “invisible hand”, so criticizing AI seems to be a way to “burn down” the system, aka capitalism and the GAFA (Facebook, Google, Amazon, etc.), which are mostly behind the AI libraries and expect something in return: your data, so you can be profiled.

Even though it is almost impossible to avoid AI nowadays, it is always good to read opinions that go against mainstream thinking, so that you are aware of the potential threats of AI!

  • L’Intelligence artificielle ou l’enjeu du siècle : Anatomie d’un antihumanisme radical, by Éric Sadin (French)
  • Peter W. Singer On Why His ‘Robot Revolution’ Is Inevitable

What’s next? How can I use NLP?

    I wonder more and more whether the goal has become to drop PHP altogether when building a web application. Is refactoring legacy PHP code in Python an option?

    Indeed, building a web application (I am not even talking about a website) seems easy nowadays. You can put together an effective SPA (Single Page Application) in a very short time, but providing meaningful and advanced features for a CMS is much trickier!

    To be totally transparent, a simple question keeps spinning around in my head: how can I add some “intelligent” functionalities, using these Python libraries, to an existing CMS made in PHP (Laravel or Symfony, for instance)? Apparently, the way to go is to build a separate API in Python that will bridge with PHP!
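As a thought experiment, such a bridge could be as small as the sketch below: a tiny HTTP service, written with the Python standard library only, that accepts a JSON payload and returns keywords as JSON. Everything here is an assumption made for illustration: the `/keywords` route, the port, the payload shape and the naive frequency-based extractor, which stands in for a real spaCy or RAKE call.

```python
import json
import re
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical, very short stop-word list.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def extract_keywords(text, top_n=5):
    """Naive placeholder: most frequent words that are not stop words."""
    words = [
        word for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOP_WORDS and len(word) > 2
    ]
    return [word for word, _ in Counter(words).most_common(top_n)]


class KeywordHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/keywords":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected a JSON body with a 'text' field")
            return
        body = json.dumps({"keywords": extract_keywords(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=8000):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), KeywordHandler)
```

Starting it with `make_server().serve_forever()` would let the PHP side POST `{"text": "..."}` to `http://127.0.0.1:8000/keywords` with any HTTP client and read the keywords back. For anything beyond a prototype, a proper web framework and some authentication would be the sensible next step.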

    As I said in my previous post, these NLP libraries not only enable new tasks, they can even carry out tasks like a real human, such as me, a P.O. for a back office! Great, I am outsourcing myself.

    Using Anaconda

    A reminder for useful commands with Anaconda

    # Check Anaconda installation
    conda --version
    # Update conda
    conda update -n base -c defaults conda
    # Create an environment
    # environment with Python 3.5
    conda create --name myEnvironmentOne python=3.5
    # Get into the env named myEnvironmentOne
    conda activate myEnvironmentOne
    # launch a script named 006_nltk_cookbook_test.py
    python 006_nltk_cookbook_test.py
    # get out from an env
    conda deactivate
    # By default you are in the base env no need to activate
    # conda activate base

    In conclusion

    This reading gave me an overview of Python’s possibilities in terms of text understanding. NLP seems to be progressing every day and remains fairly accessible as long as you do not claim to be a specialist! It would be a shame to do without it while waiting to see progress in text generation, with BERT for example.

    To remove any doubt about NLP’s potential, this post underwent both keyword extraction and summarization with the help of 2 of the scripts given here as examples. The result is not bad! It is a real crutch for a rookie journalist like me! You can see the result below.

    # result for post tags
    python, nlp, spacy, ia, chatbot
    # summarization result
    These videos are giving some enlightening on nlp's concepts such as stemming, tokenization, tokenizer or bag of words or even some explanations on different type of ia’s network such as convolutional neural network and the way to use it.here is a posts'digest to start with nlp oriented around 2 basic usages, that can be implemented in a cms: practical use cases in a cms's support and simple techniques to extract keywords or even "slice" a post. Rake extracts keywords that should describe the main topics expressed in a document. An article from ng wai foong and some other examples from the great official spacy documentation show how to quickly get to grip with spacy. Concretely, it means exploring and learning python to improve both user support (faqs turn as a chatbot, analyzing user feedback...) but also think about some editorial features especially with the help of natural language processing (nlp).
