Quick overview of using NLP in a CMS, either for customer support (turning FAQs into a chatbot) or for editorial features for journalists (keyword extraction), using spaCy, RAKE, TensorFlow and PyTorch
After facial recognition, I am tackling language issues with Python. Indeed, after images, the other main ingredient of a post is text! As a CMS “manufacturer” or PO, I was wondering what advantages I could draw from NLP. Concretely, it means exploring and learning Python to improve user support (turning FAQs into a chatbot, analyzing user feedback…) and also to think about some editorial features, with the help of Natural Language Processing (NLP).
I have been wrestling with the subject for too long because there are tons of libraries and tutorials introducing Python and NLP! Python is a heavy trend in searches. Apparently, Google users in America have searched for Python more often than for Kim Kardashian. So, “creating a chatbot in Python” has become the typical example, like “creating a blog” or “hello world” in other languages!
As I said at the beginning of this post, the starting idea was simple: how can I alleviate real-world tasks such as:
- Improving the user feedback loop (monitoring user feedback or converting static FAQs into a modest user support conversational agent).
- Understanding text to enable meaningful keyword extraction or text summarization, for instance.
As always, I found some very contrived examples and some more advanced ones. Even though these examples are oversimplified, there are still pitfalls for less technical readers, including me, especially when it comes to linguistic or machine learning concepts, e.g. stemming, tokenization, tokenizer, bag of words or Convolutional Neural Network.
I invite you to check these very intuitive videos and courses, which are a good introduction to NLP.
They shed light on NLP concepts such as stemming, tokenization, tokenizer or bag of words, and even explain different types of AI networks, such as Convolutional Neural Networks, and how to use them.
- SPACY’S ENTITY RECOGNITION MODEL: incremental parsing with Bloom embeddings & residual CNNs
https://www.youtube.com/watch?v=sqDHBH9IjRU
- PyTorch Beginner Tutorials from my YouTube channel.
https://github.com/python-engineer/pytorchTutorial
- Advanced NLP with spaCy · A free online course
https://course.spacy.io/en/
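To make the vocabulary above a bit more concrete, here is a rough sketch of tokenization, stemming and bag of words in plain Python. It is only an illustration: the function names are made up for this example, and the stemmer is a naive suffix stripper, not the Porter stemmer shipped with NLTK nor anything spaCy does internally.

```python
import re
from collections import Counter


def tokenize(text):
    """Tokenization: split a text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(word):
    """Toy stemming: strip a few common English suffixes (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least 3 characters so that short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(tokens, vocabulary):
    """Bag of words: count how often each vocabulary entry occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


sentence = "The editors edited two posts and are editing a third post"
tokens = tokenize(sentence)
stems = [naive_stem(token) for token in tokens]
vocabulary = sorted(set(stems))
vector = bag_of_words(stems, vocabulary)

print(tokens)
print(stems)
print(list(zip(vocabulary, vector)))
```

Once "edited" and "editing" are both reduced to "edit", the bag of words counts them as the same term, which is the whole point of stemming before counting.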
Here is a digest of posts to get started with NLP, oriented around 2 basic usages that can be implemented in a CMS: practical use cases for CMS support, and simple techniques to extract keywords or even “slice” a post.
Chronologically, the very first library I explored was the famous NLP library NLTK. Then I discovered PyTorch, made by Facebook, and then spaCy. There will certainly be a more specific article on spaCy because I really like it for its accessibility, both in its tutorials and in its core values. After all this reading, I selected a few articles that illustrate at least some of my personal interests in NLP. This post is a quick overview of that exploration.
The source code is available on my GitHub account (https://github.com/bflaven/BlogArticlesExamples/tree/master/python_nlp_explorations_chatbot_keywords_extraction). I am using my own Mac, and all the required libraries have been installed with the help of Anaconda.
1. Keyword Extraction
A beginner’s guide to keyword extraction with natural language processing (article_1_keyword_extraction_nlp)
A good use case for support, where you parse a single user feedback file and retrieve the core information with NLP. This use case leverages a bunch of libraries such as pandas, SciPy, seaborn, scikit-learn and, of course, NLTK. It parses a huge document in .tsv format (tab-separated values).
Source: https://www.andyfitzgeraldconsulting.com/writing/keyword-extraction-nlp/
Source: https://github.com/andybywire/nlp-text-analysis
Requirements to run the scripts
# look in https://anaconda.org/conda-forge/ to find the command.
# pandas for data analysis and manipulation
conda install -c anaconda pandas
# nltk, statistical natural language processing
conda install -c anaconda nltk
# matplotlib is a python 2D plotting library
conda install -c anaconda matplotlib
# seaborn visualization library based on matplotlib
conda install -c anaconda seaborn
# pillow imaging Library, image processing capabilities to your Python interpreter.
conda install -c anaconda pillow
# an imaging library to create word cloud visualizations
conda install -c conda-forge wordcloud
# install sklearn or scikit-learn
conda install -c anaconda scikit-learn
NLP keyword extraction tutorial with RAKE and Maui (article_2_keyword_extraction_nlp_rake)
For me, only the first part was interesting: it shows how to use RAKE, which stands for Rapid Automatic Keyword Extraction. RAKE extracts keywords that should describe the main topics expressed in a document.
Source: https://www.airpair.com/nlp/keyword-extraction-tutorial
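To give an idea of what RAKE does under the hood, here is a minimal sketch of the idea in plain Python. It is a simplified illustration, not the code from the tutorial and not the rake-nltk package: the stopword list is deliberately tiny and the function names are invented for this example. The principle is to cut the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, then score each phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); real implementations use a much longer one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "for", "in", "is", "it", "of",
    "on", "should", "that", "the", "to", "which", "with",
}


def candidate_phrases(text):
    """Cut the text into runs of consecutive non-stopwords."""
    phrases = []
    # Punctuation always ends a candidate phrase.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=5):
    """Return the top_n (phrase, score) pairs, best first."""
    phrases = candidate_phrases(text)
    frequency = defaultdict(int)  # how many times a word occurs
    degree = defaultdict(int)     # summed length of the phrases a word occurs in
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / frequency[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


text = "RAKE extracts keywords that should describe the main topics expressed in a document."
keywords = rake_keywords(text)
for phrase, score in keywords:
    print(f"{score:5.1f}  {phrase}")
```

On such a short sentence the long phrases win mechanically, which is a known bias of this scoring: RAKE favours multi-word phrases.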
Extract Keywords Using spaCy in Python (article_3_keyword_extraction_nlp_spacy)
This article from Ng Wai Foong and some other examples from the great official spaCy documentation show how to quickly get to grips with spaCy.
The script extracting keywords with spaCy is straightforward, like the other article from Ng Wai Foong.
Source: https://medium.com/better-programming/extract-keywords-using-spacy-in-python-4a8415478fbf
Miscellaneous examples with spaCy (article_4_miscellaneous_examples_nlp_spacy)
Some miscellaneous linguistic scripts using spaCy. There is much more on their GitHub account and the documentation is terrific.
Source: https://github.com/explosion/spaCy
Scraping Post
Newspaper: Article scraping & curation (article_5_playing_with_newspaper_post_scraping_curation)
A simple attempt with the newspaper library. This Python library gives the ability to slice up any online post. In the script, as an example, I am using one of my own blog posts.
Source: https://newspaper.readthedocs.io/en/latest/
If you need to check that newspaper is installed and can be imported
$ python
>>> import newspaper
>>> newspaper.__version__
>>> exit()
ChatBot
ChatBot With PyTorch – NLP And Deep Learning (article_6_chatbot_with_pytorch)
We now leave keyword extraction for chatbots: turning my FAQ into a chatbot with the help of PyTorch and NLTK. It is a very intuitive tutorial and the videos do the rest.
Admittedly, I was not thrilled by the chatbot’s abilities, but chatbots hold a lot of promise: they are supposed to handle fairly complex conversations with humans, and therefore use many Natural Language Processing techniques in order to understand human requests.
Source: https://www.python-engineer.com/videos/chatbot-pytorch/
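The tutorial trains a small neural network on bag-of-words vectors. As a much simpler stand-in, just to show the principle of matching a user question against a FAQ, here is a sketch based on cosine similarity between word counts, with no training at all. The FAQ entries, the threshold and the function names are all invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries, invented for illustration.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How do I publish a post?": "Open the post and click the 'Publish' button.",
    "How do I add tags to a post?": "Fill in the 'Tags' field in the post editor.",
}


def vectorize(text):
    """Bag of words: map each lowercase token to its count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(question, threshold=0.3):
    """Return the answer of the closest FAQ question, or a fallback message."""
    query = vectorize(question)
    best_question, best_score = None, 0.0
    for known_question in FAQ:
        score = cosine(query, vectorize(known_question))
        if score > best_score:
            best_question, best_score = known_question, score
    if best_question is None or best_score < threshold:
        return "Sorry, I do not understand."
    return FAQ[best_question]


reply = answer("I forgot my password, how can I reset it?")
print(reply)
```

Because common words such as "how" or "I" are shared by every question, a real version would at least remove stopwords or weight the terms, which is where the libraries mentioned in this post come in.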
Build Your First Chatbot in Python (article_7_chatbot_with_tensorflow)
A different chatbot, built with TensorFlow from a .txt file.
Source: https://medium.com/x8-the-ai-community/build-your-first-chatbot-in-python-334247814900
Chatbot tutorial by Matthew Inkawhich (article_8_chatbot_tutorial_pytorch)
I found a more advanced chatbot tutorial with PyTorch. Be careful with the n_iteration value because training requires a lot of disk space! I was forced to scale down the training, but then the chatbot performs rather poorly. Anyway, the example is great.
Source: https://pytorch.org/tutorials/beginner/chatbot_tutorial.html
Text Summarization Using spaCy in Python (article_9_text_summarization_using_spacy)
A second article from Ng Wai Foong. It is about text summarization with TF-IDF (Term Frequency-Inverse Document Frequency). It leverages spaCy and the result is immediate.
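As a rough idea of how TF-IDF can drive an extractive summary, here is a minimal sketch in plain Python. It is not the code from the article, which relies on spaCy. The assumption here is that each sentence is treated as a "document", scored by the sum of the TF-IDF weights of its words, and that the best sentences are kept in their original order. Function names are invented for this example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: cut after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=1):
    """Keep the max_sentences sentences with the highest TF-IDF score."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    total = len(sentences)
    # Document frequency: in how many sentences does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(total / doc_freq[word])
            for word, count in term_freq.items()
        )
        scores.append(score)
    # Pick the best sentences, then restore their original order.
    best = sorted(range(total), key=lambda i: -scores[i])[:max_sentences]
    return " ".join(sentences[i] for i in sorted(best))


text = (
    "spaCy is a library. "
    "spaCy parses text. "
    "TF-IDF ranks rare informative words higher."
)
summary = summarize(text)
print(summary)
```

Words that appear in many sentences get a low weight, so the sentence made of rarer words comes out on top.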
Some criticism of AI
Let’s step back a little and think for a minute about the consequences of AI. These AI tools exert an undeniable fascination. Why? These are new tools that actually begin to think and act on their own. The idea that these tools will make decisions and take actions on their own is fascinating and scary at the same time. I was wondering whether any critical thinking existed against the deafening consensus on AI.
I found some opponents, from a philosophical point of view. Even though AI enthusiasts claim that the promise of AI is to “Humanize the machine, not mechanize the User”, the main criticism remains AI’s “injunctive power”. Combined with consent, it makes an unstoppable combination to turn mankind into passive and obedient sheep! By the way, the GAFAs, which promote AI, have never really admitted to being the bad guys.
Indeed, AI can be seen as the ultimate achievement of the market where, reduced to consumers, we only make decisions with utilitarian goals, “obeying” AI.
Regarding NLP, the disturbing thing is the familiar form that this injunction takes. The chatbot speaks to you, the NLP writes and advises you with your own words… This is a step towards a very persuasive soft power. So, does the very idea of rebelling even still exist, when it sounds ludicrous to fight with a friend?
What I mostly remember from this reading:
– AI is a threat to humanity, especially to our free will.
– AI is the ultimate version of the “invisible hand”, so criticizing AI seems to be a way to “burn down” the system, aka capitalism and the GAFAs (Facebook, Google, Amazon, etc.), which are mostly behind the AI libraries and expect something in return: your data, so that you can be profiled.
Even though it is almost impossible to avoid AI nowadays, it is always good to read opinions against the mainstream way of thinking, so that you are aware of AI’s potential threats!
https://www.amazon.fr/Lintelligence-artificielle-lenjeu-si%C3%A8cle-antihumanisme/dp/2373090503
https://onezero.medium.com/peter-w-singer-explains-why-his-robot-revolution-is-inevitable-2b858a68151f
What’s next? How can I use NLP?
I wonder more and more whether the goal should now be to drop PHP altogether to build a web application. Is refactoring legacy PHP code in Python an option?
Indeed, building a web application, not to mention a website, seems to be easy nowadays. You can put together an effective SPA (Single Page Application) in a very short time, but providing meaningful and advanced features for a CMS is much trickier!
To be totally transparent, a simple question is spinning around in my head: how can I add some “intelligent” functionalities, using these Python libraries, to an existing CMS made in PHP (Laravel or Symfony for instance)? Apparently, the way to go is to build a separate API in Python that will bridge with PHP!
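To picture this bridge, here is a minimal sketch of such a separate Python API, using only the standard library so that it stays self-contained. Flask or Flask-RESTful would be the more usual choice. Everything here is hypothetical: the port, the JSON shape, the stopword list and the keyword function are placeholders standing in for a real spaCy or RAKE call.

```python
import json
import re
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny illustrative stopword list (an assumption).
STOPWORDS = {"and", "are", "for", "the", "that", "this", "with"}


def extract_keywords(text, top_n=3):
    """Placeholder NLP feature: most frequent non-stopwords of 3+ letters."""
    words = [
        word for word in re.findall(r"[a-z]+", text.lower())
        if len(word) > 2 and word not in STOPWORDS
    ]
    return [word for word, _ in Counter(words).most_common(top_n)]


class KeywordHandler(BaseHTTPRequestHandler):
    """Answer POST requests carrying {"text": "..."} with {"keywords": [...]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"keywords": extract_keywords(payload.get("text", ""))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def run_server(host="127.0.0.1", port=8000):
    """Blocking call: serve the keyword API until interrupted."""
    HTTPServer((host, port), KeywordHandler).serve_forever()


sample = "Python chatbot for a CMS. The chatbot answers Python questions. Python rocks."
print(extract_keywords(sample, top_n=2))
# To expose the API, call run_server() and POST JSON to http://127.0.0.1:8000/
```

On the PHP side, the Laravel or Symfony application would simply send the post body as JSON to this endpoint with its HTTP client and store the returned keywords, so the CMS itself does not need to know anything about Python.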
As I said in my previous post, these NLP libraries not only enable new tasks, they can even carry out tasks like a real human, such as me, a PO for a back office! Great, I am outsourcing myself.
Using Anaconda
A reminder of useful Anaconda commands
# Check Anaconda installation
conda --version
# Update conda
conda update -n base -c defaults conda
# Create an environment
# environment with Python 3.5
conda create --name myEnvironmentOne python=3.5
# launch a script named 006_nltk_cookbook_test.py
python 006_nltk_cookbook_test.py
# Get into the env named myEnvironmentOne
conda activate myEnvironmentOne
# get out from an env
conda deactivate
# By default you are in the base env no need to activate
# conda activate base
In conclusion
This reading gave me an overview of Python’s possibilities in terms of text understanding. NLP seems to be progressing every day and remains fairly accessible as long as you do not claim to be a specialist! It would be a shame to do without it while waiting to see progress in text generation, with BERT for example.
To dispel doubts about NLP’s potential, this post underwent both keyword extraction and summarization with the help of 2 of the scripts given here as examples. The result is not bad! It is a real crutch for a rookie journalist like me! You can see the result below.
# result for post tags
python, nlp, spacy, ia, chatbot
# summarization result
These videos are giving some enlightening on nlp's concepts such as stemming, tokenization, tokenizer or bag of words or even some explanations on different type of ia’s network such as convolutional neural network and the way to use it.here is a posts'digest to start with nlp oriented around 2 basic usages, that can be implemented in a cms: practical use cases in a cms's support and simple techniques to extract keywords or even "slice" a post. Rake extracts keywords that should describe the main topics expressed in a document. An article from ng wai foong and some other examples from the great official spacy documentation show how to quickly get to grip with spacy. Concretely, it means exploring and learning python to improve both user support (faqs turn as a chatbot, analyzing user feedback...) but also think about some editorial features especially with the help of natural language processing (nlp).
Read more
- Create your chatbot using Python NLTK
https://medium.com/predict/create-your-chatbot-using-python-nltk-761cd0aeaed3
- Building Language Models (nlpforhackers)
https://nlpforhackers.io/language-models/
- Deploying a Text Classification Model Using Flask and Vue.js
https://heartbeat.fritz.ai/deploying-a-text-classification-model-using-flask-and-vue-js-25b9aa7ff048
- Text Generation with Python and TensorFlow/Keras
https://stackabuse.com/text-generation-with-python-and-tensorflow-keras/
- TEXT CLASSIFICATION WITH TORCHTEXT
https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html
- Text Analysis by monkeylearn
https://monkeylearn.com/text-analysis/
- BERT Word Embeddings Tutorial
https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
- A Hacker’s Guide to Python string and Natural Language Processing (NLP) packages
https://gist.github.com/brianspiering/64b2256f25880a97936c198955b437e1
- List of free resources to learn Natural Language Processing
https://hackernoon.com/list-of-free-resources-to-learn-natural-language-processing-5bc4b76db552
- How I built and launched an AI product for under $100
https://blog.usejournal.com/how-i-built-and-launched-an-ai-product-for-under-100-6284646ec56b
- Building Chatbots – Introduction
https://nlpforhackers.io/chatbots-introduction/#more-8595
- Developing a Single Page App with Flask and Vue.js
https://testdriven.io/blog/developing-a-single-page-app-with-flask-and-vuejs/
- Using Vue.js To Create An Interactive Weather Dashboard With APIs
https://www.smashingmagazine.com/2019/02/interactive-weather-dashboard-api-vue-js/
- Sample dockerized environmennt with Vue.js and Python API
https://github.com/e0ne/docker-vuejs-python-nginx
- Building a Google search clone SPA with Vue and Flask
https://scotch.io/bar-talk/building-a-google-search-clone-spa-with-vue-and-flask
- Training Tensorflow Object Detection API with custom dataset for working in Javascript and Vue.js
https://towardsdatascience.com/training-tensorflow-object-detection-api-with-custom-dataset-for-working-in-javascript-and-vue-js-6634e0f33e03
- Flask-RESTful – User’s Guide
https://flask-restful.readthedocs.io/en/latest/
- Transformers, State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
https://github.com/huggingface/transformers
- Sentiment Analysis with BERT and Transformers by Hugging Face using PyTorch and Python
https://www.curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/s
- Flask’s documentation
https://flask.palletsprojects.com/en/1.1.x/
- Django is a high-level Python Web framework
https://www.djangoproject.com/
- word_cloud, a little word cloud generator in Python.
https://github.com/amueller/word_cloud
- OpenAI’s GPT-2: A Simple Guide to Build the World’s Most Advanced Text Generator in Python
https://www.analyticsvidhya.com/blog/2019/07/openai-gpt2-text-generator-python/
- Transfer Learning for NLP: Fine-Tuning BERT for Text Classification
https://www.analyticsvidhya.com/blog/2020/07/transfer-learning-for-nlp-fine-tuning-bert-for-text-classification/
- How To Train Your Chatbot With Simple Transformers
https://towardsdatascience.com/how-to-train-your-chatbot-with-simple-transformers-da25160859f4
- Python Engineer Videos
https://www.youtube.com/c/PythonEngineer/videos
- Patrick Loeber (Python Engineer) Github
https://github.com/python-engineer
- github projects about text-mining
https://github.com/topics/text-mining
- bert-generative-text by JulienHeiduk
https://github.com/JulienHeiduk/bert-generative-text
- flashtext by vi3k6i5
https://github.com/vi3k6i5/flashtext
- Basic Flask Website tutorial
https://pythonprogramming.net/basic-flask-website-tutorial/?completed=/practical-flask-introduction/
- Code from a book “Python Natural Language Processing”
https://github.com/jalajthanaki/NLPython
- How to Build a Text Generator using TensorFlow 2 and Keras in Python
https://www.thepythoncode.com/article/text-generation-keras-python
- How to Build a Text Generator using TensorFlow and Keras in Python (code)
https://github.com/x4nth055/pythoncode-tutorials/tree/master/machine-learning/nlp/text-generator
- How to build a keyword suggestion tool using TensorFlow
https://wordlift.io/blog/en/keyword-suggestion-tool-tensorflow/
- HOW TO AUTOMATE KEYWORD RESEARCH WITH APIS & PYTHON SCRIPTS
https://seobutler.com/how-to-automate-keyword-research-with-apis-python-scripts/
- RAKE short for Rapid Automatic Keyword Extraction algorithm
https://github.com/csurfer/rake-nltk
- Implementation of TextRank for keyword Extraction
https://github.com/JRC1995/TextRank-Keyword-Extraction
- lazynlp
https://github.com/chiphuyen/lazynlp
- Generating Random Poems with Python
https://www.hallada.net/2017/07/11/generating-random-poems-with-python.html
- Pythonprogramming, excellent source in python
https://www.pythonprogramming.in/
- Text Preprocessing in Python: Steps, Tools, and Examples
https://medium.com/@datamonsters/text-preprocessing-in-python-steps-tools-and-examples-bf025f872908
- Text Mining in Python: Steps and Examples
https://medium.com/towards-artificial-intelligence/text-mining-in-python-steps-and-examples-78b3f8fd913b
- Stop Words – Natural Language Processing With Python and NLTK p.2
https://www.youtube.com/watch?v=w36-U-ccajM
- Chunking – Natural Language Processing With Python and NLTK p.5
https://www.youtube.com/watch?v=imPpT2Qo2sk
- Sentdex behind pythonprogramming.net
https://github.com/Sentdex
- Kavita Ganesan website
https://kavita-ganesan.com/
API python + vue.js
Python goes on web…
If you need to go to the web
Some extra resources