Improve a CMS’s photo library qualification with AI: facial recognition in Python to provide better image search results to users

My last post was a modest attempt at discovering Python’s potential with the help of Anaconda. Meanwhile, Artificial Intelligence (AI) has bewitched me 🙂 So, I gained confidence using Python in order to explore AI libraries such as NLTK, OpenCV or face_recognition from ageitgey (Adam Geitgey).

You can find the files on my GitHub account, in python_playing_with_facial_recognition.

This post is about a real issue that can be solved by AI. I believe that, for any technical topic, exploring the documentation for its own sake can be quite wasteful. Instead, I always strive to tackle a real use case and apply as much as possible of what I want to discover.

So, applying this principle, I picked facial recognition for pictures. Like many, I am an iOS phone user and I have seen facial recognition at work. I started to wonder: “How can I apply those facial recognition principles in a backoffice?”. As I am currently a PO for a Backoffice, I am supposed to have a clear “vision” of my product but, more than that, I know its weaknesses perfectly, especially when dealing with pictures.

The benefit of this simple question is that it lets me not stress about the details right away and instead quickly jot something down as a placeholder for a conversation between me, myself and I, a soliloquy. The intent was not to forget any of the right questions, whether upfront or somewhere down the road, without letting them prevent me from starting a P.O.C.

Here is my shortlist that highlights some of the functional and nonfunctional concerns or requirements that I wanted to address. This list has driven all my decisions to obtain a pragmatic result without missing the goal.

Goal: How can I apply those facial recognition principles in a Backoffice in order to improve image search results in the end?

Short to-do list:

  1. Browse a directory of images of unknown people with a facial recognition script
  2. Detect the faces of known people among these images and set those images aside
  3. Insert these images of known people into a MySQL database
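
The first two steps of this list can be sketched roughly as follows. This is only a minimal outline under stated assumptions, not the scripts from the repository: the function names `list_images` and `find_known_people` are made up for illustration, each reference picture is assumed to contain a single face, and the face_recognition import is deferred so that the file-listing part runs even without the library installed.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(directory):
    """Step 1: return the sorted paths of the image files in a directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def find_known_people(known_dir, unknown_dir, tolerance=0.6):
    """Step 2: return (unknown_image, known_image) pairs whose faces match."""
    import face_recognition  # deferred: needs dlib and face_recognition installed

    # One encoding per reference picture (assumes one face per known picture).
    known = []
    for path in list_images(known_dir):
        encodings = face_recognition.face_encodings(
            face_recognition.load_image_file(path)
        )
        if encodings:
            known.append((path, encodings[0]))

    matches = []
    for path in list_images(unknown_dir):
        image = face_recognition.load_image_file(path)
        # A group picture may contain several faces; check each of them.
        for face in face_recognition.face_encodings(image):
            results = face_recognition.compare_faces(
                [encoding for _, encoding in known], face, tolerance=tolerance
            )
            for (known_path, _), is_match in zip(known, results):
                if is_match:
                    matches.append((path, known_path))
    return matches
```

The returned pairs are the material for step 3: each pair says which unqualified image contains which known person, and that is what gets written to the database.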

Requirement: This spike will be done entirely in Python, but I do not want to end up with an overzealous shopping list of requirements.

Let me explain the situation. The Backoffice I am dealing with hosts thousands of “unqualified” images that contain the faces of: Donald Trump, Xi Jinping, Angela Merkel, Boris Johnson, Emmanuel Macron, Vladimir Putin, Recep Tayyip Erdoğan or, from a Eurocentric point of view, less well-known people: Macky Sall, Rodrigo Duterte, Ramzan Kadyrov, Hun Sen, Narendra Modi, Hassan Rouhani, Stevo Pendarovski, Nicolás Maduro, Edgar Lungu…

Remember that we still need human intelligence to say who is who. For your information, Stevo Pendarovski, Стево Пендаровски, is president of North Macedonia, holding the office since 12 May 2019, and he looks a bit like me 🙂 or maybe that is just the glasses.

The face of Stevo Pendarovski, Стево Пендаровски, president of North Macedonia

The idea is to increase the relevance of searches and to reduce the sample of images retrieved by a traditional textual search based on their names. It will save time, money and resources but also improve the user experience: users no longer get empty or irrelevant results for their image searches.

The fact is, no one wants to qualify each image by opening them one after the other and adding a correct caption to improve indexing and, as a consequence, future search results. We are talking about more than 1,500,000 pictures. Indeed, the wise choice is to leave it to a computer.

This is where the “US cavalry” comes to the rescue in the form of a facial recognition library. This library gives you an easy way to detect known faces, e.g. a list of current state leaders, across all these images.

Machine learning is made for that, and there are tons of pre-trained models and ready-to-go libraries that you can use. With this magic wand, you quickly scale up your skills and impress your friends or your mother. “Look mummy, I am a data scientist!”.

As the facial recognition library truly analyzes the images, it prevents image searches from being confused by common namesakes or ambiguous words. For instance, you can tell Michelle Obama from Barack Obama or, if you type Bush while looking for the 2 former US presidents, you won’t get pictures of the Australian bush.
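
To make the point concrete: as far as I understand the face_recognition library, each face is turned into a 128-number encoding, and two faces count as the same person when the Euclidean distance between their encodings is at most a tolerance (0.6 by default). The sketch below reproduces that comparison in plain Python. The function names are chosen for illustration and the three-number vectors are made-up toy values, not real encodings.

```python
import math


def face_distance(encoding_a, encoding_b):
    """Euclidean distance between two face encodings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(encoding_a, encoding_b)))


def same_person(encoding_a, encoding_b, tolerance=0.6):
    """True when the two encodings are close enough to be considered a match."""
    return face_distance(encoding_a, encoding_b) <= tolerance


# Toy vectors standing in for real 128-dimensional encodings (made-up values).
barack_reference = [0.10, 0.20, 0.30]
barack_new_photo = [0.15, 0.25, 0.20]
michelle_photo = [0.90, -0.40, 0.70]

print(same_person(barack_reference, barack_new_photo))
print(same_person(barack_reference, michelle_photo))
```

The caption and the file name play no role in the decision, which is why a shared surname no longer pollutes the results.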

In the end, as collateral “damage”, once the requalification has been implemented in the media database, you’ll be able to improve the UX by adding a filter named, for example, “Portrait” to the source selection dropdown menu. This menu already contains filters like Editorial, Broadcast and Press agency, and lets users select the source for their image searches.

One last thing: from a historical point of view, what conclusion can we draw from this allegedly harmless experiment?

In the last century, robotization replaced blue-collar workers on automobile assembly lines, for instance, ending mass employment in the industries born of the first industrial revolution. What about today’s industries? I guess that AI will certainly replace human intelligence, e.g. white-collar workers, for tedious and repetitive administrative work such as writing meeting minutes, sorting pictures, sorting documents, answering questions… and maybe even more creative work!

As I speak, this is no longer a dystopia! AI is creeping in everywhere, except in some companies where people still work as if in the Stone Age. In those cases, the purpose is to “buy social peace” by keeping obsolete jobs alive! Well, sooner or later, there will be a trade-off between some of your job’s tasks and the possibility of having them carried out by Artificial Intelligence (AI). Is there a name for this phenomenon? Anyway, you will soon be facing this “intelligentization” or “artificialization” of your job, so you’d better be ready! And I have not even mentioned the use of AI, especially facial recognition, in surveillance and security, where it is already standard!

Never forget that modern capitalism loves creative destruction and sees AI as a disruptive factor in many industries. This is such a growing question that there’s even a website called “Will Robots Take My Job?”. The website’s name speaks for itself. You can look up a job title and see its likelihood of AI-driven doom.

But let’s come back to our user story: playing with facial recognition in a very inoffensive way!

First, I need to have Anaconda installed; see my previous post for the installation instructions.

Here is a quick overview of the environment that I am using.

If you do not want to read any further, I made 2 videos to illustrate this post.

Installation and configuration for MySQL Community Server and Sequel Pro

Furthermore, for this use case, I will install and configure MySQL and Sequel Pro. I am relying on a MySQL server installed on my Mac.

Here are a few useful instructions to handle MySQL in the console.

1. Add MySQL to $PATH

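# Open a root shell (editing /etc/paths requires root)
sudo -s
# Either add /usr/local/mysql/bin on its own line in /etc/paths ...
vi /etc/paths
# ... or add the export line below to your shell profile
vi $HOME/.bash_profile
export PATH=$PATH:/usr/local/mysql/bin
# Reload the profile so the current session picks up the new PATH
source $HOME/.bash_profile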

2. Using the MySQL shortcut in the console to connect to MySQL

mysql -u root -p
# equivalent to /usr/local/mysql/bin/mysql -u root -p
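
Since the rest of this spike is in Python, the PATH change can also be checked from Python. This is a small sketch using only the standard library; the helper name `find_executable` is my own.

```python
import shutil


def find_executable(name="mysql"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


path = find_executable("mysql")
print(path if path else "mysql is not on PATH yet")
```

If the message says that mysql is not on PATH, open a new terminal or reload your profile before trying again.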

3. Instructions to connect MySQL to Sequel Pro
These steps prevent a possible error with Sequel Pro and MySQL 8.0 on Mac.

1. Open MySQL from System Preferences > Initialize Database
2. Type your new password.
3. Choose ‘Use legacy password’
4. Start the Server again.
5. Now connect Sequel Pro

For step 5, you must have the credentials for Sequel Pro: the root password for MySQL and the IP address of the MySQL server.

Source for common errors in installing MySQL 8.0 on a Mac

Commands for MySQL console
Here are some commands for MySQL if you are in the console. I give examples with a fictional database and tables. You can also use Sequel Pro if you want to.

-- Show all databases
SHOW DATABASES;
-- Create a database named test_db_1
CREATE DATABASE test_db_1;
-- Delete a database named test_db_1
DROP DATABASE test_db_1;
-- Create a few databases
CREATE DATABASE node_countries;
CREATE DATABASE node_countries_all;
CREATE DATABASE facial_recognition;
-- Select a database to work in
USE node_countries;
USE node_countries_all;
USE facial_recognition;
-- Load a dump into the selected database
SOURCE /Users/brunoflaven/Documents/02_copy/_000_IA_bruno_light/facial_recognition/node_countries_dump_1.sql;
SOURCE /Users/brunoflaven/Documents/02_copy/_000_IA_bruno_light/facial_recognition/node_countries_dump_2.sql;
SOURCE /Users/brunoflaven/Documents/02_copy/_000_IA_bruno_light/facial_recognition/photos_known_person_dump_1.sql;
-- INSERT records (a single quote inside a string is escaped by doubling it)
INSERT INTO node_countries VALUES (NULL, 'Belgium', '.be', 'BE', 'Brussels', '32');
INSERT INTO node_countries VALUES (NULL, 'Czechia', '.cz', 'CZ', 'Pragues', '420');
INSERT INTO node_countries VALUES (NULL, 'Virgin Islands (British)', '.vg', 'VG', 'Road Town', '1-284');
INSERT INTO node_countries VALUES (NULL, 'Virgin Islands (USA)', '.vi', 'VI', 'Charlotte Amalie', '1-340');
INSERT INTO node_countries VALUES (NULL, 'Wallis and Futuna Islands', '.wf', 'WF', 'Mata-Utu', '681');
INSERT INTO node_countries VALUES (NULL, 'Western Sahara', '.eh', 'EH', 'El Aaiun', '');
INSERT INTO node_countries VALUES (NULL, 'Yemen', '.ye', 'YE', 'San''a', '967');
INSERT INTO node_countries VALUES (NULL, 'Zambia', '.zm', 'ZM', 'Lusaka', '260');
INSERT INTO node_countries VALUES (NULL, 'Zimbabwe', '.zw', 'ZW', 'Harare', '263');
INSERT INTO node_countries VALUES (NULL, 'Tunisia', '.tn', 'TN', 'Tunis', '216');
-- Source for countries:
-- SELECT records
SELECT COUNT(*) FROM node_countries;
SELECT * FROM node_countries WHERE name LIKE '%Ger%';
SELECT * FROM node_countries;
SELECT COUNT(*) FROM node_countries;
SELECT COUNT(*) FROM known_persons;
SELECT * FROM known_persons;
-- Empty a table
TRUNCATE known_persons;
-- UPDATE a record (here, fixing the misspelled capital inserted above)
UPDATE node_countries SET capital = 'Prague' WHERE id = 13;

Installation and configuration for MySQL Python

There are different ways to install MySQL-python. One is using Pip, the other is using Anaconda.

1. To check your Anaconda version in the console.

$ conda --version

2. If needed upgrade pip

$ pip install --upgrade pip

3. To install the mysql-connector-python package, type the following command.

$ pip install mysql-connector-python # way_1
$ conda install mysql-connector-python # way_2

4. To install the pymysql package, type the following command.

$ pip install pymysql # way_1
$ conda install pymysql # way_2

5. Let’s install face_recognition in order to sort pictures. I tried with Homebrew but it did not work for me, so I decided to go with pip. Here is how to install face_recognition with pip.

sudo -s
$ pip install --upgrade pip
$ pip install cmake
$ pip install dlib
$ pip install face_recognition
$ pip install opencv-python

6. It is a good idea to verify the installation of face_recognition and of the other packages as follows.

# Verify face_recognition install
# go into the console and call python
$ python
# The output should look like the following
# Python 2.7.14 |Anaconda, Inc.| (default, Oct  5 2017, 02:28:52) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin Type "help", "copyright", "credits" or "license" for more information.

7. You can import the libraries and, if you get no errors, you are good to go.

>>> import dlib
>>> import face_recognition
>>> import cv2
>>> exit ()
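
The same check can be scripted instead of typed into the interpreter. A minimal sketch, assuming Python 3 (where `importlib.util.find_spec` is available); the helper name `missing_modules` is my own.

```python
import importlib.util


def missing_modules(names=("dlib", "face_recognition", "cv2")):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("All good" if not missing else "Missing: " + ", ".join(missing))
```

Unlike a plain import, this only looks the modules up without loading them, so it is quick and does not stop at the first missing package.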

Let’s face the problem…

I will probably make a video to show how to manipulate the files. As always, the most difficult part of facial recognition is installing the correct environment. The rest is as easy as 1, 2, 3. The idea behind injecting the results into a MySQL database is to retrieve them easily and, if needed, to extend the database with a set of legends, captions and so on. Then, you can identify differences, produce some helpful statistics along with them and finally update the real media databases with the correct elements to improve textual search.
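
Here is one way the injection into MySQL could look with mysql-connector-python. It is a sketch under assumptions, not the script from the repository: the `known_persons` table name comes from the SQL commands in this post, but its columns `person_name` and `image_path` are hypothetical, the credentials are placeholders, and deriving the person’s name from the file name up to the first underscore is an assumed convention.

```python
import os


def build_rows(matches):
    """Turn (unknown_image, known_image) pairs into (person_name, image_path) rows.

    The person name is taken from the known file name up to the first
    underscore, e.g. "macron_cocorico_....jpg" -> "macron" (assumed convention).
    """
    rows = []
    for unknown_path, known_path in matches:
        person = os.path.basename(known_path).split("_")[0]
        rows.append((person, unknown_path))
    return rows


def insert_rows(rows, host="localhost", user="root", password="",
                database="facial_recognition"):
    """Insert rows into a hypothetical known_persons(person_name, image_path) table."""
    import mysql.connector  # deferred: requires mysql-connector-python

    connection = mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )
    try:
        cursor = connection.cursor()
        # Parameterized query: values are passed separately from the SQL text.
        cursor.executemany(
            "INSERT INTO known_persons (person_name, image_path) VALUES (%s, %s)",
            rows,
        )
        connection.commit()
    finally:
        connection.close()
```

Keeping the row building separate from the database call makes it possible to inspect what would be inserted before touching the database.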

You can browse files from my GitHub account, the scripts and filenames are self-explanatory.

# get to the dir where you will execute the script
cd /path-to-your-dir/python_face_recognition_mysql/
└── images
    ├── known
    │   ├── macron_cocorico_gettyimages_1048128128.jpg
    │   ├── ...
    └── unknown
        ├── group_one_17_persons_16_faces.jpg
        ├── ....
