Quick POC for a All-in-one that provides an SEO dashboard made with Streamlit, managing Screaming Frog automation, storing results in a Database (SQLite) and create data-analysis graphics for SEO reports

So, my objective was to build a dashboard with Streamlit that automate Screaming Frog SEO Spider for large time saves and fast audits then manage the output (csv reports), save it into a SQLite database and analyse it with creating graphics. You can find all the code on my github account at https://bit.ly/3oAPuBP

I. SEO POC

1. Methodology

Like always I am heading for a POC (Proof of Concept). So, pragmatism over theory. POC does not have to be perfect or compliant to some coding best practices or even logical at some point! The real value occurs often during the POC in some low noise signals, so be open minded and listen… As it is said “Knowledge speaks, Wisdom listens”. In this exploration phase, the purpose remains to achieve a fixed goal in a limited amount of time, avoiding spending 2 years for instance on a POC! 🙂 That’s it for the methodology.

2. Value

As a PO, sometime for website, I need to have an eye on SEO KPIs. Even so, I am far from being a specialist, I need to grab and quickly overview key indicators (kinda a PO data-science oriented) and henceforth gather SEO best practices to fine-tune the website eventually.
But, as far as possible, it has to be made by a “tool” that ease these processes so it should be industrialized and automatized. The background idea is always to chain actions between them in order to improve productivity and facilitate decision-making. That’s it for the value proposal. Just remember that learning has a cost and experimental work is expensive! So, Timeboxing exploration is a duty!

3. Individual

What about my objectives me as an individual, they are several:

  • Delegate tedious work to machines in order to give me more free time
  • Improving my Python’s practice
  • Give a more data-science flavor to my resume to become a mixed flavored data-science PO for digital! By the way, for me, Streamlit has solved this paradox between investing on web or data-science. With Streamlit I can do both: (1) IA or Machine learning web tools or (2) handy Web Tools for other dedicated job’s purposes.

This POC is definitely in the second category. That’s it for my personal investment. As a first conclusion, I will say that any POC is a always a good tradeoffs between personal and professional objectives.

4. Sources

This post has been also inspired by these 2 posts but I believe I have taken in a more personal direction the challenge exposed in these 2 posts.

I quote this post cause I need to remember it, about the inadequacies sometimes between tech and reality.

The first job of an application architect is to “right size” the business or enterprise with the most efficient technology for their use case; not just saddle them with your pet preferences because it’s the latest “fad-tech” from Google or it’s what they taught you in school.

Source: https://beau-beauchamp.medium.com/php-is-killing-python-2be459364284

II. Build up the app (the POC)

Ok let’s build our dashboard! Like many softwares, Screaming Frog SEO Spider offers the ability to run without User Interface via Command line. With help of Streamlit, you can build your own webapp (GUI) to industrialize, automate repetitive tasks then transform raw data into useful information. That is my objective.

1. Guidelines

So, let’s build this GUI with these few simple guidelines defined earlier:

  • Streamlit library will be the “wrapper” for our application. (https://streamlit.io/)
  • SQLite will be used as database. I gave up completely MySQL as SQLite is a serverless database. It gives more flexibility to implement a database behind an app, that is the main reason why SQLite is often used in mobile development. (https://sqlite.org)
  • SQLAlchemy will be use to query the database to offer again flexibility and minimize the written code in Python. Thanks to this abstraction (ORM), it is possible to connect to a large number of databases (Firebird, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite, Sybase) and above all to standardize and considerably minimize your code in order to make it readable by other people and finally reusable. That’s savvy! (https://www.sqlalchemy.org/)
  • Screaming Frog SEO Spider is a website crawler that helps you improve onsite SEO. (https://www.screamingfrog.co.uk/seo-spider/)

Behind all these technical recommendations, there are few key ideas that again should be exposed because these ideas have also prevailed in this POC.

2. Few more ideas

2.1 praise for simplicity
As a PO, you might suffer from the information and tools dissemination that prevent you from having a big picture of your products. So handling everything in the same screen helps to minimize information’s confusion. It should be the target of any monitoring application. I will call it praise for simplicity.
For an application, think to centralize and to minify the info, never unnecessarily expand it. That’s a waste of time for user.

2.2 use cases decision matrix
Always think of the most extreme use cases and the non-user use cases for your web application. It will force you to
look beyond the objective and look for the most attractive use cases. That is also a practical strategy to shorten the exploration phase to look for attractive use cases in order to save time on the primary exploration phase.
What the purpose of building an application that won’t be use ?

2.3 code prospecting for usages
Capture the most possible information about similar existing web application like the one you want to create. For
instance, browsing through Git to grab some code you will prevent you from reinventing the wheel. That’s a usages benchmark that can be withdrawn from existing codes.

2.4 collect user feedback first then analyse
Like for any data science project, try to collect the data yourself by mingling with other users than you. One advice if looking user feedback, be sure to record everything of their responses. While you are listening, your bandwidth is very low and it is almost impossible to detect the relevant vs the irrelevant in their feedback. You will make analysis it but later!

3. The DB: using SQLITE

Instead of MySQL, I am using SQLITE and like I said before: it is easier to implement but the syntax slightly differs from MySQL.

3.1 Advices
You can name your database with both extension .sqlite3 or .db
e.g. streamlit_sqlalchemy_example.sqlite3 or streamlit_sqlalchemy_example.db

# name of your database
engine = create_engine(
        'sqlite:///data/streamlit_sqlalchemy_example.sqlite3')

A good practice is to put the db file in a directory then you find it easily e.g. data is my directory name for db
files.

# Valid SQLite URL forms are:
sqlite:///:memory: (or, sqlite://)
sqlite:///relative/path/to/file.db
sqlite:////absolute/path/to/file.db

2 important lines to stress how to connect, for the rest, you can check the files available on my github account.

# this line create the empty tables
    Base.metadata.create_all(engine)
 
# it imports the table that you need e.g Products
from streamlit_sqlalchemy_database import Products

3.2 SQLite command reminder
Some useful commands for SQLite

# TO GET INTO SQLITE3, just TYPE the command sqlite3 IN the console
sqlite3
 
#.open /your-path/SOURCE.db
# OPEN a connection extension can be .db
.open /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/DATA/screamingfrog_websites_crawls_all.db
 
# OPEN a connection extension can be .sqlite3
.open /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/streamlit_sqlalchemy_example/DATA/screamingfrog_websites_crawls_all_new_1.sqlite3
 
# SHOW TABLES
.tables
 
 
# make a dump
.output /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/sqlite_source_dump_2.sql
.dump
 
 
# READ dump
.read /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/sqlite_source_dump_2.sql
 
 
- command
SELECT COUNT(*) FROM websites;
SELECT * FROM websites;
 
SELECT COUNT(*) FROM crawls;
SELECT * FROM crawls;
 
- DELETE TABLES
DROP TABLE websites;
DROP TABLE crawls;
 
 
- DELETE
# DELETE FROM TABLE_NAME;
DELETE FROM websites;
VACUUM;

4. Describing Wireframe Screens

My app project is divided in 4 main screens and here is the navigation chosen below:

  • screen_1 :: General
  • screen_2 :: Crawl
  • screen_3 :: Parse and Insert
  • screen_4 :: Analyze

Making a Wireframe, in addition to being time consuming, also presents the disadvantage that you must define different screen status every time the user is
hitting a button! Honestly, it is obvious that the screens resemblance between the Wireframe and the real application mitigate this step interest. In a way, design screens for your web-application is useless as Streamlit is so quick to build up the screens itself.

To illustrate this non-interest, I put side by side Wireframe (made quickly with Balsamiq) and the real app (made quickly also with Streamlit). The only thing that may have a real interest is to summarize for each screen, with simple words, the User Experience expected like a User Story. I know Graphic Designers or UX Designers may hate me for this. Below my app User Stories.

  • navigation :: As a user, I can select
    destination in a main navigation dropdown menu on the left.
  • screen_1 :: General :: As a user, I can add into the database a website (title, url) to be crawled later.
  • screen_2 :: Crawl :: As a user, I can launch the crawl with Screaming Frog by selecting a website in a drop-down menu. By clicking on a button to launch the crawl, it will be print out the Screaming Frog command-line and executed it.
  • screen_3 :: Parse and Insert :: As a user, I can parse Screaming Frog’s CVS “reports” directory, select a file and launch computation then insert data into the SQLite database.
  • screen_4 :: Analyze :: As a user, I can load graphics that analyzed Screaming Frog’s SEO KPIs taken from the SQLite database.

(Screen_1) General

Wireframe (Balsamiq)

Real app (Streamlit)

(Screen_2) Crawl

Wireframe (Balsamiq)

Real app (Streamlit)

(Screen_3) Parse and Insert

Wireframe (Balsamiq)

Real app (Streamlit)

(Screen_4) Analyze

Wireframe (Balsamiq)

Real app (Streamlit)

5. Create alias for your favorite application in the MAC console

I love todo-list because I always forget how to do things. It is nice to have somewhere a little helping note that shed some light on actions that you seldom do. That is the purpose of the lines below even.

So, in order to simplify the command-line, I made a alias to be able to call screaming frog application with the alias
froggy. Unfortunately it is working in the console but it does not work when you use it into your python scripts. Anyway I bypass this obstacle by using the direct path to the screaming frog application.

5.1 Make a shortcut on a MAC for Screaming Frog SEO
Source : https://cometfuel.com/learn/automate-screaming-frog/#using-terminal-mac

As a shortcut, you can use whatever you want! Make it simple and meaningful e.g. screamingfrogseospider, screaminjayhawkins or ranagritando or a more basic sf (careful to not confuse with symfony for instance…). I will gone with froggy as a shortcut.

Command only working on the console

# full command working in the console
froggy --crawl http://flaven.fr --headless --save-crawl --output-folder
"/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs
"Internal:HTML" --overwrite --config
"/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig"

Command working in the python scripts

/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher --crawl http://flaven.fr
--headless --save-crawl --output-folder
"/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs
"Internal:HTML" --overwrite --config
"/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig"

There are probably tons of explanations to find or to give about this issue but I’d rather move on as there is better to do!

Add a shortcut for Screaming Frog for the mac console

# go to /Users/[username]
cd ~
 
# check if .bash_profile is in this directory
ls -la
 
 
# open with vim or nano
vim .bash_profile
nano .bash_profile
 
# using textedit native on mac
open -e .bash_profile
 
# using sublime text
subl .bash_profile
 
 
# adding an alias for Screaming Frog, I named it froggy
alias froggy="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher"
source ~/.bash_profile
 
# test the froggy shortcut
froggy --help
# commands inside the console working
froggy --crawl <url>
froggy --crawl https://flaven.fr/
froggy --headless --crawl https://flaven.fr/
froggy --headless --save-crawl --output-folder /Users/brunoflaven/Documents/02_copy/_streamlit_ideas_for_app/ --timestamped-output --crawl https://flaven.fr/
froggy --headless --save-crawl --timestamped-output --crawl https://flaven.fr/
 
 
# full command working in the console
froggy --crawl http://flaven.fr --headless --save-crawl --output-folder "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs "Internal:HTML" --overwrite --config "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig"
 
 
# open the app from the console
open "/Applications/Screaming Frog SEO Spider.app"
 
# testing the shortcut for Screaming Frog SEO Spider.app
froggy --help

5.2 Make a shortcut on a MAC for Sublime Text
To use subl, the Sublime Text bin folder needs to be added to the path. For a typical installation of Sublime Text, this will be located at /Applications/Sublime Text.app/Contents/SharedSupport/bin. If using Bash, the default before macOS 10.15, the following command will add the bin folder to the PATH environment variable:

# add sublime directly in the console...
echo 'export PATH="/Applications/Sublime Text.app/Contents/SharedSupport/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
# check
subl --help

Source: https://www.sublimetext.com/docs/command_line.html#mac

6. Videos

3 additional videos to tackle this post

  • Python, Screaming Frog, SEO, Automate, POC Part 1 Manipulating Data with Streamlit & SQLite with the help of SQLAlchemy
  • Python, Screaming Frog, SEO, Automate, POC Part 2 Creating Database in SQLite with Streamlit and SQLAlchemy
  • Python, Screaming Frog, SEO, Automate, POC Part 3 Creating Database in SQLite with Streamlit and SQLAlchemy

Conclusion

Few conclusions can be drawn from this POC:

  • Python, Pandas, Numpy, Plotly and Streamlit… are strong enablers to build personal usage dedicated Software or SaaS. It reduces drastically the entrance fee to the Software or SaaS market! I must admit that learning recent web technologies bring me closer than ever to the “serious” software industry than I thought. I should say it the other way round that is more the Software Industry that shifts to SaaS, Cloud and web technologies since 20 years from now…. after all any website is nothing more than a “software” running in a browser.
  • It reinforces me as a user centric PO and shapes me as an “improver”, “tinker” or “builder” for added value tools to improve daily processes and focus on delivering quickly real value! It is F… Faith Declaration!

More infos